Sorry - I have not followed this thread very closely, but I noticed you wrote:
"I saw (not surprisingly) that I had 2.6 million Strings in it (a majority of them _very_ small strings)"

When you have lots of strings with the same value, it is worth using String.intern(). The JVM handles strings in a special way compared to other objects: interned strings are kept in a pool, and there is ONLY ONE INSTANCE FOR EACH VALUE. Two strings that have both been interned can be compared with == instead of .equals(). When many of the strings are duplicates, String.intern() can reduce memory consumption dramatically. I am sure others in this group can elaborate on the mechanics.

Hugh

At 05:53 PM 7/17/2005, Brent Verner wrote:
>[2005-07-17 17:33] Richard O. Hammer said:
>| I do not suspect the StringBuffer is a big part of your memory problem.
>
>  Yeah, StringBuffer doesn't look to be the culprit.  Not directly
>anyway.  After doing an object count in the data structure, I saw
>(not surprisingly) that I had 2.6 million Strings in it (a majority
>of them _very_ small strings -- smaller than the String memory overhead
>of 40 bytes).
>
>| Rather, I guess that you are reading the entire 30 MB file into a single
>| data structure (perhaps some Collection which holds all your Records)
>| which will reside entirely in memory at once.  Is that right?
>
>  Yes.  Iterating over the Records from RecordReader doesn't
>cause the memory bloat, but I have to hold one of these structures
>in memory to merge data from another similar export file to populate
>a hibernate-persisted object graph...
>
>| Why does your Field extend TreeMap?  Do you need the functionality of
>| TreeMap for what appears to be a simple field?  A TreeMap has to be a
>| large object I would think, and you are making possibly 220,000 of these.
>
>  Yeah, I made it this way to simplify the generation of a
>MD5 checksum in the Record containing the Field to determine if
>a Record has been modified.  This could be done by ordering by
>the keys of the Field when generating the checksum...
>
>| Objects do not become garbage collectible until they are unreachable.
>| So if you are keeping a pointer to all your Records somewhere, then all
>| of the Fields remain reachable, and not eligible for garbage collection.
>
>  This I understand, but what amazed me was that 30MB of input
>data could turn into a 300MB data structure :-)
>
>| Thanks for mentioning hprof.  This is the first I've heard of it.  I
>| hope I've interpreted its output correctly.
>
>  I'm not sure I've interpreted it correctly...I read it that
>StringBuffer held 207M of referenced memory.
>
>  b

_______________________________________________
Juglist mailing list
http://trijug.org/mailman/listinfo/juglist_trijug.org
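
To illustrate the String.intern() suggestion in the reply at the top, here is a minimal sketch. It is not taken from the code discussed in the thread: the class InternDemo and its helpers load() and distinctInstances() are hypothetical names, the "v0".."v9" values stand in for the small repeated field values, and the syntax assumes a newer Java than was current in 2005. It counts how many distinct String objects survive a load with and without interning.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of the code discussed in the thread.
final class InternDemo {

    // Counts distinct String *instances* (compared by identity, not by value).
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Builds `count` strings that cycle through `distinctValues` different values,
    // roughly what parsing a file full of repeated small fields produces.
    static List<String> load(int count, int distinctValues, boolean intern) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // new String(...) guarantees a fresh instance for every record.
            String s = new String("v" + (i % distinctValues));
            out.add(intern ? s.intern() : s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> plain = load(10000, 10, false);
        List<String> pooled = load(10000, 10, true);

        System.out.println(distinctInstances(plain));   // prints 10000
        System.out.println(distinctInstances(pooled));  // prints 10

        // == is only reliable when BOTH operands were interned.
        System.out.println(pooled.get(0) == pooled.get(10));    // prints true
        System.out.println(plain.get(0) == plain.get(10));      // prints false
        System.out.println(plain.get(0).equals(plain.get(10))); // prints true
    }
}
```

With 10 distinct values across 10,000 records, the interned list holds 10 String objects instead of 10,000. How much memory that saves in practice depends on how many of the 2.6 million strings are really duplicates, which the thread does not say.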
