Sorry - I have not followed this thread very closely, but I noticed you wrote:

"I saw (not surprisingly) that I had 2.6 million Strings in it (a majority of 
them _very_ small strings"

When you have lots of strings with the same value, it is worth considering 
String.intern(). The JVM handles Strings differently from other objects: 
interned strings are kept in a pool, and the pool holds ONLY ONE INSTANCE FOR 
EACH VALUE. If both strings have been interned, you can compare them with == 
instead of .equals().

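When many of your strings are duplicates, String.intern() can reduce memory 
consumption dramatically, provided you keep only the interned references so 
the duplicate instances can be garbage collected.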
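
Here is a rough sketch of the effect, not anything taken from your code. The 
class name InternDemo, its helper methods, and the sample value "NC" are all 
made up for illustration. It builds 1000 equal-valued Strings as separate 
objects (roughly what parsing a file gives you), then counts distinct 
instances before and after interning.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; none of these names come from the original code.
public class InternDemo {

    // Builds `count` Strings that are equal in value but are separate objects,
    // similar to values parsed one at a time from an input file.
    static List<String> parseDuplicates(String value, int count) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new String(value.toCharArray())); // fresh instance each time
        }
        return out;
    }

    // Counts distinct object instances by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Returns a new list where each element is replaced by its interned copy.
    static List<String> internAll(List<String> strings) {
        List<String> out = new ArrayList<>(strings.size());
        for (String s : strings) {
            out.add(s.intern());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = parseDuplicates("NC", 1000);
        List<String> interned = internAll(raw);
        System.out.println("instances before intern: " + distinctInstances(raw));
        System.out.println("instances after intern:  " + distinctInstances(interned));
        System.out.println("== after intern: " + (interned.get(0) == interned.get(999)));
    }
}
```

In a real loader you would call intern() at the point where each value is 
stored, so the un-interned copies never end up in the long-lived structure. 
Whether it pays off depends on how many of the 2.6 million values actually 
repeat.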

I am sure others in this group can elaborate on the mechanics.

Hugh

At 05:53 PM 7/17/2005, Brent Verner wrote:

>[2005-07-17 17:33] Richard O. Hammer said:
>| I do not suspect the StringBuffer is a big part of your memory problem.
>
>  Yeah, StringBuffer doesn't look to be the culprit.  Not directly
>anyway.  After doing an object count in the data structure, I saw
>(not surprisingly) that I had 2.6 million Strings in it (a majority
>of them _very_ small strings -- smaller than the String memory overhead
>of 40 bytes).
>
>| Rather, I guess that you are reading the entire 30 MB file into a single 
>| data structure (perhaps some Collection which holds all your Records) 
>| which will reside entirely in memory at once.  Is that right?
>
>  Yes.  Iterating over the Records from RecordReader doesn't
>cause the memory bloat, but I have to hold one of these structures
>in memory to merge data from another similar export file to populate
>a hibernate-persisted object graph...
>
>| Why does your Field extend TreeMap?  Do you need the functionality of 
>| TreeMap for what appears to be a simple field?  A TreeMap has to be a 
>| large object I would think, and you are making possibly 220,000 of these.
>
>  Yeah, I made it this way to simplify the generation of an
>MD5 checksum in the Record containing the Field to determine if
>a Record has been modified.  This could be done by ordering by 
>the keys of the Field when generating the checksum...
>
>| Objects do not become garbage collectible until they are unreachable. 
>| So if you are keeping a pointer to all your Records somewhere, then all 
>| of the Fields remain reachable, and not eligible for garbage collection.
>
>  This I understand, but what amazed me was that 30MB of input
>data could turn into a 300MB data structure :-)
>
>| Thanks for mentioning hprof.  This is the first I've heard of it.  I 
>| hope I've interpreted its output correctly.
>
>  I'm not sure I've interpreted it correctly...I read it that 
>StringBuffer held 207M of referenced memory.
>
>        b






_______________________________________________
Juglist mailing list
[email protected]
http://trijug.org/mailman/listinfo/juglist_trijug.org
