Hugh Allen wrote:
Sorry - I have not followed this thread very closely, but I noticed you wrote:
"I saw (not surprisingly) that I had 2.6 million Strings in it (a majority of them
_very_ small strings"
When you have lots of strings with the same value it is important to use String.intern(). Strings are handled in a very special way by the JVM, compared to other objects. Interned strings are kept in a pool and there is ONLY ONE INSTANCE FOR EACH VALUE.
You can compare interned strings with == instead of .equals().
Finding out later that one string had not been interned would be
really painful.
String.intern() can reduce memory consumption dramatically.
> I am sure others in this group can elaborate on the mechanics.
>
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#intern()
" When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String object
is returned."
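To make the quoted Javadoc concrete, here is a minimal sketch of the pool behaviour. The class name InternDemo and the helper sameInstance are made up for illustration, and the literal "NC" is just sample data.

```java
final class InternDemo {
    /** Returns true only when both references point at the same String object. */
    public static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents,
        // which is roughly what parsing the same value twice from a file produces.
        String a = new String("NC");
        String b = new String("NC");

        System.out.println(a.equals(b));                          // prints true
        System.out.println(sameInstance(a, b));                   // prints false
        System.out.println(sameInstance(a.intern(), b.intern())); // prints true
    }
}
```

Note that == is only safe when both sides came from intern(); comparing an interned string with a non-interned one falls back to the second case above.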
I also vaguely recall == being less than guaranteed across threads, but
that may be a false recollection or out of date.
-andy
Hugh
At 05:53 PM 7/17/2005, Brent Verner wrote:
[2005-07-17 17:33] Richard O. Hammer said:
| I do not suspect the StringBuffer is a big part of your memory problem.
Yeah, StringBuffer doesn't look to be the culprit. Not directly
anyway. After doing an object count in the data structure, I saw
(not surprisingly) that I had 2.6 million Strings in it (a majority
of them _very_ small strings -- smaller than the String memory overhead
of 40 bytes).
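As a rough back-of-the-envelope check on those numbers, the sketch below multiplies the string count by the 40-byte overhead quoted above. The class and method names are hypothetical, and the real per-String overhead depends on the JVM, so treat the result as an estimate only.

```java
final class StringOverheadEstimate {
    /** Fixed overhead only; the character data itself is not counted. */
    public static long overheadBytes(long stringCount, long bytesPerString) {
        return stringCount * bytesPerString;
    }

    public static void main(String[] args) {
        // 2.6 million Strings at an assumed 40 bytes of overhead each.
        long bytes = overheadBytes(2600000L, 40L);
        long megabytes = bytes / (1024 * 1024);
        System.out.println(bytes + " bytes, about " + megabytes + " MB");
        // prints: 104000000 bytes, about 99 MB
    }
}
```

So under that assumption, roughly 100 MB goes to String overhead alone before any character data is counted.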
| Rather, I guess that you are reading the entire 30 MB file into a single
| data structure (perhaps some Collection which holds all your Records)
| which will reside entirely in memory at once. Is that right?
Yes. Iterating over the Records from RecordReader doesn't
cause the memory bloat, but I have to hold one of these structures
in memory to merge data from another similar export file to populate
a hibernate-persisted object graph...
| Why does your Field extend TreeMap? Do you need the functionality of
| TreeMap for what appears to be a simple field? A TreeMap has to be a
| large object I would think, and you are making possibly 220,000 of these.
Yeah, I made it this way to simplify the generation of an
MD5 checksum in the Record containing the Field to determine if
a Record has been modified. This could be done by ordering by
the keys of the Field when generating the checksum...
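A minimal sketch of that alternative, assuming a Field can be viewed as a plain Map of String keys to non-null String values (the real Field class is not shown in this thread, so FieldChecksum and md5Hex are made-up names). The keys are sorted only at checksum time, so the Field itself would not need to be a TreeMap.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class FieldChecksum {
    /** MD5 over the entries in sorted key order, so map iteration order does not matter. */
    public static String md5Hex(Map<String, String> field) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        // Copy the keys into a TreeSet to get a stable, sorted order.
        for (String key : new TreeSet<>(field.keySet())) {
            md.update(key.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
            md.update(field.get(key).getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Map<String, String> field = new HashMap<>();
        field.put("name", "Brent");
        field.put("city", "Raleigh");
        System.out.println(md5Hex(field));
    }
}
```

The trade-off is a sort per checksum instead of a sorted structure per Field, which is probably cheaper in memory when there are a couple of hundred thousand Fields.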
| Objects do not become garbage collectible until they are unreachable.
| So if you are keeping a pointer to all your Records somewhere, then all
| of the Fields remain reachable, and not eligible for garbage collection.
This I understand, but what amazed me was that 30MB of input
data could turn into a 300MB data structure :-)
| Thanks for mentioning hprof. This is the first I've heard of it. I
| hope I've interpreted its output correctly.
I'm not sure I've interpreted it correctly...I read it that
StringBuffer held 207M of referenced memory.
b
_______________________________________________
Juglist mailing list
[email protected]
http://trijug.org/mailman/listinfo/juglist_trijug.org