Hugh Allen wrote:
Sorry - I have not followed this thread very closely, but I noticed you wrote:

"I saw (not surprisingly) that I had 2.6 million Strings in it (a majority of them 
_very_ small strings"

When you have lots of strings with the same value it is important to use String.intern(). The JVM handles Strings differently from other objects: interned strings are kept in a pool and there is ONLY ONE INSTANCE FOR EACH VALUE. You can compare two interned strings with == instead of .equals(), provided both of them really have been interned.


Finding out that one string had slipped through without being interned would really be painful.

String.intern() can reduce memory consumption dramatically.

> I am sure others in this group can elaborate on the mechanics.
>


http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#intern()

" When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned."
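As a rough illustration of the pool behaviour quoted above, here is a minimal sketch. The class and method names (InternSketch, readValues, allSameInstance) are made up for this example and are not from the original code; new String("NC") merely stands in for values parsed out of a file, which arrive as separate objects even when their contents are equal.

```java
import java.util.ArrayList;
import java.util.List;

public class InternSketch {

    // Builds `count` Strings that are equal by value. Each starts life as a
    // distinct object; with intern == true only the pooled instance is kept.
    static List<String> readValues(int count, boolean intern) {
        List<String> values = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            // new String(...) forces a fresh object, as parsing input would
            String s = new String("NC");
            values.add(intern ? s.intern() : s);
        }
        return values;
    }

    // True only if every element is the very same object (== identity).
    static boolean allSameInstance(List<String> values) {
        String first = values.get(0);
        for (String v : values) {
            if (v != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSameInstance(readValues(1000, false))); // false
        System.out.println(allSameInstance(readValues(1000, true)));  // true
    }
}
```

In the non-interned case the list keeps 1000 separate String objects alive; in the interned case the temporaries become garbage and only one instance is retained. Note that == is only safe when every string involved has gone through intern().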

I also vaguely recall == being less than guaranteed across threads, but that may be a false recollection or out of date.

-andy

Hugh

At 05:53 PM 7/17/2005, Brent Verner wrote:


[2005-07-17 17:33] Richard O. Hammer said:
| I do not suspect the StringBuffer is a big part of your memory problem.

Yeah, StringBuffer doesn't look to be the culprit.  Not directly
anyway.  After doing an object count in the data structure, I saw
(not surprisingly) that I had 2.6 million Strings in it (a majority
of them _very_ small strings -- smaller than the String memory overhead
of 40 bytes).
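For a sense of scale, the two figures mentioned above (2.6 million Strings, about 40 bytes of overhead each) can be multiplied out. This is only back-of-the-envelope arithmetic: the 40-byte figure is the estimate from this message and varies by JVM, and the class name is made up for the example.

```java
public class StringOverheadEstimate {

    // Fixed per-object overhead only; the character data comes on top.
    static long overheadBytes(long stringCount, long perStringOverhead) {
        return stringCount * perStringOverhead;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(2600000L, 40L);
        // prints "104000000 bytes, about 99 MB"
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MB");
    }
}
```

So the String overhead alone would come to roughly three times the size of the 30 MB input, before counting the characters themselves or the collections holding them.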

| Rather, I guess that you are reading the entire 30 MB file into a single
| data structure (perhaps some Collection which holds all your Records)
| which will reside entirely in memory at once. Is that right?

Yes.  Iterating over the Records from RecordReader doesn't
cause the memory bloat, but I have to hold one of these structures
in memory to merge data from another similar export file to populate
a hibernate-persisted object graph...

| Why does your Field extend TreeMap? Do you need the functionality of
| TreeMap for what appears to be a simple field? A TreeMap has to be a
| large object I would think, and you are making possibly 220,000 of these.

Yeah, I made it this way to simplify the generation of an
MD5 checksum in the Record containing the Field to determine if
a Record has been modified.  This could instead be done by ordering
by the keys of the Field when generating the checksum...
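One way the key-ordering idea could look, as a sketch only: it assumes a Field can be treated as a Map<String, String> with non-null values, which the thread does not confirm, and the class and method names are hypothetical. The point is that the keys are sorted once at checksum time, so the Field itself can be a plain HashMap rather than a TreeMap.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FieldChecksumSketch {

    // MD5 over the key/value pairs of `field`, visited in sorted key order
    // so the result does not depend on the map's own iteration order.
    // Assumes no null values (hypothetical simplification).
    static byte[] checksum(Map<String, String> field) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        List<String> keys = new ArrayList<String>(field.keySet());
        Collections.sort(keys);
        for (String key : keys) {
            md5.update(key.getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // separator, so "ab"+"c" differs from "a"+"bc"
            md5.update(field.get(key).getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0);
        }
        return md5.digest();
    }
}
```

Two maps holding the same entries then give the same checksum however they were populated, and the sorting cost is paid only when a checksum is actually needed.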

| Objects do not become garbage collectible until they are unreachable.
| So if you are keeping a pointer to all your Records somewhere, then all
| of the Fields remain reachable, and not eligible for garbage collection.

This I understand, but what amazed me was that 30MB of input
data could turn into a 300MB data structure :-)

| Thanks for mentioning hprof. This is the first I've heard of it. I
| hope I've interpreted its output correctly.

I'm not sure I've interpreted it correctly either... I read it as
saying that StringBuffer held 207M of referenced memory.

      b







_______________________________________________
Juglist mailing list
[email protected]
http://trijug.org/mailman/listinfo/juglist_trijug.org
