On 31.03.2010 22:10, Scott A Crosby wrote:
On Wed, 31 Mar 2010 21:13:49 +0200, WanMil <wmgc...@web.de> writes:

I noticed that mkgmap does not intern any strings. In particular, this
tile, generated by the splitter, fails to build with -Xmx3000m on
64-bit jdk under linux. With my patch, mkgmap generates the tile with
-Xmx1000m.

      <bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625'
      maxlon='11.513671875'/>

This tile has 1M nodes. Among the nodes and ways on this tile, there
are 12M tags, yet only 100k distinct tag key/value pairs; on average
each value occurs 120 times.

I explicitly do not use normal string interning because
String.intern() strings are kept forever, and I want these strings to
be GC'able after the tile is done. I trade GCability for having the
occasional string duplicated in memory by flushing the interning table
every 10k unique strings.

This code is not currently thread-safe; ideally there should be
one string interning table for each parser/thread.
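
The flushing scheme described above could be sketched roughly as follows. This is only an illustration, not the class from the patch: the name LossyInternTable, the maxEntries parameter and the size() helper are hypothetical, and the table is deliberately not synchronized, matching the one-table-per-parser/thread idea.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a lossy intern table; not the class from the patch. */
public class LossyInternTable {
    private final int maxEntries;
    private final Map<String, String> table = new HashMap<>();

    public LossyInternTable(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns a canonical instance equal to s, flushing the table when it is full. */
    public String intern(String s) {
        String canonical = table.get(s);
        if (canonical != null)
            return canonical;
        if (table.size() >= maxEntries)
            table.clear(); // forget everything; later duplicates get a new canonical copy
        table.put(s, s);
        return s;
    }

    /** Number of distinct strings currently remembered. */
    public int size() {
        return table.size();
    }
}
```

With maxEntries set to 10000 this matches the "flush every 10k unique strings" behaviour: a string seen again after a flush is stored a second time, which is the occasional duplicate traded for GC-ability. Dropping the table once the tile is done releases all of its strings.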

Scott


Hi Scott!

I think it's a good idea to intern the strings.
As far as I know the LossyIntern class is not needed. The .intern()
function of a string does exactly the same.

You are right. String intern does not intern forever at least since
Java 1.2.

Some time ago I sent a very similar patch to the mailing list which
is not yet committed. Could you please test with your use case if it
performs a similar memory reduction?

You can run it if you want, but from the numbers I gave above for this
tile, interning values as in my patch will decrease the number of
strings in RAM from 12M to <100k values. Interning only keys would
reduce the number of Strings in RAM from 24M to 12M.


The patch is thread safe and does not intern all strings. In my
opinion the value of a name tag should not be interned because there
is a high probability that this tag is used once only.

That's probably true for many or most tiles, but not for the tile I
referenced above, where on average each value occurs 120 times. That
tile is unbuildable with a 3 GB heap without my patch and buildable
with a 1 GB heap with my patch.

Shall I post an updated patch without FuzzyIntern?

Scott

Scott,

my patch interned all keys and additionally the values of a limited number of keys. Maybe it's not necessary to limit the interning of values. So I have attached the very simple but hopefully very effective patch regarding the memory footprint of mkgmap.

Regarding your patch: I don't understand the function of the FuzzyIntern class. You build a HashMap from (uninterned) Strings to the interned String. Then you look up new strings in this HashMap and use the interned variant. How does this differ from the (hopefully) well-optimized intern() method?
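
For comparison, here is a minimal sketch of what String.intern() provides on its own: equal strings are mapped to one shared object held in a JVM-wide pool, with no table to manage or flush by hand. The class InternDemo and its sameObject helper are hypothetical names used only for this illustration.

```java
public class InternDemo {
    /** True when both references point at the same object, not merely equal text. */
    public static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects, as a parser would produce.
        String v1 = new String("residential");
        String v2 = new String("residential");

        System.out.println(sameObject(v1, v2));                   // two separate objects
        System.out.println(sameObject(v1.intern(), v2.intern())); // one shared object
    }
}
```

The practical difference from a HashMap-based table is therefore where the canonical strings live and who controls their lifetime, not the result of the lookup.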

> String intern does not intern forever
I didn't know that. Do you have any link where this is specified?

WanMil


Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java
===================================================================
--- src/uk/me/parabola/mkgmap/reader/osm/Tags.java      (revision 1624)
+++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java      (working copy)
@@ -65,12 +65,14 @@
                Integer ind = keyPos(key);
                if (ind == null)
                        assert false : "keyPos(" + key + ") returns null - size = " + size + ", capacity = " + capacity;
-               keys[ind] = key;
+               // use .intern() to reduce memory footprint
+               keys[ind] = key.intern();
 
                String old = values[ind];
                if (old == null)
                        size++;
-               values[ind] = value;
+               
+               values[ind] = value.intern();
 
                return old;
        }
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
