I noticed that mkgmap does not intern any strings. As a result, the
tile below, generated by the splitter, fails to build even with
-Xmx3000m on a 64-bit JDK under Linux. With my patch, mkgmap builds
the same tile with -Xmx1000m.

     <bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625'
     maxlon='11.513671875'/>

This tile has 1m nodes. The nodes and ways on this tile carry 12m
tags, yet only 100k distinct tag key/value pairs, so on average each
pair occurs 120 times.

I explicitly do not use normal string interning because
String.intern() strings are kept forever, and I want these strings to
be GC'able after the tile is done. I trade GCability for having the
occasional string duplicated in memory by flushing the interning table
every 10k unique strings.
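
For readers without the patch at hand, the flushing idea can be sketched roughly as follows. This is a minimal illustration, not the LossyIntern class from the patch: the class name LossyInternSketch, the maxEntries parameter and the size() helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a lossy interning table; not the class from the patch. */
class LossyInternSketch {
    private final int maxEntries;
    private final Map<String, String> table = new HashMap<String, String>();

    LossyInternSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns a canonical instance equal to s, remembering s if it is new. */
    String intern(String s) {
        if (s == null) {
            return null;
        }
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        if (table.size() >= maxEntries) {
            // Flush: strings handed out earlier stay valid for their holders,
            // but the table no longer keeps them reachable.
            table.clear();
        }
        table.put(s, s);
        return s;
    }

    /** Number of strings currently remembered (illustration only). */
    int size() {
        return table.size();
    }
}
```

With maxEntries set to 10000, a string seen again after a flush is simply stored a second time, which is the duplication cost described above.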

This code is not currently thread-safe; ideally there should be
one string interning table for each parser/thread.
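
One way the per-thread variant might look is sketched below, assuming a ThreadLocal holding one table per thread. The class name PerThreadInternSketch and the constant MAX_ENTRIES are hypothetical; only the 10k flush threshold is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one lossy interning table per thread, so no locking is needed. */
final class PerThreadInternSketch {
    // Flush threshold of 10k unique strings, as described in the text.
    private static final int MAX_ENTRIES = 10000;

    private static final ThreadLocal<Map<String, String>> TABLE =
        new ThreadLocal<Map<String, String>>() {
            @Override
            protected Map<String, String> initialValue() {
                return new HashMap<String, String>();
            }
        };

    private PerThreadInternSketch() {
    }

    /** Returns this thread's canonical instance for s. */
    static String intern(String s) {
        if (s == null) {
            return null;
        }
        Map<String, String> table = TABLE.get();
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        if (table.size() >= MAX_ENTRIES) {
            table.clear();
        }
        table.put(s, s);
        return s;
    }
}
```

The trade-off in this sketch is that two threads parsing the same tag each keep their own copy, so memory savings are per thread rather than global.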

Scott


Hi Scott!

I think interning the strings is a good idea.
As far as I know, the LossyIntern class is not needed: String.intern() does exactly the same thing.

Some time ago I sent a very similar patch to the mailing list, which has not been committed yet. Could you please test whether it achieves a similar memory reduction for your use case?

The patch is thread-safe and does not intern all strings. In my opinion, the value of a name tag should not be interned, because there is a high probability that a given name is used only once.

WanMil
Index: src/uk/me/parabola/mkgmap/reader/osm/Tags.java
===================================================================
--- src/uk/me/parabola/mkgmap/reader/osm/Tags.java      (revision 1566)
+++ src/uk/me/parabola/mkgmap/reader/osm/Tags.java      (working copy)
@@ -19,6 +19,7 @@
 import java.util.AbstractMap;
 import java.util.Arrays;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Iterator;
 import java.util.Map;
 
@@ -45,6 +46,18 @@
 
        private String[] keys;
        private String[] values;
+       
+       /** 
+        * Stores all tags which values should be stored as String intern. The values of
+        * these tags should have a limited number of different values to get a
+        * reasonable memory footprint effect.
+        */
+       private final static HashSet<String> interableValueTags = new HashSet<String>(
+                       Arrays.asList("highway", "building", "addr:housenumber", "access",
+                               "natural", "waterway", "amenity", "oneway", "surface",
+                               "landuse", "lanes", "place", "layer", "tracktype", "maxspeed",
+                               "foot", "bridge", "height", "area", "railway", "admin_level",
+                               "power", "type", "leisure", "barrier"));
 
        public Tags() {
                keys = new String[INIT_SIZE];
@@ -65,11 +78,19 @@
                Integer ind = keyPos(key);
                if (ind == null)
                        assert false : "keyPos(" + key + ") returns null - size = " + size + ", capacity = " + capacity;
-               keys[ind] = key;
+               // use .intern() to reduce memory footprint
+               keys[ind] = key.intern();
 
                String old = values[ind];
                if (old == null)
                        size++;
+               
+               if (interableValueTags.contains(key)) {
+                       // use .intern() to reduce memory footprint for the most
+                       // common tags with a limited range of values
+                       value = value.intern();
+               }
+               
                values[ind] = value;
 
                return old;
_______________________________________________
mkgmap-dev mailing list
[email protected]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
