Hi Lars,

While it isn't solving exactly the same problem as yours, the mkgmap splitter utility faces similar challenges. It is written in Java and uses various techniques to reduce the amount of memory required while processing the planet OSM file. I've spent quite a bit of time profiling and tuning it, so hopefully there are some ideas (or code) in there that can help you out. For example, there are some custom collection-like classes for efficiently holding primitives, bit-level storage of data, and conditional use of different data structures depending on whether a common or an uncommon case is encountered. Quite a bit of effort has also gone into avoiding unnecessary object construction. Additionally, I checked in an update yesterday that creates a disk cache after parsing the planet file for the first time; subsequent runs read from this cache rather than making multiple passes over the planet XML file.
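To make the primitive-holding and bit-level storage idea concrete, here is a minimal sketch (my own illustration, not code from the splitter): node coordinates are stored as fixed-point integers packed two-per-long in a growable primitive array, so no per-node object is ever allocated.

```java
import java.util.Arrays;

// Hypothetical sketch: hold OSM node coordinates as primitives rather
// than one object per node. Lat/lon are converted to fixed-point
// integers (degrees * 1e7, well within 32 bits) and packed together
// into a single long, stored in a plain long[] that grows by doubling.
public class PackedCoords {
    private long[] data = new long[16];
    private int size;

    public void add(double lat, double lon) {
        if (size == data.length)
            data = Arrays.copyOf(data, size * 2);
        long latFixed = (long) Math.round(lat * 1e7) & 0xFFFFFFFFL;
        long lonFixed = (long) Math.round(lon * 1e7) & 0xFFFFFFFFL;
        data[size++] = (latFixed << 32) | lonFixed;
    }

    public double getLat(int i) {
        // Cast to int restores the sign of the upper 32-bit field.
        return ((int) (data[i] >>> 32)) / 1e7;
    }

    public double getLon(int i) {
        return ((int) data[i]) / 1e7;
    }

    public int size() {
        return size;
    }
}
```

At 8 bytes per node (plus the array itself) this is a fraction of the cost of boxed Doubles or small coordinate objects, which matters a lot when you're holding hundreds of millions of nodes.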
My suggestion is that you try doing something similar: make one pass over the XML that writes the data out to a custom binary format. You'll then be able to make multiple passes over that data much more quickly, processing a subset of it each time. You can choose an appropriately sized subset depending on how you want to trade off speed against memory use (that's exactly what the --max-areas parameter does in the splitter). You can grab the splitter from here if you want to take a look:

http://www.mkgmap.org.uk/page/tile-splitter

I've also worked on other similar problems at my job, where I've used in-memory compression of data to greatly reduce the RAM required. This approach depends a lot on being able to find a good way to exploit any redundancy in the particular data you're working with. I'm happy to discuss this further with you offline if you like.

Chris

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

