By all means go ahead. I think many people would welcome a Splitter that
can handle the whole world. I think Steve Ratcliffe also announced that
he would be working on a new version, but he will no doubt respond himself.
If your expectation of less than 50% memory usage is correct, then you
should be able to handle the world on your 6GB machine: North and South
America (~2/3 of the data) can be split using 8GB of RAM with the current
splitter, which suggests roughly 12GB for the full planet, and half of
that would fit in 6GB.
Chris Miller wrote:
First off, let me say a quick thanks to everyone involved in this
project. I've only discovered it recently but wish I had found it much
earlier; it really is incredible to see what has been done so far.

I've downloaded the complete planet-090715.osm.bz2 file and have been
looking at splitting it. I read the description and limitations of the
splitter.jar tool but decided to give it a go anyway, since I have a
64-bit OS with 6GB of RAM. Unfortunately it still failed with a -Xmx5200m
heap. I have a 16GB machine at work that I could try it on, but instead
I decided to take a look at the source code to see if there's any
possibility of reducing the memory requirements.
I've only spent a short time looking at the code, but as far as I can
tell the whole first step (computing the areas.list file) uses far more
memory than it actually needs. The SplitIntMap (which is what takes up
all the memory) isn't required here, for two reasons. First, the code
never retrieves entries via .get(); it only uses an iterator, so a
list/array would suffice. Second, the node IDs aren't used in this
stage, so we don't even need to parse them, let alone hold on to them.
Assuming we replace the SplitIntMap with a wrapper around an int[] (or
multiple int[] arrays, to mitigate the double-memory-on-copy problem),
we'd be looking at memory savings of more than 50%.
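
To make that concrete, here's a rough sketch of the sort of wrapper I
have in mind (the class and method names are made up and don't
correspond to anything in the current splitter source):

// Append-only store for packed lat/lon values, kept in fixed-size int[]
// chunks so that growing never requires copying everything into a bigger
// array.
class IntChunkList {
    private static final int CHUNK_SIZE = 1 << 20;  // 1M ints per chunk
    private final java.util.List<int[]> chunks = new java.util.ArrayList<int[]>();
    private int used = CHUNK_SIZE;                  // forces the first allocation

    void add(int value) {
        if (used == CHUNK_SIZE) {
            chunks.add(new int[CHUNK_SIZE]);
            used = 0;
        }
        chunks.get(chunks.size() - 1)[used++] = value;
    }

    // The area-computation step only ever iterates, so this is all it needs.
    interface IntVisitor { void visit(int value); }

    void forEach(IntVisitor visitor) {
        for (int i = 0; i < chunks.size(); i++) {
            int limit = (i == chunks.size() - 1) ? used : CHUNK_SIZE;
            int[] chunk = chunks.get(i);
            for (int j = 0; j < limit; j++)
                visitor.visit(chunk[j]);
        }
    }
}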
Does that make sense, or have I missed something? If it sounds sensible
I'd be happy to have a go at implementing it. Also, given the nature of
the algorithm, it wouldn't hurt performance much if the lat/long values
were written out to disk rather than held in memory, which would mean
splitting the whole dataset would be feasible even on a 32-bit machine.
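
Something along these lines, for example (again just a sketch with
invented names, to show that the change would be straightforward):

import java.io.*;

// Spill packed lat/lon pairs to a temporary file during the first pass,
// then stream them back for the area computation instead of keeping every
// node in memory.
class NodeSpillFile {
    interface PairHandler { void handle(int lat, int lon); }

    private final File file;
    private final DataOutputStream out;
    private long pairs;

    NodeSpillFile(File file) throws IOException {
        this.file = file;
        this.out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 1 << 16));
    }

    void write(int lat, int lon) throws IOException {
        out.writeInt(lat);
        out.writeInt(lon);
        pairs++;
    }

    void replay(PairHandler handler) throws IOException {
        out.close();
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), 1 << 16));
        try {
            for (long i = 0; i < pairs; i++)
                handler.handle(in.readInt(), in.readInt());
        } finally {
            in.close();
        }
    }
}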
I haven't yet looked at possibilities for tuning the second step, but I
assume that some sort of map/lookup is still required there. I figure
there are a few options: perform multiple passes, processing a subset of
the splits at a time (limited by the total number of nodes we can hold
in memory); optimise the existing data structures further; page some of
the data out to disk; and so on.
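
As a very rough illustration of the multi-pass option (Area and the
batch size here are just placeholders, not the splitter's real types or
numbers):

import java.util.List;

// Only build the node-to-area lookup for a limited batch of areas per pass,
// re-reading the planet file once per batch so memory use stays bounded.
class MultiPassSplitter {
    static final int AREAS_PER_PASS = 255;  // would be tuned to the available heap

    void split(List<Area> areas) {
        for (int start = 0; start < areas.size(); start += AREAS_PER_PASS) {
            List<Area> batch = areas.subList(start,
                    Math.min(start + AREAS_PER_PASS, areas.size()));
            // One full read of the input per batch: each node is looked up
            // against this batch of areas only and written to the matching
            // output files.
            processPass(batch);
        }
    }

    void processPass(List<Area> batch) {
        // parse the .osm input and write out the nodes/ways that fall
        // inside the areas in this batch
    }

    static class Area { int minLat, minLon, maxLat, maxLon; }
}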
I was also thinking a little about performance. Given the enormous size
of the full .osm file, I'd suggest a move away from SAX to a pull parser
(http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/index.html). It's
even faster than SAX and uses very little memory; in my job we use it to
parse many GB of XML daily with very good results. Another idea is to
parallelise the code by running parts of the split on different threads
to take advantage of multi-core CPUs. The biggest gain there would
probably be when writing the output files, since gzip compression is
fairly CPU-intensive.
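
To give an idea of what the pull parser code looks like, here's a
minimal loop over the node elements using that API (no error handling,
so treat it purely as a sketch):

import java.io.FileInputStream;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserFactory;

// Walks an .osm file and picks out the lat/lon of each <node> element.
public class PullParseDemo {
    public static void main(String[] args) throws Exception {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new FileInputStream(args[0]), "UTF-8");

        for (int event = parser.next();
                event != XmlPullParser.END_DOCUMENT;
                event = parser.next()) {
            if (event == XmlPullParser.START_TAG && "node".equals(parser.getName())) {
                double lat = Double.parseDouble(parser.getAttributeValue(null, "lat"));
                double lon = Double.parseDouble(parser.getAttributeValue(null, "lon"));
                // hand lat/lon off to whatever collects them for the split
            }
        }
    }
}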
What do people think? I'm happy to work on the above, though I must
confess up front that my spare time is quite limited, so please don't
expect too much too soon!
Chris
_______________________________________________
mkgmap-dev mailing list
[email protected]
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev