Hi Gerd,
1) Always release memory for mdr19, not only if it was filled for the device.
2) Check whether the sort key has zero length; if so, don't allocate a new buffer.
3) Don't use MultiSortKey for mdr7 if --x-split-name-index is not used.
4) Count occurrences of the generated key strings, and cache sort keys for those keys which occur more than 3 times.
5) Create a smaller copy of the byte array if the number of allocated bytes was too high.

@Steve: Please review. I think 1-4 are okay; 5) may cause problems if my understanding of Sort.fillkey() is wrong.
Looks OK. I agree with adding a cache in Mdr7; I don't know why I omitted it to begin with. But does the complex counted cache improve memory use over the simple case? As the cache is temporary, once the routine returns there will be greater memory use from all the keys that were not de-duplicated. It seems to me that the temporary memory required during preWriteImpl() would also be larger, unless there are a vast number of single-use keys.
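To make the comparison concrete, here is a rough sketch of the two strategies. It is not the mkgmap code: the class, the method names and the makeKey function are made up for illustration, and the key generator is only a stand-in for whatever Sort does internally. With minUses = 1 it behaves like the simple cache (every key is shared); with minUses = 4 it caches only names that occur more than 3 times, as in item 4. The trim() helper illustrates items 2 and 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not taken from mkgmap.
class SortKeyCacheSketch {
    private static final byte[] EMPTY = new byte[0];

    // Items 2 and 5: share one empty array for zero-length keys and
    // copy to a smaller array only when the buffer is over-allocated.
    static byte[] trim(byte[] buf, int used) {
        if (used == 0)
            return EMPTY;
        return used == buf.length ? buf : Arrays.copyOf(buf, used);
    }

    // Item 4: count the names first, then cache keys only for names
    // that occur at least minUses times. Other names get a fresh key
    // on every call, so their keys are not de-duplicated.
    static List<byte[]> buildKeys(List<String> names,
                                  Function<String, byte[]> makeKey,
                                  int minUses) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : names)
            counts.merge(name, 1, Integer::sum);

        Map<String, byte[]> cache = new HashMap<>();
        List<byte[]> keys = new ArrayList<>(names.size());
        for (String name : names) {
            byte[] key = counts.get(name) >= minUses
                    ? cache.computeIfAbsent(name, makeKey)
                    : makeKey.apply(name);
            keys.add(key);
        }
        return keys;
    }
}
```

Note that the counting pass needs its own map with one entry per distinct name, which is the extra temporary memory in question. Keys below the threshold are allocated once per use and stay duplicated after the routine returns.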
I was also surprised to see that we require such long sort keys, even with --latin1. I see many 0 bytes in the created keys, maybe that can be optimized further?
It's possible, see for example: http://www.unicode.org/reports/tr10/#Implementation_Notes

As I was working out how the Srt files worked, I gradually came to realise that it was pretty much the same way that Java does collation. I changed the syntax of the resource/sort/*.txt files to more closely match the language that is used by RuleBasedCollator().

There is also the icu4j project: http://site.icu-project.org/

I can't remember how close either of them was to working as needed for mkgmap. In the end my code was faster, so I stayed with it. It's possible that the built-in Java code could be made to work and it might be faster now, or perhaps I missed a trick to make it so at the time.

..Steve
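For anyone who wants to experiment with the built-in Java collation mentioned above, a minimal sketch follows. It only uses the standard java.text API; the class and helper names are invented for the example, Locale.ENGLISH is an arbitrary choice, and nothing here is claimed to match the ordering that mkgmap or the device needs.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch using only the JDK's java.text collation classes.
class JdkCollationSketch {
    // Build a collator with the given strength
    // (Collator.PRIMARY, SECONDARY or TERTIARY).
    static Collator collator(int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(strength);
        return c;
    }

    // Sort key bytes for a string, as produced by the JDK.
    static byte[] sortKey(Collator c, String s) {
        return c.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        Collator primary = collator(Collator.PRIMARY);
        Collator tertiary = collator(Collator.TERTIARY);
        String name = "Main Street";
        // Print the key sizes so they can be compared with mkgmap's keys.
        System.out.println("primary key bytes:  " + sortKey(primary, name).length);
        System.out.println("tertiary key bytes: " + sortKey(tertiary, name).length);
    }
}
```

Comparing the key lengths at the different strengths gives a quick feel for how much of a key is spent on the secondary and tertiary levels.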
