On Fri, Aug 14, 2009 at 3:16 PM, Andy Allan <[email protected]> wrote:
> On Fri, Aug 14, 2009 at 12:54 PM, Frederik Ramm <[email protected]> wrote:
>> Hi,
>>
>> Frederik Ramm wrote:
>>> The result file should have been something like 400 bytes. This sounds
>>> trivial but in the original case where the .osc contained a large number
>>> of these characters, I suddenly had 2 MB of data in one tag.
>>
>> I forgot to mention: I'm posting this here on dev and not on the osmosis
>> list because it seems that other (at least Java) programs are also
>> affected; someone fixed the node later with a commit comment of "JOSM
>> says string too long" or so...
>
> The code points for these gothic characters are fine. See the
> following (awesome) site:
>
> http://decodeunicode.org/en/gothic
>
> A rough transliteration is HEJSPANOA. However, they lie outside the
> Basic Multilingual Plane (BMP) and can't be represented by a 16-bit
> integer. Java stores characters internally as 16-bit UCS-2 characters
> and so everything is going horribly wrong.
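To make the point about the BMP concrete, here is a minimal Java sketch. It is not taken from osmosis or JOSM. The class name and the choice of U+10330 (GOTHIC LETTER AHSA) as a sample are my own assumptions; any code point above U+FFFF would behave the same way. A Java char is a 16-bit code unit, so a character outside the BMP occupies two of them (a surrogate pair), and code that treats each char as one character will split it.

```java
public class GothicCodePoint {
    // Sample code point outside the BMP (assumption: chosen for illustration only).
    static final int GOTHIC_AHSA = 0x10330;

    // Build a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(GOTHIC_AHSA);

        // The code point does not fit into a single 16-bit char.
        System.out.println(Character.isSupplementaryCodePoint(GOTHIC_AHSA)); // true

        // length() counts 16-bit code units, not characters.
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1

        // The two code units are a high and a low surrogate.
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D800 DF30
    }
}
```

Code that walks the string with charAt() and writes each char out separately sees two halves rather than one letter, which is one plausible way for things to go wrong here.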
Installing an SMP-aware font shows what JOSM is doing more easily than
reading Unicode code-points.

http://code2000.net/code2001.htm

I'll keep my (horrid) transliterations going here for the sake of
everyone else.

v31 - HEJSPANOA
v32 - HHEHEJHEJSHEJSPHEJSPAHEJSPANHEJSPANOHEJSPANOA

i.e. the first letter, the first two letters, the first three letters
etc. I can see how you can quickly end up with a 2MB tag using this
encoding scheme!

Cheers,
Andy
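The v31 to v32 pattern can be reproduced with a short Java sketch. This only models the observed output using the ASCII transliteration; it is not the actual code path in JOSM or osmosis, and the class and method names are made up for illustration. Emitting every prefix of an n-letter string writes n(n+1)/2 letters, so the size grows quadratically.

```java
public class PrefixGrowth {
    // Concatenate every prefix of s: first letter, first two letters, and so on.
    // Hypothetical helper that mimics the pattern seen between v31 and v32.
    static String expand(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 1; i <= s.length(); i++) {
            out.append(s, 0, i); // append s[0..i)
        }
        return out.toString();
    }

    // Number of letters written for an input of n letters: 1 + 2 + ... + n.
    static long lettersWritten(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        String v31 = "HEJSPANOA";
        String v32 = expand(v31);
        System.out.println(v32);          // HHEHEJHEJSHEJSPHEJSPAHEJSPANHEJSPANOHEJSPANOA
        System.out.println(v32.length()); // 45

        // A hypothetical tag of 2000 letters would already expand to about 2 million.
        System.out.println(lettersWritten(2000)); // 2001000
    }
}
```

So a tag with a couple of thousand of these characters is enough to reach the megabyte range, before even counting the several bytes each one takes on disk.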

