Re: [CODE4LIB] Handling non-Unicode characters (was: Unicode persistence)

stuart yeates Mon, 03 May 2010 13:28:19 -0700

Jonathan Rochkind wrote:

Hmm, you could theoretically assign chars in the private unicode area to the chars you need -- but then have your application replace those chars by small images on rendering/display.
This seems as clean a solution as you are likely to find. Your TEI solution still requires chars-as-images for these unusual chars, right? So this is no better with regard to copying-and-pasting, browser display, and general interoperability than your TEI solution, but no worse either -- it's pretty much the same thing. But it may be better in terms of those considerations for chars that actually ARE currently unicode codepoints.

I think you misunderstand the TEI option that we're using. The TEI option gives us a full abstraction of the novel glyphs, including abstract names, etc. Even without the images, the TEI is readable / maintainable / manipulatable.
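To illustrate what "readable without the images" can mean in practice, here is a minimal Python sketch. It assumes a simplified, namespace-free fragment loosely modelled on the TEI gaiji elements (`charDecl`, `glyph`, `glyphName`, `g`); the glyph id, the glyph name and the helper functions are hypothetical, and the exact content model should be checked against the TEI Guidelines. It is not a description of our actual pipeline.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified fragment (no TEI namespace, invented glyph).
SAMPLE = """<TEI>
  <charDecl>
    <glyph xml:id="long-s-lig">
      <glyphName>HYPOTHETICAL LONG S LIGATURE</glyphName>
    </glyph>
  </charDecl>
  <p>A word with <g ref="#long-s-lig"/> inside.</p>
</TEI>"""


def glyph_names(root):
    """Map each declared glyph id to its abstract name."""
    names = {}
    for glyph in root.iter("glyph"):
        name = glyph.findtext("glyphName", default="")
        names[glyph.get(XML_ID)] = name.strip()
    return names


def render_plain(elem, names):
    """Flatten an element to text, showing each <g> as its bracketed name."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "g":
            key = child.get("ref", "").lstrip("#")
            # Fall back to the bare id if the glyph is not declared.
            parts.append("[" + names.get(key, key) + "]")
        else:
            parts.append(render_plain(child, names))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
names = glyph_names(root)
text = render_plain(root.find("p"), names)
print(text)
```

The point of the sketch is that the glyph reference carries a name that software and humans can both use, so an image is only one of several possible renderings.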

If any of your "private" chars later become non-private unicode codepoints, you could always globally replace your private codepoints with the new standard ones.
With 137K "private codepoints" available, you _probably_ wouldn't run out.

That's the same order of magnitude as the number of characters that appear in a large novel. If you have a bunch of hand-written novel-length works and you're not 100% sure of the boundary between the glyphs for one letter and the glyphs for another, there won't be enough unicode private use points to encode them, but the TEI approach has no problem.

Actually, the TEI approach has several different ways of dealing with this kind of problem, all of which scale very nicely in my experience, so it's probably best to ask for advice on the TEI mailing list if you're faced with a problem like this.
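For reference, the "137K" figure and the global replacement suggested in the quoted text can be sketched in Python as follows. The three ranges are the Unicode private-use areas; the mapping from U+E000 to U+017F is purely hypothetical, standing in for a private codepoint that later received a standard assignment, and the function names are made up for this example.

```python
# The three Unicode private-use ranges (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]


def pua_size():
    """Total number of private-use codepoints across all three ranges."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)


def is_private_use(ch):
    """True if the single character ch lies in a private-use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical table: private codepoint -> later standard codepoint.
PROMOTED = {0xE000: 0x017F}


def promote(text, mapping=PROMOTED):
    """Globally replace private codepoints that now have standard ones."""
    return text.translate(mapping)


def remaining_private(text):
    """List the private-use characters still present in text."""
    return sorted({ch for ch in text if is_private_use(ch)})


print(pua_size())  # 137468
```

Note that a replacement like this only works when each private codepoint stands for exactly one well-understood character, which is the assumption that breaks down when glyph boundaries are uncertain.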


cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/       New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/     Institutional Repository
