Thank you for the hints on where to get more information. :)  I will look,
though my question was about the situation with the Debian stuff.

A Unicode character is fine if the output file is in Unicode.  Is the output
of debiandoc2* in Unicode?  I believe the problem *you* encountered with
processing the Russian translation originates in the fact that the output
files are not in Unicode.  For example, if we translate
dselect-beginner.ru.sgml into HTML format, we get a plain text file that has
`Content-Type; text/html; charset=koi8-r' at the very beginning.  Every
&copy; in the source file will appear as an 8-bit character, since we have

    <!ENTITY copy CDATA "&#169">

For all charsets that have the (C) symbol at code 169, the output will look
fine.  But when you try to process the LaTeX output from debiandoc2latex, you
get a lot of errors, since the Cyrillic font has no symbol with code 169.
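
To illustrate the code-169 problem, here is a small Python sketch.  It only
uses Python's own codecs (`latin-1` and `koi8_r`) as stand-ins for the
charsets involved.  It does not touch the debiandoc tools, and the variable
names are mine.

```python
# Byte 169 (0xA9) is what "&#169" turns into in an 8-bit output file.
raw = bytes([169])

# Interpreted as ISO 8859-1, code 169 is the copyright sign.
as_latin1 = raw.decode("latin-1")

# Interpreted as KOI8-R, the same byte is a different character.
as_koi8r = raw.decode("koi8_r")

print(repr(as_latin1), as_latin1 == "\u00a9")
print(repr(as_koi8r), as_koi8r == "\u00a9")

# KOI8-R does contain a copyright sign, but not at code 169.
copyright_in_koi8r = "\u00a9".encode("koi8_r")
print(copyright_in_koi8r != raw)
```

So a numeric reference to 169 is only right when the output charset happens
to agree with Latin-1 at that position.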

So the question is: what to do?

> > These are different definitions and while in the second case I could process
> > this SDATA [copy  ] for producing &copy; in HTML output and \copyright in
> > TeX output, I lack this possibility in first case.
> 
> Why do you say that?  As far as I am aware there are TeX packages that can 
> handle Unicode.
Is the one we use for making the documentation from the DebianDoc DTD
Unicode-aware?  And do we really supply it with a Unicode file?

> Well, basically, the SDATA mappings are entirely arbitrary.
> Therefore, for the standard entity-sets which I have shipped with the
> sgml-data package, I use the Unicode entity mappings, which is handled
> fine by advanced browsers and the SGML tool-chain (nsgmls, jade, etc).
Does nsgmls have to be compiled in multi-byte mode to be Unicode-aware?  If
so, is that the case as of sp 1.3.3-1.2.1-7?

> I definitely am willing to ship alternate SDATA style entity sets
> for SGML (XML requires the Unicode ones).  I suppose either I could
> use a different FPI for that, or else I could even use SGML "marked
> sections" and a conditional parameter (i.e., use 'nsgmls
> -iuse-sdata-entities ...') to switch between whatever representation
> of entities you might want.  In either case, the default, IMHO, should
> be the Unicode representation.
Why?  I believe (I have not checked yet) this would break the sgml-tools
package (yes, yes, sgml-tools v1).  It relies on SDATA entities to produce
proper output.
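
What I mean by using SDATA entities to produce proper output is roughly the
following.  This is only a Python sketch: `SDATA_MAP` and `render_sdata` are
names I made up for illustration, the "text" column is my assumption, and
none of this is the actual sgml-tools or debiandoc code.

```python
# Hypothetical table: SDATA replacement text -> output for each backend.
SDATA_MAP = {
    "[copy  ]": {
        "html": "&copy;",        # let the browser pick the glyph
        "latex": "\\copyright",  # let TeX pick the glyph
        "text": "(C)",           # plain ASCII fallback (assumed)
    },
}


def render_sdata(sdata: str, fmt: str) -> str:
    """Return the backend-specific text for an SDATA string.

    Unknown SDATA strings or formats are passed through unchanged.
    """
    return SDATA_MAP.get(sdata, {}).get(fmt, sdata)


print(render_sdata("[copy  ]", "html"))
print(render_sdata("[copy  ]", "latex"))
print(render_sdata("[copy  ]", "text"))
```

With a CDATA "&#169" definition the backend only ever sees the character
itself, so there is nothing left to look up in such a table.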

Actually, I have only a practical aim in mind: to make Russian documents
correct.  So how do we make &copy; look like (C) in all versions of
dselect-beginner.ru? &smile;

--
Mike
