Thank you for the hints on where to get more information. :) I will look there,
though my question was about the situation in Debian specifically.
A Unicode character is fine if the output file is in Unicode. Is the output of
debiandoc2* in Unicode? I believe the problem *you* encountered when
processing the Russian translation comes from the fact that the output files
are not in Unicode. For example, if we translate dselect-beginner.ru.sgml into
HTML format, we get a plain text file that has `Content-Type: text/html;
charset=koi8-r' at the very beginning. Every &copy; in the source file will
appear as an 8-bit character, since we have
<!ENTITY copy CDATA "&#169;">
For every charset that has the (C) symbol at code 169, the output will look
fine. But when you try to process the LaTeX output from debiandoc2latex, you
get a lot of errors, since the Cyrillic font has no symbol with code 169.
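As a rough illustration of the point about code 169 (a Python sketch only, not
something the debiandoc2* tools do), the same byte can be decoded under two
charsets to see where it really is the copyright sign:

```python
# Byte 169 (0xA9) is the copyright sign only in some 8-bit charsets.
raw = bytes([169])

# Decode the same byte under Latin-1 and under KOI8-R.
decoded = {name: raw.decode(name) for name in ("latin-1", "koi8-r")}

for name, char in decoded.items():
    # Report whether this charset maps code 169 to the copyright sign.
    print(f"{name}: {char!r} is_copyright={char == '©'}")
```

Under Latin-1 the byte is the copyright sign; under KOI8-R it is a different
character, which is why the 8-bit output only looks right for some charsets.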
So the question is: what to do?
> > These are different definitions, and while in the second case I could
> > process this SDATA [copy ] to produce &copy; in HTML output and \copyright
> > in TeX output, I lack this possibility in the first case.
>
> Why do you say that? As far as I am aware there are TeX packages that can
> handle Unicode.
Is the one we use for building the documentation from the DebianDoc DTD
Unicode-aware? And do we really feed it a Unicode file?
> Well, basically, the SDATA mappings are entirely arbitrary.
> Therefore, for the standard entity-sets which I have shipped with the
> sgml-data package, I use the Unicode entity mappings, which is handled
> fine by advanced browsers and the SGML tool-chain (nsgmls, jade, etc).
Does nsgmls have to be compiled in multi-byte mode to be Unicode-aware? If so,
is it compiled that way as of sp 1.3.3-1.2.1-7?
> I definitely am willing to ship alternate SDATA-style entity sets
> for SGML (XML requires the Unicode ones). I suppose either I could
> use a different FPI for that, or else I could even use SGML "marked
> sections" and a conditional parameter (i.e., use 'nsgmls
> -iuse-sdata-entities ...') to switch between whatever representation
> of entities you might want. In either case, the default, IMHO, should
> be the Unicode representation.
Why? I believe (though I have not checked yet) that this would break the
sgml-tools package (yes, yes, sgml-tools v1), which relies on SDATA entities
to produce proper output.
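To show what relying on SDATA entities buys a backend, here is a minimal Python
sketch. It is not the actual sgml-tools code: the table, the backend names and
the render_sdata helper are all hypothetical, and only the "[copy ]" string and
the target outputs come from this thread.

```python
# Hypothetical per-backend replacement tables keyed by the SDATA string.
SDATA_MAP = {
    "html": {"[copy ]": "&copy;"},
    "latex": {"[copy ]": r"\copyright{}"},
    "text": {"[copy ]": "(C)"},
}


def render_sdata(sdata, backend):
    """Return the backend-specific replacement for an SDATA string."""
    return SDATA_MAP[backend][sdata]


for backend in ("html", "latex", "text"):
    print(backend, render_sdata("[copy ]", backend))
```

With a CDATA definition the backend only ever sees the literal character, so
there is nothing left to key such a table on.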
Actually, I have only a practical aim in mind: to make the Russian documents
come out correctly. So how do we make &copy; appear as (C) in all versions of
dselect-beginner.ru? :)
--
Mike