Re: A small question

Michael Sobolev 2 Jul 1999 16:54:42 -0000

On Fri, Jul 02, 1999 at 11:52:56AM -0400, Adam Di Carlo wrote:
> >You see, the construct \|...\| can be easily caught since it's a special
> >thing (`\' in input will be escaped with \ giving \\ in output).  Well, in
> >case of SDATA-entities, I see how to make use of them.
> 
> I don't see why \|...\| just as easily as ©.  They are both unique!
> Furthermore, if we can get the charset of the debiandoc char stream
> sorted out, you can hook up *standard*, already written tools to go
> from one char set to another.
Hmm...  It looks like I did not make myself clear.  I stated that the output
stream is in an unknown character set (that is, CDATA is just copied to the
output).  This means that the 8-bit code 169 stands for an unknown symbol: if
we knew that this is iso-8859-1, then it's (C); if it's koi8-r, it's '_|'.  If
we find a way to make sure that the output is in UCS-2, UCS-4, UTF* or another
encoding that permits symbols from many different languages, then yes,
processing \|...\| is as easy as ©.  But we have a stream of 8-bit characters
in an unknown charset, so we have no choice but to create some external logic
(like "everything that starts with \ has special meaning") to distinguish what
we need.
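
To illustrate the point about code 169, here is a minimal Python sketch.  It
is not part of the debiandoc tools; the variable names are made up for the
example, and it only assumes Python's standard iso-8859-1 and koi8_r codecs.

```python
# Byte 169 (0xA9), as it would appear in a stream copied through as CDATA
# with no charset information attached.
raw = bytes([169])

# The same byte decodes to different symbols depending on the assumed charset.
as_latin1 = raw.decode("iso-8859-1")  # copyright sign, U+00A9
as_koi8r = raw.decode("koi8_r")       # box-drawing character, U+2558

print(f"iso-8859-1: U+{ord(as_latin1):04X}")
print(f"koi8-r:     U+{ord(as_koi8r):04X}")

# Once the text is held as Unicode, the copyright sign can be re-encoded for
# each target charset; KOI8-R has the glyph, but at a different byte value.
copyright_sign = "\u00a9"
print(copyright_sign.encode("koi8_r"))
```

So the byte alone does not tell us which symbol was meant; only the byte plus
a known charset does.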


> >I am sorry to say that the freshly downloaded and unpacked in a separate
> >directory sgml-data package has ISO* files that define SDATA-entities.
> 
> Yes indeed.  This inconsistency seems to be a bug.
OK.  Should I file it?

> >Well, and now returning to `stock' SGML entities.  copy, and certain other
> >entities (like nbsp, for example) are from ISOnum, while in sgml-data package
> >they are defined in both of them (and they are different, BTW).
> 
> Some overlap may be ok.  ISO defines it -- not Debian!
I beg your pardon?  How could this be?  Unfortunately, I do not have a copy of
the Unicode standard, but I doubt that a <emphasis>standard</emphasis> could
define the same thing in two or more ways: this is not even an ambiguity.
Yes, I agree that we could have two sets of entities: one defining Unicode
codes and one defining system data.  But I believe that in the current
situation we have a severe problem: the first included set wins.  That's
really bad.
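
The "first included set wins" behaviour can be sketched roughly as follows.
This is a Python model of the rule only, not the parser's actual code;
load_entities and the replacement texts are hypothetical, chosen for
illustration.

```python
def load_entities(*entity_sets):
    """Merge entity sets in inclusion order, keeping the first declaration."""
    table = {}
    for entity_set in entity_sets:
        for name, replacement in entity_set:
            # A later redeclaration of the same name is silently ignored.
            table.setdefault(name, replacement)
    return table


# Hypothetical replacement texts, for illustration only.
sdata_set = [("copy", "[copy  ]")]
numeric_set = [("copy", "\u00a9")]

# The result depends only on which set happens to be included first.
print(load_entities(sdata_set, numeric_set)["copy"])
print(load_entities(numeric_set, sdata_set)["copy"])
```

The same document thus yields different output for the copy entity depending
on inclusion order, which is the problem described above.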

> >As for working out this problem.  There are two possibilities: to make use of
> >SDATA entities in all programs that come with Debian; or to use some Unicode
> >encoding for intermediate/output files.
> 
> I opt for unicode.  Unless there is a standard that the copyright
> circle 'c' glyph needs to be '[copy   ]' and not '[copy ]' nor 
> '[COPY  ]', that is, unless I am given a guidelines by which to 
> distinguish the proper notation from the impostor, I am very hesitant
> to do that.
Adam, I opt for whatever lets us deal with the problem: what we get is not
what we want.

I believe SDATA just provides a convenient way of dealing with certain symbols.
Please understand that I do not insist on using SDATA entities only; I just
want to see the circled c in the text of the Russian documentation as well as
in all other versions.

--
Mike
