On Sat, Aug 25, 2018 at 10:02:27AM +0200, Helge Kreutzmann wrote:
> reopen 549233
> found 549233 1:2.0.0-42
> severity 549233 minor
> thanks
> 
> Hello Chris,
> On Mon, Aug 20, 2018 at 10:27:11AM +0000, Debian Bug Tracking System wrote:
> > This is an automatic notification regarding your Bug report
> > which was filed against the docbook-to-man package:
> > 
> > #549233: docbook-to-man: Does not accept (some) (unicode) characters
> > 
> > > It appears that docbook-to-man is not UTF-8 ready. If you compile the
> > > attached man page "as is" then you'll get the following error:
> > > /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> > > /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1] 
> > > PARA[1] (#PCDATA[1])
> > > /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> > > /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1] 
> > > PARA[1] (#PCDATA[1])
> > 
> > This is no longer reproducible; so closing :)
> 
> Well, in my environment (current testing) it is:
> helge@samd:~/download$ recode latin1..utf8 demo.man.sgml
> helge@samd:~/download$ file *.sgml
> demo.man.sgml:   HTML document, UTF-8 Unicode text
> helge@samd:~/download$ docbook-to-man demo.man.sgml > demo.1
> /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1] 
> PARA[1] (#PCDATA[1])
> /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1] 
> PARA[1] (#PCDATA[1])
> 
> The same error happens with the file from Paul. (I did not see his e-mail
> earlier, because he did not CC me and adressed only the bug) and the
> output is the same for both.

Hi,

I recently tried to play with linuxdoc and utf-8 documents and ran into the
same problem:

onsgmls: ... 01.precmdout:1559:71:E: non SGML character number 141

This time I was lucky and a web search pointed me to
https://bugzilla.redhat.com/show_bug.cgi?id=66179. After that suggestion, 

SP_CHARSET_FIXED=yes SP_ENCODING=xml sgml2html FAQ-CervanTeX-utf8.sgml

made those messages disappear with opensp. I am including that in
linuxdoc-tools as part of preliminary utf-8 support, and it may be of help here.

> > > Interestingly, some characters (like "ü") are accepted without
> > > problems while others (Ü,ß) yield the above errors.

Maybe it complains about only one byte of the multi-byte representation,
one that is not present in the lowercase characters.
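
To check that guess, here is a small Python sketch. It is not part of
docbook-to-man or linuxdoc-tools, and the helper name c1_bytes is made up
for illustration. It lists the UTF-8 bytes of the three characters and flags
any byte in the range 128-159. That this is the range the parser rejects is
only an assumption, based on the numbers in the error messages above.

```python
# Sketch: which UTF-8 bytes of a character fall in the range 128..159?
# Assumption: the parser treats that range as "non SGML" characters.

def c1_bytes(ch):
    """Return the UTF-8 bytes of ch whose value lies in 128..159."""
    return [b for b in ch.encode("utf-8") if 128 <= b <= 159]

# U+00FC (u-umlaut), U+00DC (U-umlaut), U+00DF (sharp s)
for ch in "\u00fc\u00dc\u00df":
    print(ch, list(ch.encode("utf-8")), c1_bytes(ch))

# Prints:
# ü [195, 188] []
# Ü [195, 156] [156]
# ß [195, 159] [159]
```

The flagged values 156 and 159 are the same numbers nsgmls reports, while
neither byte of "ü" falls in that range.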

-- 
Agustin
