On Sat, Aug 25, 2018 at 10:02:27AM +0200, Helge Kreutzmann wrote:
> reopen 549233
> found 549233 1:2.0.0-42
> severity 549233 minor
> thanks
>
> Hello Chris,
> On Mon, Aug 20, 2018 at 10:27:11AM +0000, Debian Bug Tracking System wrote:
> > This is an automatic notification regarding your Bug report
> > which was filed against the docbook-to-man package:
> >
> > #549233: docbook-to-man: Does not accept (some) (unicode) characters
> >
> > > It appears that docbook-to-man is not UTF-8 ready. If you compile the
> > > attached man page "as is" then you'll get the following error:
> > > /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> > > /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1]
> > > PARA[1] (#PCDATA[1])
> > > /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> > > /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1]
> > > PARA[1] (#PCDATA[1])
> >
> > This is no longer reproducible; so closing :)
>
> Well, in my environment (current testing) it is:
> helge@samd:~/download$ recode latin1..utf8 demo.man.sgml
> helge@samd:~/download$ file *.sgml
> demo.man.sgml: HTML document, UTF-8 Unicode text
> helge@samd:~/download$ docbook-to-man demo.man.sgml > demo.1
> /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1]
> PARA[1] (#PCDATA[1])
> /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1]
> PARA[1] (#PCDATA[1])
>
> The same error happens with the file from Paul. (I did not see his e-mail
> earlier, because he did not CC me and adressed only the bug) and the
> output is the same for both.

Hi,

I recently tried to play with linuxdoc and UTF-8 documents and ran into the
same problem,

onsgmls: ... 01.precmdout:1559:71:E: non SGML character number 141

This time I was lucky and a web search pointed me to
https://bugzilla.redhat.com/show_bug.cgi?id=66179. Following the suggestion
there,

SP_CHARSET_FIXED=yes SP_ENCODING=xml sgml2html FAQ-CervanTeX-utf8.sgml

made those messages disappear with opensp. I am including that in
linuxdoc-tools as part of preliminary UTF-8 support, and it may be of help
here.

> > > Interestingly, some characters (like "ü") are accepted without
> > > problems while others (Ü,ß) yield the above errors.

Maybe it complains only about one byte of the multi-byte representation,
one that does not occur in the encoding of the lowercase characters.

--
Agustin
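
The multi-byte guess above can be checked with a small Python sketch. It is not part of docbook-to-man, OpenSP, or linuxdoc-tools, and the helper names are made up for illustration. It lists the UTF-8 bytes of the three characters mentioned in the report and flags those in the range 128..159. That this range is what the SGML declaration in use rejects is an assumption, chosen because the reported numbers 156 and 159 both fall inside it.

```python
# Sketch only: inspect the UTF-8 bytes of the characters from the bug report.
# ASSUMPTION: the SGML declaration in use treats bytes 128..159 as non-SGML
# characters. The helper names below are hypothetical.

def utf8_bytes(ch):
    """Return the UTF-8 encoding of ch as a list of decimal byte values."""
    return list(ch.encode("utf-8"))


def suspect_bytes(ch, low=128, high=159):
    """Return the bytes of ch that fall in the assumed non-SGML range."""
    return [b for b in utf8_bytes(ch) if low <= b <= high]


if __name__ == "__main__":
    # U+00FC is "ü" (accepted); U+00DC "Ü" and U+00DF "ß" trigger the errors.
    for ch in ("\u00fc", "\u00dc", "\u00df"):
        print("U+%04X bytes=%s suspect=%s"
              % (ord(ch), utf8_bytes(ch), suspect_bytes(ch)))
```

Under that assumption, "ü" yields no suspect byte, while "Ü" and "ß" yield exactly the values nsgmls complains about (156 and 159). This would be consistent with the parser reading the file byte by byte rather than as UTF-8, unless it is told otherwise, for example via SP_ENCODING.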

