On 25 March 2011 04:38, Werner LEMBERG wrote:
>
> Justin,
>
> a simple example says more than thousand words... So please give us
> an example we can examine.
Hear! Hear!

> At a first glance, it seems you have an encoding problem (but this
> doesn't explain the strange things you see). The default encoding of
> groff is latin1, and your input file is probably UTF8. Starting with
> version 1.20, groff can handle UTF8 by use a new preprocessor.
>
> The HTML output driver is still experimental (and basically
> unmaintained currently due to lack of time and interest); it is easily
> possible that you've found a bug.

Equally -- perhaps more -- likely, Justin has encountered a hyphenation
issue. This:

> On the 11th in my groff file, an "â" character is found after 64
> characters have been printed, within the word hamburger, the text gets
> parsed and printed as "hamâburger". If I change hamburger to donations
> I have the "â" character show up at the 60th character on the line,
> with donations being "donaâtions".

is reminiscent of an issue I myself observed earlier this week. I had
run some informally structured ASCII text through a sed filter, and
then through nroff (v1.20.1), to produce an alternative layout.
Although I had suppressed hyphenation (.hy 0), I did have several
explicit ASCII hyphen characters in the input stream; each of these was
replaced, in the output stream, by the three-byte octal sequence
342 200 220 (which I guess represents u2010 -- the Unicode hyphen which
groff_char(7) documents as the output form for hyphen).

Viewing this output with "less", on my UTF-8 aware console, it looked
absolutely fine. After I uploaded it as a package description file on
my SourceForge downloads page, however, Firefox rendered each hyphen
with unwanted whitespace surrounding it, while Internet Explorer
replaced each hyphen with three characters of garbage, among them the
"â" observed by Justin, IIRC.

So yes, I guess what you actually see is dependent on encoding (and on
how the viewer interprets the u2010 sequence, however it is encoded).
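As a rough check of that guess, here is a minimal Python sketch. It
assumes the misbehaving viewer fell back to Latin-1, which I have not
verified; the variable names are purely illustrative.

```python
# The three bytes seen in the nroff output, written in octal.
raw = bytes([0o342, 0o200, 0o220])

# Decoded as UTF-8, the three bytes form a single character.
as_utf8 = raw.decode("utf-8")
print(f"U+{ord(as_utf8):04X}")   # U+2010, the Unicode hyphen

# Decoded as Latin-1 (an assumption about the viewer), each byte becomes
# a separate character; the first one, 0xE2, is "â".
as_latin1 = raw.decode("latin-1")
print(len(as_latin1), repr(as_latin1[0]))
```

If the assumption holds, one hyphen turns into three characters, the
first of which is "â"; the other two are non-printing control codes in
Latin-1, so how they appear will vary from viewer to viewer.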
In my case, I wanted real ASCII hyphens in my output stream; adding
"-Tascii" to my nroff command gave me that.

-- 
Regards,
Keith.
