Hello
Yes, Keith was right: all the mumbo jumbo I wrote, with the exception of the two sentences swapping places, was indeed related to hyphenation. I would have caught this if I had been looking at the parsed HTML text and not at the actual HTML file.
I have now installed groff 1.21 to see whether it would make a difference. The problem with the two sentences swapping places is now resolved.
The problem with hyphens, apostrophes, and dashes still remains. I'm including a sample of the results.

On Fri, 25 Mar 2011, Keith Marshall wrote:
> On 25 March 2011 04:38, Werner LEMBERG wrote:
>> Justin, a simple example says more than a thousand words... So please give us an example we can examine.
>
> Hear! Hear!
>
>> At a first glance, it seems you have an encoding problem (but this doesn't explain the strange things you see). The default encoding of groff is latin1, and your input file is probably UTF8. Starting with version 1.20, groff can handle UTF8 by using a new preprocessor. The HTML output driver is still experimental (and basically unmaintained currently due to lack of time and interest); it is easily possible that you've found a bug.
>
> Equally -- perhaps more -- likely, Justin has encountered a hyphenation issue. This:
>
>>> On the 11th in my groff file, an "â" character is found after 64 characters have been printed, within the word hamburger, the text gets parsed and printed as "hamâburger". If I change hamburger to donations I have the "â" character show up at the 60th character on the line, with donations being "donaâtions".
>
> is reminiscent of an issue I myself observed, earlier this week. I had run some informally structured ASCII text through a sed filter, and then through nroff, (v1.20.1), to produce an alternative layout. Although I had suppressed hyphenation (.hy 0), I did have several explicit ASCII hyphen characters in the input stream; each of these was replaced, in the output stream, by the three byte octal sequence 342 200 220, (which I guess represents u2010 -- the Unicode hyphen which groff_char(7) documents as the output form for hyphen). Viewing this output with "less", on my UTF-8 aware console, it looked absolutely fine, but after uploading as a package description file on my SourceForge downloads page, each hyphen was rendered, by Firefox, with unwanted whitespace surrounding it; rendered by Internet Explorer, each hyphen was replaced by three characters of garbage, amongst it being the "â" observed by Justin, IIRC.
>
> So yes, I guess what you actually see is dependent on encoding, (and how the viewer interprets the u2010 sequence, however it is encoded). In my case, I wanted real ASCII hyphens in my output stream; adding "-Tascii" to my nroff command gave me that.
>
> --
> Regards,
> Keith.

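
For anyone who wants to check the byte sequence Keith describes, here is a small Python sketch. It is not from the thread, and the function names are made up for illustration. It assumes only that the hyphen in question is U+2010 encoded as UTF-8, and that the misbehaving viewer decodes those bytes as Latin-1 (a real browser may pick a different legacy encoding, so the trailing garbage can look different).

```python
# U+2010 is the Unicode HYPHEN that groff_char(7) documents as the
# output form for a hyphen.
HYPHEN = "\u2010"


def utf8_octal(text):
    """Return the UTF-8 bytes of `text` as a list of octal strings."""
    return [format(byte, "o") for byte in text.encode("utf-8")]


def misread_as_latin1(text):
    """Simulate a viewer that decodes UTF-8 bytes as Latin-1.

    Assumption for illustration only: the viewer falls back to Latin-1.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # The three-byte sequence seen in the nroff output.
    print(utf8_octal(HYPHEN))  # ['342', '200', '220']

    # A word broken at a hyphenation point, as in the "hamburger" report.
    print(repr(misread_as_latin1("ham" + HYPHEN + "burger")))
    # 'hamâ\x80\x90burger'
```

The first byte (octal 342) is what shows up as "â"; the other two are control characters in Latin-1, which would explain why only the "â" is visible in some viewers.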
bill_hicks.tr
Description: groff file
