Hi Fred,

On Wed, 17 Dec 2003, Fred L. Drake, Jr. wrote:
>
> I'm using LaTeX2HTML to convert programmer's API documentation using a
> fair bit of custom Perl code.
>
> Over the years, I've dealt with many places in our documentation where
> the text "--" is contained as content rather than an markup for an
> en-dash.  In each case, I've avoided the en-dash conversion by adding
> still more markup in the document text.  While less than ideal, it has
> worked.
>
> I recently decided it was time to tackle this problem in a more
> general way.
>
> In each case where I've needed to deal with this conversion, the
> affected "--" has occurred in content which is known to never need the
> conversion based on the surrounding markup.  What I've tried to do in
> these cases is to convert the "--" to some other HTML spelling of
> those two characters; I've tried both "--" and the XHTML-ish
> "--".  In both cases, the conversion still takes place.
>
> Where in LaTeX2HTML is this conversion being done?  Is there some way
> to suppress this without an uglier transformation of the "--"?

It happens in the &text_cleanup routine:

    # This routine must be called once on the text only,
    # else it will "eat up" sensitive constructs.
    sub text_cleanup {
        # MRO: replaced $* with /m
        s/(\s*\n){3,}/\n\n/gom;            # Replace consecutive blank lines with one
        s/<(\/?)P>\s*(\w)/<$1P>\n$2/gom;   # clean up paragraph starts and ends
        s/$O\d+$C//go;                     # Get rid of bracket id's
        s/$OP\d+$CP//go;                   # Get rid of processed bracket id's
        s/(<!)?--?(>)?/(length($1) || length($2)) ? "$1--$2" : "-"/ge;
          ^^^^^^^^^^^^_________________________________^^_______^
          here's the pattern!
                HTML comment delimiters pass unchanged
                other occurrences of -- are contracted

>
> I will note that converting "--" to "-<span>-</span>" or
> "-<!--junk-->-" works, but both are incredibly ugly ways of doing
> this.

Sure.
I'd suggest that you replace the above line by a subroutine call,
then define the subroutine to do whatever replacements you think
are best for you -- perhaps none at all.
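As a rough sketch of that suggestion, something like the following might
do. The names contract_dashes and $CONTRACT_DASHES are hypothetical and
are not part of LaTeX2HTML; the substitution itself is the one quoted
above, only moved into a subroutine that works on a copy of the text
rather than on $_.

```perl
use strict;
use warnings;

# Hypothetical switch (not an existing LaTeX2HTML variable):
# set it to 0 to leave every "--" exactly as written.
our $CONTRACT_DASHES = 1;

# Hypothetical helper wrapping the substitution from &text_cleanup.
# Takes a string and returns the cleaned-up copy.
sub contract_dashes {
    my ($text) = @_;
    return $text unless $CONTRACT_DASHES;
    $text =~ s{(<!)?--?(>)?}{
        my $open  = defined $1 ? $1 : '';
        my $close = defined $2 ? $2 : '';
        # HTML comment delimiters pass unchanged;
        # any other "--" is contracted to "-".
        (length($open) || length($close)) ? $open . '--' . $close : '-';
    }ge;
    return $text;
}

print contract_dashes("a -- b"), "\n";          # prints "a - b"
print contract_dashes("<!-- note -->"), "\n";   # prints "<!-- note -->"
```

Inside &text_cleanup the original line would then become something like
`$_ = contract_dashes($_);`, and the body of the helper can be changed
(or emptied) in an init file without touching the rest of the routine.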

Theoretically, this replacement line is in the wrong place, since it
acts on 'output' rather than on 'input', as it would with a TeX engine.
But I cannot find a better place for it, since it needs to act on the
result of macro expansions, as well as the normal text of the document.

Currently the replacement acts *after* all macro expansions have been
done, and all environments have been processed, but *before* verbatim
strings (and other 'sensitive' marked constructs) are re-inserted into
the document.

You say...

> In each case where I've needed to deal with this conversion, the
> affected "--" has occurred in content which is known to never need the
> conversion based on the surrounding markup.  What I've tried to do in

So perhaps you should be using a construct that creates a 'sensitive'
marker for the whole block of content, in a similar way to how
verbatim-like environments are handled.
(These have their content stored in a database, and a 'marker' inserted
into the document, to be replaced much later by the
&replace_sensitive_markers routine.)

However, environments like {alltt} cannot be done this way, because
macros inside them still need to be expanded.

Any further ideas would be welcome.

>
> Thanks!
>

Cheers,
Happy New Year

    Ross

>
>   -Fred
>
> --
> Fred L. Drake, Jr.  <fdrake at acm.org>
> PythonLabs at Zope Corporation
>
> _______________________________________________
> latex2html mailing list
> [EMAIL PROTECTED]
> http://tug.org/mailman/listinfo/latex2html
>