I've been testing the advice given by Ross Moore regarding curly quotation marks (2006/04/18) and "passing through" raw unicode (2003/07/06):
-------------------------------------------------------------------------
Curly quotation marks (2006/04/18):
===================================

    $USE_CURLY_QUOTES = 1;

set this in an initialization file. Also set the following

    $USE_UTF=1;

OR execute the job with options such as:

    latex2html -split 0 -html_version 4.0,latin1,unicode,utf8 myfile.tex

    4.0     = satisfy HTML 4.0 recommendations (4.1 might work for HTML 4.01)
    latin1  = input encoding
    unicode = use unicode code-points in the output
    utf8    = use byte-sequences, rather than entity numbers (or names)
              whenever appropriate.

Raw unicode (2003/07/06):
=========================

You may need to specify on the commandline something like:

    latex2html -html_version 4.0,unicode ...other-options... <filename>
or
    latex2html -html_version 4.0,unicode,utf8 ......
or even
    latex2html -html_version 4.0,unicode,unicode ......

Basically, the problem will be that you do *not* want LaTeX2HTML to assign special meaning to upper-8-bit codes and translate them into something else.
-------------------------------------------------------------------------

In my testing I had three goals:

1. Output single quote marks as curly characters,
2. Output double quote marks as curly characters, and
3. Output raw unicode as unicode, e.g., —äß (em dash, a-umlaut and scharfes s).

Here are the results of my testing (display in monospace to align columns):

    initialisation file       html-version options     single  double  raw
    variable(s)                                        quotes  quotes  unicode
    ------------------------  ----------------------   ------  ------  -------
1.                                                     `'      ``''    rubbish  .1
2.  USE_CURLY_QUOTES                                   `'      “”      rubbish  .2
3.  USE_CURLY_QUOTES USE_UTF                           ** ERROR **              .3
4.  USE_CURLY_QUOTES          latin1,unicode,utf8      `'      “”      rubbish  .4
5.  USE_CURLY_QUOTES          latin1,unicode,unicode   `'      “”      unicode  .5
6.  USE_CURLY_QUOTES USE_UTF  latin1,unicode,utf8      `'      “”      rubbish  .6
7.  USE_CURLY_QUOTES USE_UTF  latin1,unicode,unicode   ** ERROR **              .7
8.  USE_UTF                                            ** ERROR **              .8
9.  USE_UTF                   latin1,unicode,utf8      `'      ``''    rubbish  .9
A.  USE_UTF                   latin1,unicode,unicode   ** ERROR **              .A
B.                            latin1,unicode,utf8      `'      ``''    rubbish  .B
C.                            latin1,unicode,unicode   `'      ``''    unicode  .C

* Runs that errored terminated prematurely with the message:

    "Undefined subroutine &main::convert_to_utf8 called at /usr/bin/latex2html line 7462."

The latex2html version is '2002-2-1 (1.71)'.

In case this email's encoding gets screwed up in transmission, the runs that resulted in curly double quotes were 2, 4, 5 and 6.

Some observations/conclusions:

- No method gave curly single quotes.
- The only method that output curly double quotes was the init file variable "USE_CURLY_QUOTES".
- The only method that output raw unicode was html_version options "unicode,unicode".
- The init file variable USE_UTF caused a fatal error unless 'utf8' was included as a 'html_version' option.

I'm curious to know two things.

Firstly, is there a way to get curly single quote output from latex2html?

Secondly, I couldn't find documentation anywhere on USE_CURLY_QUOTES and USE_UTF after checking the manual, perldoc, man and info files. Are there any other such undocumented variables and, if so, where can I read up on them?

Regards,
David.
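
P.S. For anyone wanting to reproduce the one combination that met two of the three goals (run 5: curly double quotes plus raw unicode), the setup can be sketched roughly as below. This is only a sketch of what I ran, under some assumptions: the init file is taken to be the usual `.latex2html-init`, the closing `1;` is the normal Perl convention for such files rather than anything stated in the advice, the "4.0," prefix and "-split 0" are carried over from the quoted advice, and myfile.tex is a placeholder.

```perl
# .latex2html-init  (assumed file name for the initialization file)
# Settings corresponding to run 5 in the table above.

$USE_CURLY_QUOTES = 1;   # produced curly double quotes in runs 2, 4, 5 and 6

# $USE_UTF is deliberately left unset: in these tests it caused a fatal
# "Undefined subroutine &main::convert_to_utf8" error unless 'utf8' was
# among the -html_version options.

1;  # init files are Perl code and should end with a true value

# Then run latex2html with the run-5 options, for example:
#   latex2html -split 0 -html_version 4.0,latin1,unicode,unicode myfile.tex
```

With this setup single quotes still came out as `' in my tests, so it does not address the first goal.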