> On Wed, Apr 24, 2002, Ross Moore wrote:
> > OK; I've got it, and can reproduce the problem.
> >
> > The fix is easy, but first a question.
> > Your example HTML files correctly have charset = text/big5 .
> > Where is this done in the processing, or do you do it yourself
> > after LaTeX2HTML has finished ?
>
> It's because (if you hadn't mentioned this, I'd almost have forgotten it :)
> I have ~/.latex2html-init,
>
> $ADDRESS = "<I>Compiled by Edward G.J. Lee ($address_data[1])</I>";
> $default_language = 'taiwanese';
> $TITLES_LANGUAGE = "taiwanese";
> $charset = "big5";
> $BOTTOM_NAVIGATION = 1;

Ahah; there's the culprit.

> So, I didn't do anything after executing ``latex2html''. The
> taiwanese setting is for testing only.
>
> > By simply inserting 2 lines into CJK.perl the problem
> > is fixed, and this charset is set automatically:
> >
> >
> >   package main;
> >
> >   $charset = 'big5';    ## insert these 2 lines
> >   $CHARSET = 'big5';    ##
> >
> >
> > This should be sufficient for documents that have just Big5 characters.
> >
> > Please advise if you have example documents where this is not sufficient.
>
> Thanks, but I guess configuring the rc file may be more convenient,
> because sometimes we might write UTF-8 or other-charset HTML.

Yes. Werner pointed out the same problem.

I'm going to update the LaTeX2HTML repository with the following
patch to CJK.perl :

landau.ics.mq.edu.au> cvs diff CJK.perl
Index: CJK.perl
===================================================================
RCS file: /home/latex2ht/cvs/latex2html/user/styles/CJK.perl,v
retrieving revision 1.5
diff -r1.5 CJK.perl
82a83,106
> # possible values for the 1st optional argument to \begin{CJK}
> # and the corresponding charset:
>
> %CJK_charset = (
>       'Bg5'     , 'big5'
>     , 'Bg5+'    , 'big5+'
>     , 'GB'      , 'gb_2312'
>     , 'GBt'     , 'gbt_12345'
>     , 'GBK'     , 'gbk'
>     , 'JIS'     , 'jisx_0208'
>     , 'SJIS'    , 'sjis'
>     , 'KS'      , 'ks_1001'
>     , 'UTF8'    , 'utf8'
>     , 'EUC-TW'  , 'euc-tw'
>     , 'EUC-JP'  , 'euc-jp'
>     );
>
> # Use 'Bg5' => 'big5' as default charset, for both input and output,
> # unless it is set already with a value for $CJK_AUTO_CHARSET
>
> $CJK_AUTO_CHARSET = '' unless (defined $CJK_AUTO_CHARSET);
> $charset = $CHARSET = $CJK_AUTO_CHARSET || $CJK_charset{'Bg5'};
>
>
118c142,155
<     &get_next_optional_argument;
---
>     my ($cjk_enc) = &get_next_optional_argument;
>     $cjk_enc =~ s/^\s+|\s+$//g;
>     if ($cjk_enc) {
>       if (!defined $CJK_charset{$cjk_enc}) {
>         &write_warning ( "unknown charset code: $cjk_enc in CJK environment.");
>       } elsif (!$CJK_AUTO_CHARSET) {
>         $CJK_AUTO_CHARSET = $charset = $CHARSET = $CJK_charset{$cjk_enc};
>       } elsif ($CHARSET eq $CJK_charset{$cjk_enc}) {
>         # compatible; do nothing.
>       } else {
>         &write_warning ( "Only one charset allowed per document: $CHARSET");
>         &write_warning ( "Ignoring request for ".$CJK_charset{$cjk_enc});
>       }
>     }

Please advise ASAP if there is anything here that you think
is incorrect or inadequate.

Note that there is now a variable  $CJK_AUTO_CHARSET  which can
be set in an initialisation file.
If it is not set, then the first {CJK} or {CJK*} environment that
has an encoding argument will change the encoding from the
global default of 'big5'.

Please apply the patch, and report any problems.

All the best,

        Ross Moore

> > The reason for the errors, without these charset settings, was that
> > some 8-bit characters were being translated back to TeX accents, or
> > to macros for mathematical symbols, according to the latin-1 use of those
> > characters. This is clearly inappropriate for a CJK document.
> >
> >
> > Hope this helps,
> >
> >       Ross Moore
>
> I see, thanks for the clear explanations.

You're welcome.
Thanks for making me look at CJK.perl .
Until today, I'd never studied that package.  :-)

>
> Rgds,
> Edward G.J. Lee

_______________________________________________
latex2html mailing list
[EMAIL PROTECTED]
http://tug.org/mailman/listinfo/latex2html
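
P.S. To illustrate the $CJK_AUTO_CHARSET remark above: a minimal sketch of
an initialisation file that selects the charset up front. It assumes the
patched CJK.perl is installed. The choice of 'utf8' is only an example,
and the closing "1;" follows the usual convention for Perl files that are
loaded by another program rather than anything stated in this thread.

```perl
# ~/.latex2html-init  (sketch only; assumes the patched CJK.perl)

# Choose the charset before CJK.perl is loaded. The value should be one
# of the right-hand entries of %CJK_charset in the patch, such as
# 'big5', 'gbk' or 'utf8'. 'utf8' here is just an illustrative choice.
$CJK_AUTO_CHARSET = 'utf8';

# With the patch, CJK.perl assigns $charset and $CHARSET itself when it
# loads, so a separate  $charset = "big5";  line in this file would
# presumably be overridden for CJK documents.

1;  # conventional true value at the end of a loaded Perl file
```

Reading the patch, with this setting a \begin{CJK}{UTF8} environment is
treated as compatible, while one asking for a different encoding, say
\begin{CJK}{Bg5}, should produce the two "Only one charset allowed" /
"Ignoring request" warnings and leave the charset unchanged.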