So I figured out the previous problem; I'm pretty sure the answer is
to use "-dump-codepage utf8" in the options.
I am using v0.12-20080124.

The problem I'm experiencing now is that between converted
characters I'm getting this data: FDBF BFBF BFBD

The code I'm converting is:
<headLine>&#x5168;&#x56fd;&#x9031;&#x9593;&#x4e88;&#x5831;&#xff08;<sngaiji
char="&#xf1da;">&#xff11;&#xff11;</sngaiji>&#x6642;&#xff09;</headLine>

The hex dump:
2020 2020 2020 2020 2020 2020 20E5 85A8
FDBF BFBF BFBD E59B BDFD BFBF BFBF BDE9
80B1 FDBF BFBF BFBD E996 93FD BFBF BFBF

The 1st char, 5168, correctly maps to E585A8 in hex.
The 2nd char, 56fd, maps to E59BBD, but only after an
additional FDBF BFBF BFBD has been printed.
The same thing happens again: 9031 maps to E980B1 and 9593 maps to
E99693, each preceded by that same sequence.
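As a quick cross-check that is independent of ELinks, here is a small Python sketch that encodes the four code points with Python's built-in UTF-8 codec. It confirms that the character bytes in the dump are the expected ones, so only the extra runs between them are unexplained. The names CODE_POINTS and utf8_hex are just illustrative.

```python
# Code points taken from the <headLine> element, in document order.
CODE_POINTS = [0x5168, 0x56FD, 0x9031, 0x9593]


def utf8_hex(cp: int) -> str:
    """Return the UTF-8 encoding of one code point as upper-case hex."""
    return chr(cp).encode("utf-8").hex().upper()


for cp in CODE_POINTS:
    print(f"U+{cp:04X} -> {utf8_hex(cp)}")
```

This prints E585A8, E59BBD, E980B1 and E99693 for the four code points, matching the three-byte groups in the dump.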

Does anyone know why this is occurring? I looked at intl/charsets.h and
there are some special Unicode values listed, but none match the
value that is being inserted above.
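One way to narrow this down is to decode the inserted bytes as a number. Modern UTF-8 (RFC 3629) stops at four bytes, so Python's built-in codec rejects a lead byte of 0xFD, but the older RFC 2279 rules defined 0xFC and 0xFD as the lead byte of a six-byte sequence. The sketch below is plain Python, not ELinks code, and its function names are made up for illustration; it walks the hex dump using those older rules. Under that assumption each FDBF BFBF BFBD run decodes to 0x7FFFFFFD, far above U+10FFFF (the end of the Unicode range), so it looks like an internal value rather than a real character. That number, rather than the raw bytes, is what to compare against the constants in intl/charsets.h.

```python
# Hex dump of the ELinks output, copied from the message above.
DUMP = """
2020 2020 2020 2020 2020 2020 20E5 85A8
FDBF BFBF BFBD E59B BDFD BFBF BFBF BDE9
80B1 FDBF BFBF BFBD E996 93FD BFBF BFBF
"""


def seq_length(lead: int) -> int:
    """Sequence length from the lead byte, per RFC 2279 (up to 6 bytes)."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    if 0xF8 <= lead < 0xFC:
        return 5
    if 0xFC <= lead < 0xFE:
        return 6
    raise ValueError(f"not a lead byte: {lead:#04x}")


def decode_rfc2279(data: bytes) -> list:
    """Split data into (bytes, value) pairs; value is None if data ends mid-sequence."""
    result = []
    i = 0
    while i < len(data):
        n = seq_length(data[i])
        chunk = data[i:i + n]
        if len(chunk) < n:
            result.append((chunk, None))  # the dump was cut off here
            break
        # Lead byte contributes all 7 bits for ASCII, otherwise its low 7 - n bits.
        value = chunk[0] if n == 1 else chunk[0] & (0x7F >> n)
        for b in chunk[1:]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte {b:#04x} at offset {i}")
            value = (value << 6) | (b & 0x3F)  # each continuation adds 6 bits
        result.append((chunk, value))
        i += n
    return result


data = bytes.fromhex(DUMP)  # whitespace between byte pairs is ignored
for chunk, value in decode_rfc2279(data):
    if value == 0x20:
        continue  # skip the leading spaces
    shown = "(truncated)" if value is None else f"0x{value:X}"
    print(chunk.hex(" ").upper(), "->", shown)
```

The output alternates between the expected characters (0x5168, 0x56FD, 0x9031, 0x9593) and 0x7FFFFFFD, and the last run is reported as truncated only because the pasted dump stops one byte short.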

Note: The file I'm reading in is HTML in UTF-8, and I want to write out
UTF-8, which I am doing, but with that magic value in between.

Thanks in advance,
Rick Richardson

-- 
"Myths and legends die hard in America. We love them for the extra
dimension they provide, the illusion of near-infinite possibility to
erase the narrow confines of most men's reality. Weird heroes and
mould-breaking champions exist as living proof to those who need it
that the tyranny of 'the rat race' is not yet final."  -- Hunter S.
Thompson
_______________________________________________
elinks-users mailing list
[email protected]
http://linuxfromscratch.org/mailman/listinfo/elinks-users
