So I figured out the previous problem: I'm pretty sure the answer is to use "-dump-codepage utf8" in the options. I am using v0.12-20080124.
The problem I'm experiencing now is that between converted characters I'm getting this data:

    FDBF BFBF BFBD

The markup I'm converting is:

    <headLine>全国週間予報(<sngaiji char="">11</sngaiji>時)</headLine>

The hex dump:

    2020 2020 2020 2020 2020 2020 20E5 85A8
    FDBF BFBF BFBD E59B BDFD BFBF BFBF BDE9
    80B1 FDBF BFBF BFBD E996 93FD BFBF BFBF

The 1st char, 5168, correctly maps to E585A8 in hex.
The 2nd char, 56FD, maps to E59BBD, but only after an additional FDBF BFBF BFBD is printed.
The same happens again for the 3rd char, 9031, which maps to E980B1, and for the 4th char, 9593, which maps to E99693.

Does anyone know why this is occurring? I looked at intl/charsets.h and there are some special Unicode values listed, but none match the value that is being inserted above.

Note: the file I'm reading in is HTML in UTF-8, and I wish to write out UTF-8, which I am doing, but with that magic value in the middle.

Thanks in advance,
Rick Richardson
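
For anyone wanting to check the byte values quoted above, here is a minimal Python sketch (not ELinks code). It prints the code point and UTF-8 bytes of the first four characters of the headline, then decodes the stray FD BF BF BF BF BD run under the assumption that it is a single sequence in the original 6-byte UTF-8 scheme, which predates the 4-byte limit of RFC 3629. The helper name decode_extended_utf8 is made up for this example.

```python
def decode_extended_utf8(data: bytes) -> int:
    """Decode ONE sequence of the original (up to 6-byte) UTF-8 scheme.

    Hypothetical helper for illustration; Python's own UTF-8 codec
    rejects 5- and 6-byte sequences, so the bits are unpacked by hand.
    """
    lead = data[0]
    if lead < 0x80:
        return lead  # plain ASCII byte
    # The number of leading 1 bits in the first byte gives the length.
    length, mask = 0, 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    if length < 2 or length > 6 or len(data) != length:
        raise ValueError("not a single well-formed sequence")
    value = lead & (mask - 1)  # payload bits of the lead byte
    for b in data[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
        value = (value << 6) | (b & 0x3F)  # 6 payload bits per byte
    return value


# First four characters of the headline: 全 国 週 間
for ch in "\u5168\u56fd\u9031\u9593":
    print(f"{ord(ch):04X} -> {ch.encode('utf-8').hex().upper()}")

stray = bytes.fromhex("FDBFBFBFBFBD")
print(hex(decode_extended_utf8(stray)))  # prints 0x7ffffffd
```

Read this way, the inserted bytes are one 31-bit value, 0x7FFFFFFD, rather than several separate characters. That may be a useful value to compare against the special constants in intl/charsets.h, keeping in mind that a 32-bit sentinel would lose its top bit when packed into 31 bits; whether that is what happens inside ELinks is only a guess here.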
