I just noticed that the HTML exporter doesn't emit entity references such as
"&aring" for international characters like "å"; it just exports the character
as is. Perhaps this is indeed well-formed HTML, judging by a comment in
wv/text.c (wvConvertUnicodeToHtml):

    As the output encoding for HTML was chosen as UTF-8,
    we don't need Ä etc. etc. I removed all but sz
    -- MV 6.4.2000

The HTML importer fails when trying to import such a document. Even after
manually changing the HTML to read "&aring", the importer still fails.

To reproduce, you might try:

    <html><body><p>&auml</p></body></html>
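
One way to narrow this down outside the importer is to feed variants of the test document straight to expat. This is only a sketch, and it rests on an assumption that the post does not confirm: that the importer hands the file to expat as plain XML with no DTD declaring the HTML entities. It uses Python's bundled expat binding (xml.parsers.expat) instead of the C sources; try_parse and TEMPLATE are names made up for the illustration.

```python
import xml.parsers.expat


def try_parse(document: str) -> str:
    """Parse `document` with expat; return "ok" or expat's error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        return xml.parsers.expat.ErrorString(err.code)
    return "ok"


# Hypothetical wrapper matching the reproduction document above.
TEMPLATE = "<html><body><p>{}</p></body></html>"

for label, text in [
    ("named, no semicolon", "&auml"),
    ("named, with semicolon", "&auml;"),
    ("numeric reference", "&#228;"),
    ("raw character", "\u00e4"),
]:
    print(f"{label:24} -> {try_parse(TEMPLATE.format(text))}")
```

If the assumption holds, both named forms are rejected by the parser itself (XML predefines only lt, gt, amp, apos and quot), while the numeric reference and the raw character, passed here as a Unicode string, are accepted. That would point at missing entity declarations or at the file's encoding, not at the importer's own logic.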

Unfortunately, I haven't got a clue where to start looking for a solution.
I tried to debug xmlparse.c, but its horrendous use of macros made it quite
clear that the code didn't want to be debugged.

Ideas anyone?

/Mike - please don't cc

