Ignacio Renuncio <[EMAIL PROTECTED]> wrote:
> 
> Hi again,
> 
> I've found a little problem with the XML importer related to the charset
> used:
> 
> I took a sample XML and imported it ok, but when I tried to do it with a
> real one, some characters displayed wrong when viewing the new records with
> the "my_editors" editor.
> 
> After examining the XML I could see that the offending characters were
> related to the encoding. I changed the encoding to ISO-8859-1 and tried to
> save the XML, but XMLSPY complained about it telling me:
> 
> "Your document contains 13 character(s) that cannot be represented in the
> ISO 8859-1 (Latin-1/West European) character-set encoding. (...blah
> blah...)"
> 
> BTW, the offending characters are 0x2026 (three dots character) and 0x2013
> (typographical dash), they seem to have been "auto-formatted" by MS Word
> when typing the texts.


I suppose XMLImporter uses a decent XML parser. XML is encoded in UTF-8 by
default, so you should have written your data as UTF-8. If you change the
declared encoding, you must also change the actual encoding of the
characters. I suppose that is impossible for the two characters you mention,
because they are not part of the ISO-8859-1 character set. You should either
use some kind of Word filter that replaces them, or stick to UTF-8.

But, anyhow, it should not matter how the source XML is encoded, as long as
it is correctly encoded.
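
A small sketch of what "correctly encoded" means here, using the standard
JAXP parser that ships with Java (I do not know which parser XMLImporter
actually uses, so take this as an illustration only; XmlEncodingDemo and
parseTitle are made-up names). The same document is written out once as
UTF-8 and once as ISO-8859-1. As long as the XML declaration matches the
bytes, the parser returns the same text. When the declaration and the bytes
disagree, you get the kind of wrong characters you describe.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of MMBase.
final class XmlEncodingDemo {

    /**
     * Builds a tiny document that declares one encoding, writes it to bytes
     * using a (possibly different) actual encoding, and parses it again.
     */
    static String parseTitle(String declaredEncoding, Charset actualEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<title>caf\u00e9</title>";
        byte[] bytes = xml.getBytes(actualEncoding);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Declaration matches the bytes: both give the same text.
        System.out.println(parseTitle("UTF-8", StandardCharsets.UTF_8));
        System.out.println(parseTitle("ISO-8859-1", StandardCharsets.ISO_8859_1));
        // UTF-8 bytes declared as ISO-8859-1: the e-acute comes out as two characters.
        System.out.println(parseTitle("ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```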

> 1st question: Which encoding does MMBase use to display data?

That is not determined. Internally MMBase is Java, which does not specify an
encoding for Strings. Strings can contain any character from the Unicode
character set.
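
To make that concrete: in Java an encoding only appears at the moment a
String is turned into bytes (or the other way around). The sketch below is
plain Java, not MMBase code, and the names StringBytesDemo and hex are
invented for the example. It prints the bytes of your two characters in
UTF-8, and shows that in ISO-8859-1 they can only be written as a
replacement '?' (0x3F).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of MMBase.
final class StringBytesDemo {

    /** Encodes the text with the given charset and returns the bytes as hex. */
    static String hex(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u2026", StandardCharsets.UTF_8));       // E280A6
        System.out.println(hex("\u2013", StandardCharsets.UTF_8));       // E28093
        // Not representable: getBytes substitutes the charset's replacement byte.
        System.out.println(hex("\u2026", StandardCharsets.ISO_8859_1));  // 3F
    }
}
```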

The 'basic jsp' editors use UTF-8 for displaying, and I suppose the
my_editors do so as well.

> 2nd question: Is this a fault in the XML Importer module?

That is possible (which would mean that it does not use a decent XML parser
as I suggested earlier); the other possibility is that there is an error in
your XML's encoding.

Michiel


-- 
Michiel Meeuwissen
Mediacentrum 140 H'sum 
+31 (0)35 6772979
nl_NL eo_XX en_US
mihxil'