[Help-smalltalk] Unicode problem on parsing XML

Bèrto ëd Sèra Wed, 05 May 2010 08:52:03 -0700

Hi all,

lately I have started to see an index-out-of-bounds error when parsing Unicode
text from XML files. I am *not* 100% sure this isn't due to some
database-side changes that now expose a wider range of text to
the parser, so I cannot safely claim the problem is new. Yet even common
accented Latin chars now get mangled into something unusable when read by
the parser, and this *surely* was not happening the last time I worked
on the interface, say 3 months ago, with Iliad 0.7.


Now I'm using gst 3.2 and iliad 0.8. What I get from the following code:
content := 'taxonomy.xml' asFile.
parser := XML.XMLParser new.
parser validate: false.
parser parse: content readStream.

is an error you can easily reproduce by putting the
http://eng.i-iter.org/graph/taxonomy.xml file into your local dir.
Before you go crazy (as I did) digging around the text looking for
the guilty chars, I can tell you the breakers are, for example:
1) ...the æ and œ ligatures, ...
2) Devanāgarī script for Hindi
3) Japanese Rōmaji script
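
For anyone reproducing this, here is a minimal sketch for locating the
suspect bytes without searching the text by eye. It is only a diagnostic,
not a fix, and it rests on assumptions: that taxonomy.xml sits in the
working directory, that reading the file's contents yields a byte String
(so every non-ASCII character shows up as one or more bytes above 127),
and that the variable names are purely illustrative.

```smalltalk
"Hypothetical diagnostic: report the position and value of every
 byte above 127 in the file, then the total count."
| bytes count |
bytes := 'taxonomy.xml' asFile contents.
count := 0.
bytes keysAndValuesDo: [:index :ch |
    ch value > 127 ifTrue: [
        count := count + 1.
        Transcript showCr: index printString, ': ', ch value printString]].
Transcript showCr: count printString, ' non-ASCII bytes found'.
```

If the reported positions line up with the characters listed above, that
would suggest the multi-byte sequences are the trigger, though it says
nothing yet about where in the parser they go wrong.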

I was wondering what changed... or, most probably, what kind of silly
mistake I'm making...

Bèrto

-- 
==============================
Constitution of 24 June 1793 - Article 35. - When the government
violates the rights of the people, insurrection is, for the people and for
each portion of the people, the most sacred of rights and the most
indispensable of duties.


