On 05/05/2010 05:36 AM, Bèrto ëd Sèra wrote:
Now I'm using gst 3.2 and iliad 0.8. What I get from the following code:
content := 'taxonomy.xml' asFile.
parser := XML.XMLParser new.
parser validate: false.
parser parse: content readStream.

Not related, but it's best to use "XML.SAXParser defaultParserClass new".
Unlike XMLParser, other parsers may not construct the DOM by default, so you
have to set a DOM-building SAX driver yourself, and you end up with:

    PackageLoader fileInPackage: 'XML-XMLParser'.
    content := 'taxonomy.xml' asFile.
    parser := XML.SAXParser defaultParserClass new.
    parser validate: false.
    parser saxDriver: (driver := XML.DOM_SAXDriver new).
    parser parse: content readStream.
    driver document

This won't fix the bug, but it will make you a good citizen (see the NEWS file in GST 3.2).

the breakers are, for example:
1)...the æ and œ ligatures, ...
2) Devanāgarī script for Hindi
3) Japanese Rōmaji script

The breaker is _entities_, not characters.
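
To make the distinction concrete, here is a small sketch with two made-up inputs (the element name and the strings are only for illustration, they are not taken from your file). The first holds the characters literally; the second writes the same two characters as numeric character references, which is the case that goes through the code touched by the patch further down. It assumes the source is saved as UTF-8 and that parse: is happy with a stream over a String, as it is with a stream over a file.

```smalltalk
"Hypothetical inputs, for illustration only."
literal := '<name>Devanāgarī</name>'.
escaped := '<name>Devan&#257;gar&#299;</name>'.

"Same setup as in the snippet above; only the input changes."
PackageLoader fileInPackage: 'XML-XMLParser'.
parser := XML.SAXParser defaultParserClass new.
parser validate: false.
parser saxDriver: (driver := XML.DOM_SAXDriver new).
parser parse: escaped readStream.   "numeric character references"
driver document
```

Going by the above, I would expect only the escaped form to trigger the problem on an unpatched 3.2; swapping in literal on the parse: line is a quick way to check that on your setup.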

I was wondering what changed... or, most probably, what kind of silly
mistake I'm making...

Nothing, it's a bug. The easiest way to fix it is to use the XML-Expat package. You just have to replace the first line above with these two:

    PackageLoader fileInPackage: 'XML-Expat'.
    PackageLoader fileInPackage: 'XML-DOM'.

It's _thousands_ of times faster too.
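
Put together, the whole thing with Expat would look roughly like this. It is only the earlier snippet with the first line swapped, so it still assumes taxonomy.xml sits in the current directory as in your code, and that defaultParserClass picks up the Expat parser once the package is loaded.

```smalltalk
"Load the Expat-based parser plus the DOM classes instead of XML-XMLParser."
PackageLoader fileInPackage: 'XML-Expat'.
PackageLoader fileInPackage: 'XML-DOM'.

content := 'taxonomy.xml' asFile.
parser := XML.SAXParser defaultParserClass new.
parser validate: false.

"Ask for a DOM explicitly, since not every parser builds one by default."
parser saxDriver: (driver := XML.DOM_SAXDriver new).
parser parse: content readStream.
driver document
```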

But if you insist, this patch fixes it:

diff --git a/packages/xml/parser/XML.st b/packages/xml/parser/XML.st
index 309cf36..a9ebb7f 100644
--- a/packages/xml/parser/XML.st
+++ b/packages/xml/parser/XML.st
@@ -2950,7 +2950,7 @@ Instance Variables:
            ifTrue:
                [sax fatalError: (BadCharacterSignal new
messageText: 'A character with Unicode value %1 is not legal' % {n})].
-       data nextPut: (Character value: n).
+       data display: (Character codePoint: n).
        self getNextChar
     ]

diff --git a/packages/xml/parser/package.xml b/packages/xml/parser/package.xml
index 2e0bcce..fc72811 100644
--- a/packages/xml/parser/package.xml
+++ b/packages/xml/parser/package.xml
@@ -13,6 +13,7 @@

   <prereq>XML-SAXParser</prereq>
   <prereq>XML-DOM</prereq>
+  <prereq>Iconv</prereq>

   <filein>XML.st</filein>
   <file>XML.st</file>

Paolo

