Hi!

I found a problem with XMLBeans when it parses documents
containing doctype declarations such as
 <!DOCTYPE SYSTEM SYSTEM "xxx.dtd">

In fact, the problem seems to be in Piccolo. Below is
the mail I sent to Piccolo's maintainer.
I'm forwarding it here as well, because the problem directly
affects XMLBeans.

    Panu


---------------------------- Original Message ----------------------------
Subject: Piccolo doctype tag parsing problem (and a solution)
From:    Panu Hällfors <mail_protected>
Date:    Thu, September 8, 2005 3:01 pm
To:      [EMAIL PROTECTED]
--------------------------------------------------------------------------

Hi!

I'm using Piccolo indirectly by using XMLBeans
(http://xmlbeans.apache.org/).

There seems to be a problem with Piccolo's lexical
scanner when processing !DOCTYPE, !ELEMENT and !NOTATION declarations.

The problem arises when the name of the doctype
is the same as one of the keywords. That is, something like:
 <!DOCTYPE SYSTEM SYSTEM "xxx.dtd">

As far as I can read the XML spec, this is a legal
declaration (the doctype name only has to be a valid Name), and
I also happen to have tons of XML files with such declarations
that I have to process.
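
One way to cross-check the legality claim is to feed the same
declaration to a different parser. The sketch below uses the JDK's
built-in JAXP parser (not Piccolo and not XMLBeans). The class name
DoctypeNameCheck and the sample document are made up for
illustration, and the entity resolver returns an empty DTD so that
xxx.dtd does not have to exist on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for this illustration.
public class DoctypeNameCheck {

    // Parses the document and returns the name given in its doctype declaration.
    static String doctypeName(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve every external entity (here: xxx.dtd) to an empty stream,
        // so the check does not depend on the DTD file being present.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDoctype().getName();
    }

    public static void main(String[] args) throws Exception {
        // Doctype name and root element are both "SYSTEM".
        String xml = "<!DOCTYPE SYSTEM SYSTEM \"xxx.dtd\"><SYSTEM/>";
        System.out.println(doctypeName(xml));
    }
}
```

If the parser accepts the document, the returned doctype name is the
first SYSTEM, and the second SYSTEM has been consumed as the keyword.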

However, Piccolo's lexer is built in such a way that the first
"SYSTEM" string in a !DOCTYPE/!ELEMENT/!NOTATION declaration
is taken as the SYSTEM keyword. What the lexer should do instead
is first read the declared name and only then proceed
to scanning for the keywords.


A concept-level fix for this would perhaps be:
[in src/com/bluecast/xml/PiccoloLexer.flex]
 1. Create state DTD_TAG_NAME
 2. Copy (or move?) the {Name} pattern from state
    DTD_TAG to DTD_TAG_NAME
 3. Proceed from DTD_TAG_NAME to DTD_TAG
 4. Make the lexer enter DTD_TAG_NAME when
    !DOCTYPE, !ELEMENT or !NOTATION is found
 5. Make sure that correct values are returned
    and all the other boring stuff I'm omitting :)
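
To make the two-state idea concrete, here is a toy hand-written
tokenizer in Java. It is only a sketch and not Piccolo's actual
lexer: the class DtdTagSketch, the token labels and the simplified
word splitting are all made up for illustration. Only the state
names DTD_TAG_NAME and DTD_TAG come from the steps above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer for the text between "<!DOCTYPE" and ">". NOT Piccolo's lexer.
public class DtdTagSketch {

    enum State { DTD_TAG_NAME, DTD_TAG }

    static List<String> tokenize(String body) {
        List<String> tokens = new ArrayList<>();
        // Start in DTD_TAG_NAME: the first word is always the declared name.
        State state = State.DTD_TAG_NAME;
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == '"' || c == '\'') {
                // Quoted literal, e.g. the system identifier.
                int end = body.indexOf(c, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated literal");
                }
                tokens.add("LITERAL(" + body.substring(i + 1, end) + ")");
                i = end + 1;
                continue;
            }
            // Read one word up to the next whitespace or quote (simplified).
            int start = i;
            while (i < body.length()
                    && !Character.isWhitespace(body.charAt(i))
                    && body.charAt(i) != '"' && body.charAt(i) != '\'') {
                i++;
            }
            String word = body.substring(start, i);
            if (state == State.DTD_TAG_NAME) {
                // Keywords are not recognized in this state.
                tokens.add("NAME(" + word + ")");
                state = State.DTD_TAG;
            } else if (word.equals("SYSTEM") || word.equals("PUBLIC")) {
                tokens.add("KEYWORD(" + word + ")");
            } else {
                tokens.add("NAME(" + word + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [NAME(SYSTEM), KEYWORD(SYSTEM), LITERAL(xxx.dtd)]
        System.out.println(tokenize(" SYSTEM SYSTEM \"xxx.dtd\""));
    }
}
```

The point of the extra state is that keyword matching is simply
switched off until the name has been consumed, so no lookahead or
special casing of "SYSTEM" is needed.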


Thanks for a great parser!
Hope you'll get this in order!

    Panu Hällfors

-- 
 http://panu.hallfors.com




