[ https://issues.apache.org/jira/browse/XERCESC-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758732#action_12758732 ]
Igor Ignatyuk edited comment on XERCESC-1288 at 9/23/09 8:25 AM:
-----------------------------------------------------------------
The program in the attached xercesc_wrong_position.cpp prints the following message:
line 1, column 23: invalid byte '<' at position 2 of a 3-byte sequence
while parsing xercesc_wrong_position.xml.
The line number should be 2, the column number should be 8 (position of the
character '<') or 7 (position of the character 'ä').
was (Author: igna):
The program prints the following:
line 1, column 23: invalid byte '<' at position 2 of a 3-byte sequence
The line number should be 2, the column number should be 8 (position of the
character '<') or 7 (position of the character 'ä').
> Wrong line/column number in UTFDataFormatException
> --------------------------------------------------
>
> Key: XERCESC-1288
> URL: https://issues.apache.org/jira/browse/XERCESC-1288
> Project: Xerces-C++
> Issue Type: Bug
> Components: DOM, Non-Validating Parser, SAX/SAX2
> Affects Versions: 2.5.0, 2.6.0
> Environment: Linux (SUSE 9.1, Fedora core 2, Redhat 9) on Intel,
> Solaris 7 on SPARC, various gcc versions.
> Reporter: Valerio Gionco
> Priority: Minor
> Attachments: xercesc_wrong_position.cpp, xercesc_wrong_position.xml
>
>
> I have the following (bad) XML file:
> --------------- bad.xml ----------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <block>
> <field>Blah blah</field>
> <field>Blah blah ò blah blah</field>
> <field>Blah blah</field>
> </block>
> ----------------------------------------------------
> (note the accented 'o' in the 2nd "field" line - hope it won't be
> destroyed...)
> The file is bad because the accented 'o' is represented with a single
> byte, 0xf2. This is the hex dump:
> 3e 42 6c 61 68 20 62 6c 61 68 20 f2 20 62 6c 61 |>Blah blah . bla|
> The problem is that when I run "SAXPrint bad.xml" I get the following error:
> Fatal Error at file /users/valerio/tmp/bad.xml, line 1, char 39
> Message: An exception occurred! Type:UTFDataFormatException,
> Message:invalid byte 2 ( ) of a 4-byte sequence.
> The row and column reported by SAXParseException::getColumnNumber()
> and SAXParseException::getLineNumber() are wrong. I seem to recall
> this was not the case with older (2.0 or 2.2?) versions of Xerces-C,
> but I'm not sure.
> I noticed the issue with 2.5, then tried with 2.6 but there was
> no apparent difference. Can somebody take care of this? We often
> have big XML files to parse, and not knowing where the error
> really is is a real pain.
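
For illustration, the byte-level situation described above can be reproduced outside the parser. The sketch below is not Xerces-C code: the names Utf8Error, utf8SequenceLength and findFirstUtf8Error are hypothetical, and the check is simplified (it does not reject overlong forms or surrogates). It shows why a lone 0xf2 is read as the lead byte of a 4-byte sequence, why the following space (0x20) is reported as invalid byte 2, and how a line and column could be tracked while scanning so that the error points at the start of the malformed sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type, not part of Xerces-C.
struct Utf8Error {
    bool found;            // false if no malformed sequence was seen
    std::size_t line;      // 1-based line of the lead byte
    std::size_t column;    // 1-based column of the lead byte, in characters
    std::size_t byteInSeq; // 1-based index of the bad byte within the sequence
    std::size_t seqLength; // length announced by the lead byte (0 if the lead is invalid)
    unsigned char badByte; // offending byte (0 if the input ended mid-sequence)
};

// Length announced by a UTF-8 lead byte, or 0 if the byte cannot start a sequence.
inline std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4; // 0xf2 falls in this range
    return 0;
}

// Scans the buffer and reports the first malformed sequence with its position.
// Simplification: only lead bytes and continuation bytes (10xxxxxx) are checked.
inline Utf8Error findFirstUtf8Error(const std::string& data) {
    std::size_t line = 1, column = 1, i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        const std::size_t len = utf8SequenceLength(lead);
        if (len == 0) return Utf8Error{true, line, column, 1, 0, lead};
        for (std::size_t k = 1; k < len; ++k) {
            if (i + k >= data.size())  // input ended inside the sequence
                return Utf8Error{true, line, column, k + 1, len, 0};
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80)    // not a continuation byte
                return Utf8Error{true, line, column, k + 1, len, b};
        }
        if (lead == '\n') { ++line; column = 1; } else { ++column; }
        i += len;
    }
    return Utf8Error{false, 0, 0, 0, 0, 0};
}
```

With an input such as "<a>\n<field>x \xF2 y</field>\n</a>\n", this sketch reports line 2, column 10, byte 2 of a 4-byte sequence, with the space as the offending byte. That is the same kind of message as in the report, but with a position that points at the malformed character rather than at the start of the buffer.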