[
https://issues.apache.org/jira/browse/ABDERA-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689275#action_12689275
]
jv ning commented on ABDERA-222:
--------------------------------
HttpClient is using a ChunkedInputStream under the covers, which ensures that
no single read spans a chunk boundary.
The Jetty server on the other side is arranging its chunks so that the
multi-byte characters fall at the start of a chunk.
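
The following is only a rough sketch of the workaround mentioned in the issue
description (write the body to a byte array, then parse from a
ByteArrayInputStream). ShortReadStream is a hypothetical stand-in that imitates
a stream whose reads never cross a chunk boundary; it is not HttpClient's
ChunkedInputStream, and the class and method names are made up for
illustration. Buffering does not fix whatever mishandles the short reads, it
just sidesteps them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ChunkedReadSketch {

    /** Hypothetical stand-in for a chunked stream: no read returns more than chunkSize bytes. */
    static final class ShortReadStream extends FilterInputStream {
        private final int chunkSize;

        ShortReadStream(InputStream in, int chunkSize) {
            super(in);
            this.chunkSize = chunkSize;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Cap every read at the simulated chunk size, so callers see short reads.
            return super.read(b, off, Math.min(len, chunkSize));
        }
    }

    /** Drains the stream completely, tolerating short reads, and returns a fully buffered copy. */
    static ByteArrayInputStream buffer(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // A small document with a multi-byte UTF-8 character in an attribute value.
        byte[] xml = "<entry title=\"caf\u00e9\"/>".getBytes(StandardCharsets.UTF_8);
        // Reads of at most 3 bytes, so the two-byte character can straddle a read boundary.
        InputStream chunky = new ShortReadStream(new ByteArrayInputStream(xml), 3);
        ByteArrayInputStream buffered = buffer(chunky);
        // The buffered stream is what would be handed to parser.parse(...) in the report.
        System.out.println(buffered.available() == xml.length);
    }
}
```

In the reported case, the stream from response.getResponseBodyAsStream() would
go through buffer(...) before being passed to parser.parse(...).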
> Parse failures reading utf-8 xml files that have attribute values that
> contain non US-ASCII valid utf-8 characters
> ------------------------------------------------------------------------------------------------------------------
>
> Key: ABDERA-222
> URL: https://issues.apache.org/jira/browse/ABDERA-222
> Project: Abdera
> Issue Type: Bug
> Affects Versions: 0.4.0
> Environment: Solaris x86_64, Mac OS X Leopard x86_64, Linux x86_64
> Reporter: jv ning
>
> Parsing fails for XML items fetched with http-client 3.1 when the response
> stream is passed directly to the parser. The same items parse correctly if
> they are first written to a byte array and a ByteArrayInputStream over that
> array is passed to parse. The failing call:
> parser.parse(response.getResponseBodyAsStream());
> Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
> (NULL, unicode 0) encountered: not valid in any content
> at [row,col {unknown-source}]: [3,56]
> at com.ctc.wstx.sr.StreamScanner.constructNullCharException(StreamScanner.java:615)
> at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:644)
> at com.ctc.wstx.sr.BasicStreamReader.readTextPrimary(BasicStreamReader.java:4554)
> at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2886)
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
> at org.apache.abdera.parser.stax.FOMBuilder.getNextElementToParse(FOMBuilder.java:163)
> at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:187)