[ 
https://issues.apache.org/jira/browse/ABDERA-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689256#action_12689256
 ] 

jv ning commented on ABDERA-222:
--------------------------------

This appears to trigger when the socket read boundaries fall such that the 
first byte of a multi-byte character is the first byte returned by a read 
from the network socket.

In our failing case, there are 3 reads issued against the input stream returned 
by the httpmethod.
1 for 4 bytes
1 for 196 bytes
1 for 3800 bytes
and then for 4 k bytes.

In our failing case, the read for 196 bytes returns fewer than 196 bytes, 
and the first byte returned by the next read is the start byte of our 
multi-byte character.
The multi-byte character is returned in the 3rd READ_ARRAY call and written to 
position 200 in the input buffer.
When the multi-byte character is not the first byte sequence returned by a 
read, there is no exception.

"TIME"  "method"        "read byte count"       "read byte count after mark resets"     "where read data is written into the buffer passed to read"     "read request size"     "count read"
1238017735367   " AVAILABLE"    0       0       0       4       4
1238017735367   "READ_ARRAY"    0       0                       
1238017735367   " AVAILABLE"    4       4                       
1238017735367   "READ_ARRAY"    4       4       4       196     158
1238017735367   " AVAILABLE"    162     162                     
1238017735367   "READ_ARRAY"    162     162     200     3800    2890
1238017735370   "     CLOSE"    3052    3052                    
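
To try this read pattern without a network socket, something like the 
following sketch may help. It is not part of Abdera, Woodstox or http-client: 
ShortReadInputStream, sampleDocument and decodeWithShortReads are hypothetical 
names, and the JDK's own UTF-8 decoder stands in for the XML parser. The 
stream caps the first array reads at 4 and 158 bytes, as in the trace above, 
so the third read begins at byte 162 with the lead byte of a two-byte 
character.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ShortReadDemo {

    // Hypothetical test helper: caps successive array reads at the given sizes.
    static class ShortReadInputStream extends FilterInputStream {
        private final int[] caps;
        private int next = 0;

        ShortReadInputStream(InputStream in, int... caps) {
            super(in);
            this.caps = caps;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            // Once the listed sizes are used up, pass requests through unchanged.
            int cap = next < caps.length ? Math.max(1, caps[next++]) : len;
            return super.read(buf, off, Math.min(len, cap));
        }
    }

    // 162 ASCII bytes, then a two-byte UTF-8 character inside an attribute value.
    static String sampleDocument() {
        StringBuilder sb = new StringBuilder("<e a=\"");
        while (sb.length() < 162) {
            sb.append('x');
        }
        return sb.append('\u00e9').append("\"/>").toString();
    }

    // Decodes the text through a stream whose first reads are cut short.
    static String decodeWithShortReads(String text, int... caps) throws IOException {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        InputStream in = new ShortReadInputStream(new ByteArrayInputStream(utf8), caps);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[256];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                out.append(chunk, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String doc = sampleDocument();
        String decoded = decodeWithShortReads(doc, 4, 158);
        System.out.println(decoded.equals(doc) ? "round trip ok" : "decoder mishandled the split");
    }
}
```

Passing the same ShortReadInputStream to the parser instead of the 
InputStreamReader should show whether the parser's own decoding depends on 
where the read boundaries fall.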


> Parse failures reading utf-8 xml files that have attribute values that 
> contain non US-ASCII valid utf-8 characters
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: ABDERA-222
>                 URL: https://issues.apache.org/jira/browse/ABDERA-222
>             Project: Abdera
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: Solaris x86_64, Mac OS X Leopard x86_64, Linux x86_64
>            Reporter: jv ning
>
> Parsing fails for XML files that are items fetched by http-client 3.1. 
> The same items parse correctly if they are written to a byte array and a 
> ByteArrayInputStream on the byte array is passed to parse.
> The failing call is:
> parser.parse(response.getResponseBodyAsStream());
> Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character 
> (NULL, unicode 0) encountered: not valid in any content
>  at [row,col {unknown-source}]: [3,56]
>         at 
> com.ctc.wstx.sr.StreamScanner.constructNullCharException(StreamScanner.java:615)
>         at 
> com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:644)
>         at 
> com.ctc.wstx.sr.BasicStreamReader.readTextPrimary(BasicStreamReader.java:4554)
>         at 
> com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2886)
>         at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
>         at 
> org.apache.abdera.parser.stax.FOMBuilder.getNextElementToParse(FOMBuilder.java:163)
>         at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:187) 
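
The workaround described in the issue, copying the response into a byte array 
and parsing from a ByteArrayInputStream, could look roughly like the sketch 
below. BufferedBody, readAll and buffer are hypothetical names, and the whole 
response is held in memory, which may not suit large feeds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedBody {

    // Drains the stream into a byte array, tolerating short reads.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Returns an in-memory copy of the stream and closes the original.
    static InputStream buffer(InputStream in) throws IOException {
        try {
            return new ByteArrayInputStream(readAll(in));
        } finally {
            in.close();
        }
    }
}
```

With the parser and response objects from the issue, the call would then 
become parser.parse(BufferedBody.buffer(response.getResponseBodyAsStream())).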

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
