[jira] Commented: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Bill Mitchell (JIRA) Wed, 30 Jan 2008 14:39:55 -0800

    [ 
https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564199#action_12564199
 ]


Bill Mitchell commented on AXIS2C-859:
--------------------------------------

Lahira, I went back to the XML spec after yesterday, and it says that 
character references and entity references in an attribute value are replaced 
to produce its normalized value, and that applies to a namespace URI as well.  
So it seems we have to run this replacement logic on the attribute value 
string before we process it as a possible namespace declaration.  One more 
wrinkle.  

My "second" item above alluded to a different solution, built into 
guththila_next() instead of guththila_token_close().  In the "right" loops in 
guththila_next(), where we are already looking at the characters one at a 
time, we could detect the leading ampersand, check the next 4 or 5 characters 
against the predefined XML entity names, and replace the sequence with the 
single character right there, again as above sliding the leading part of the 
token to abut the shorter replacement.  This would avoid a second pass over 
the token characters looking for ampersands, but I suspect it would make 
guththila_next() much harder to understand than it already is.  So my second 
point was just to say that I think you have chosen the better approach: 
handle the XML character entities in guththila_token_close(), well separate 
from the token parsing in guththila_next().
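
To make the second-pass idea concrete, here is a minimal sketch of the kind of 
in-place replacement a close-time step could perform.  It is not the guththila 
code or the attached patch: the function name decode_predefined_entities is 
hypothetical, it works on a plain char buffer rather than a guththila token, 
and it compacts toward the front of the buffer instead of sliding the leading 
part of the token forward.  It covers only the five predefined entities; 
numeric character references are left out, and an unrecognized ampersand is 
copied through unchanged, where a conforming parser would report an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of guththila: decode the five predefined
 * XML entities in place.  Returns the new length; the buffer is not
 * NUL-terminated by this function. */
static size_t decode_predefined_entities(char *buf, size_t len)
{
    static const struct {
        const char *name;   /* escaped form, including '&' and ';' */
        size_t n;           /* length of the escaped form */
        char ch;            /* single replacement character */
    } ents[] = {
        { "&amp;",  5, '&'  },
        { "&lt;",   4, '<'  },
        { "&gt;",   4, '>'  },
        { "&quot;", 6, '"'  },
        { "&apos;", 6, '\'' }
    };
    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index, never ahead of r */

    while (r < len) {
        size_t i;
        int matched = 0;

        if (buf[r] == '&') {
            for (i = 0; i < sizeof ents / sizeof ents[0]; i++) {
                if (len - r >= ents[i].n &&
                    memcmp(buf + r, ents[i].name, ents[i].n) == 0) {
                    buf[w++] = ents[i].ch;  /* emit the single character */
                    r += ents[i].n;         /* skip the whole sequence */
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            buf[w++] = buf[r++];            /* ordinary character: copy */
    }
    return w;
}
```

A caller would pass the token bytes and their length, then treat the returned 
value as the new token length.  Because the output is never longer than the 
input, no extra allocation is needed, which is what makes the separate pass 
cheap enough to keep out of guththila_next().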

> guththila parser fails to handle escape sequences for ampersand, less than, 
> greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt
>
>
> When an incoming message contains within text the escaped ampersand sequence, 
> "&amp;", this sequence is being passed to the client as raw text without 
> being converted to the single ampersand character.  Clearly, this action must 
> take place at the level of the parser, as only the parser knows whether it is 
> seeing simple text, and conversion is required, or text embedded in a CDATA 
> section, where conversion is not allowed.  I have tested the build with the 
> libxml parser, and of course the libxml parser behaves correctly: the text 
> passed to the client contains only the single ampersand character, not the 
> escaped sequence.  (See section 2.4 of XML 1.0 spec.)
> Looking at the code, I expect the same problem occurs with all escaped 
> sequences, less than and greater than as well as ampersand, on both input and 
> output.  I also don't see where CDATA sections are handled, but as I am not 
> seeing CDATA in the messages from the service I am hitting, I have not tested 
> this case.  
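
For the output direction mentioned in the description, the reverse mapping 
could be sketched as below.  This is only an illustration under assumptions: 
escape_text is a hypothetical name, not a guththila writer function, and it 
handles ordinary character data only.  Text written inside a CDATA section 
must not be escaped this way, and attribute values would additionally need 
their quote character escaped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of guththila: write the escaped form of
 * the NUL-terminated string `in` into `out`, whose capacity `out_size`
 * includes room for the terminating NUL.  Returns the escaped length, or
 * (size_t)-1 if the result does not fit. */
static size_t escape_text(const char *in, char *out, size_t out_size)
{
    size_t w = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        const char *rep;
        size_t n;

        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = NULL;    break;
        }
        n = rep ? strlen(rep) : 1;

        /* Keep one byte free for the terminating NUL. */
        if (w + n >= out_size)
            return (size_t)-1;

        if (rep)
            memcpy(out + w, rep, n);
        else
            out[w] = *in;
        w += n;
    }
    out[w] = '\0';
    return w;
}
```

Escaping the greater-than sign everywhere is more than the spec strictly 
requires, but it is always permitted and keeps the writer simple.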

