[ 
https://issues.apache.org/jira/browse/XMLBEANS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wing Yew Poon reassigned XMLBEANS-295:
--------------------------------------

    Assignee: Cezar Andrei

> setLoadStripWhitespace() api errors when trimming white space characters
> ------------------------------------------------------------------------
>
>                 Key: XMLBEANS-295
>                 URL: https://issues.apache.org/jira/browse/XMLBEANS-295
>             Project: XMLBeans
>          Issue Type: Bug
>          Components: Validator
>    Affects Versions: Version 2.2.1
>         Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
>            Reporter: David RR Webber
>            Assignee: Cezar Andrei
>             Fix For: TBD
>
>
> Situation Summary
> We deployed to production using the setLoadStripWhitespace() API in
> XMLBeans. After some days we started getting intermittent failures from
> occasional XML transactions.
> After a week of investigation, and having eliminated all other factors, we
> concluded that the flushText() method itself was the cause. Specifically, we
> have determined that in character strings containing the & character, spaces
> immediately after the & are stripped - e.g. <company>B & H Photo</company>
> becomes <company>B &H Photo</company>.
> We realize that there is a patch available for & processing, and we are
> currently testing it to see if it cures the problem relating to &
> (http://issues.apache.org/jira/browse/XMLBEANS-274).
> However, we are also seeing an intermittent problem in our UNIX environment
> associated with the colon character ":" (other characters could be affected
> as well - we do not have a definitive list). Spaces are intermittently
> trimmed in various fields that do not contain "&" (the case originally
> reported in XMLBEANS-274). We cannot reproduce this in our Windows
> development systems, but it is happening intermittently on SunOS.
> Again, a space either immediately following the colon or later in the string
> is stripped for tokenized elements - e.g. <urgent>Yes: Y</urgent> becomes
> <urgent>Yes:Y</urgent>, and the object then returns a NULL value because
> this is not an allowed value for the tokenized list. Similarly,
> <location>USA: United States</location> became
> <location>USA: UnitedStates</location>. We suspect that a character
> preceding the colon might be triggering this behaviour, but we have not yet
> determined when or how. This illustrates how complex the issue is given the
> current XMLBeans implementation approach.
> Analysis
> We have looked at how and where XMLBeans performs the white space trim
> during unmarshalling of the XML content. When it detects a white space
> character, it invokes a stripRight() method loop. We are not convinced that
> this is architecturally sound at the point where it is employed: it adds
> complexity and evidently leaves many edge conditions and some character
> combinations that are not handled consistently and correctly.
> Our preferred approach would be to defer the white space trim until after
> unmarshalling, so that the initial pass treats the XML content between the
> angle brackets "as is" and trim() is applied only once the content has been
> extracted. At that point a simple Java String trim() can be employed. This
> could be provided as an alternative to the current setLoadStripWhitespace()
> API that would iterate through the entire structure of objects instead of
> the original XML stream. The only check necessary is whether the XML markup
> sets the xml:space="preserve" attribute for an element - in which case
> trim() would automatically be skipped for that content item. Right now the
> existing flushText() method mixes up XML markup and content; instead there
> needs to be a clear separation between the markup (element angle brackets
> and attribute quotes) and the content itself.
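
The deferred trim proposed above can be sketched roughly as follows. This is
only an illustration: it uses the JDK's built-in DOM parser rather than
XMLBeans, and DeferredTrim, parseThenTrim and trimLeaves are hypothetical
names, not XMLBeans APIs. It assumes that only leaf elements (no child
elements) should be trimmed and that xml:space is inherited by descendants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of XMLBeans.
class DeferredTrim {

    /** Parses the XML unchanged, then trims leaf text in a second pass. */
    static Document parseThenTrim(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            trimLeaves(doc.getDocumentElement(), false);
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Trims leaf element text unless xml:space="preserve" is in effect. */
    static void trimLeaves(Element element, boolean inheritedPreserve) {
        boolean preserve = inheritedPreserve;
        String space = element.getAttribute("xml:space"); // "" when absent
        if ("preserve".equals(space)) {
            preserve = true;
        } else if ("default".equals(space)) {
            preserve = false;
        }

        boolean hasChildElement = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                trimLeaves((Element) n, preserve);
            }
        }

        // Only the ends of the already-extracted string are touched, so
        // interior spaces after "&" or ":" cannot be lost at this stage.
        if (!hasChildElement && !preserve) {
            element.setTextContent(element.getTextContent().trim());
        }
    }
}
```

Because the parser has already resolved entities and separated markup from
content before trim() runs, the trim logic never needs to inspect markup
characters.
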
> One caveat: the current approach may be intended to run before error
> checking on tokenized lists, to prevent failures there due to extra spaces.
> Even so, it is not cleanly enough separated, and it would be simpler to call
> the Java String trim() method within the tokenized evaluation itself, on
> just the string.
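
As a rough sketch of trimming inside the tokenized evaluation itself, the
check could look like the following. TokenCheck, ALLOWED and isAllowed are
hypothetical names, not XMLBeans code, and the allowed values are assumed for
illustration ("Yes: Y" comes from the example above, "No: N" is invented).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of XMLBeans.
class TokenCheck {

    // Assumed allowed values for an <urgent>-style tokenized element.
    static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("Yes: Y", "No: N"));

    /** Trims only the ends of the extracted content before the lookup. */
    static boolean isAllowed(String rawContent) {
        return rawContent != null && ALLOWED.contains(rawContent.trim());
    }
}
```

With this shape, surrounding white space such as "  Yes: Y " is tolerated,
while a value whose interior space has been lost ("Yes:Y") is still rejected
rather than silently accepted.
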
> Suggested Solution
> Refactor the current setLoadStripWhitespace() API to delay string
> manipulation on content until after the content and XML markup have been
> unpacked, instead of before as currently happens. This makes for much
> simpler white space trim logic (it can simply use the Java String class
> method) that does not need to look for markup artifacts as well.
> We are not clear on who owns this particular feature in XMLBeans, or whether
> they are currently available to assist, but we would be prepared to work
> with the team to develop a better solution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

