[ https://issues.apache.org/jira/browse/XMLBEANS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wing Yew Poon reassigned XMLBEANS-295:
--------------------------------------

    Assignee: Cezar Andrei

> setLoadStripWhitespace() api errors when trimming white space characters
> ------------------------------------------------------------------------
>
>                 Key: XMLBEANS-295
>                 URL: https://issues.apache.org/jira/browse/XMLBEANS-295
>             Project: XMLBeans
>          Issue Type: Bug
>          Components: Validator
>    Affects Versions: Version 2.2.1
>         Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
>            Reporter: David RR Webber
>            Assignee: Cezar Andrei
>             Fix For: TBD
>
>
> Situation Summary
> We went to production using the setLoadStripWhitespace() API in XMLBeans.
> After some days we started getting intermittent failures from occasional
> XML transactions.
> After a week of investigation, having eliminated all other factors, we
> realized that the flushText() method itself was the cause. Specifically, we
> have determined that character strings containing the & character result in
> spaces being stripped immediately after the &, e.g.
> <company>B & H Photo</company> becomes <company>B &H Photo</company>.
> We realize that there is a patch available for & processing, and we are
> currently testing it to see if it cures the problem relating to &
> (http://issues.apache.org/jira/browse/XMLBEANS-274 ).
> However, we are also seeing an intermittent problem in our UNIX environment
> associated with the colon character : (it could be other characters as
> well; we do not have a definitive list). What we found is spaces being
> trimmed intermittently in various fields that do not contain "&" (the
> original XMLBEANS-274 bug reported). We cannot reproduce this one on our
> Windows development systems, but it is happening intermittently on SunOS.
> Again, a space either immediately following the colon or later in the
> string is stripped, for tokenized elements, e.g.
> <urgent>Yes: Y</urgent> becomes <urgent>Yes:Y</urgent>, and the object then
> returns a NULL value because this is not a valid allowed value for the
> tokenized list. Similarly, <location>USA: United States</location> became
> <location>USA: UnitedStates</location>. We suspect that a character before
> the colon might be triggering this behaviour, but we have not yet
> determined when or how. This illustrates how complex this issue is in terms
> of the current XMLBeans implementation approach.
>
> Analysis
> We have looked at how and where XMLBeans does the white space trim during
> the unmarshalling of the XML content. When it detects a white space
> character, it invokes a stripRight() method loop. We are not convinced that
> this is architecturally sound at the point where it is employed: it leads
> to complexity, a lot of edge conditions, and some combinations of
> characters that are not handled consistently and correctly.
> Our preferred approach would be to defer the white space trim until after
> unmarshalling, so that the initial process can treat the XML content
> between the angle brackets "as is" and apply the trim only once the content
> has been extracted. At that point a simple Java String trim() can be
> employed. This could be provided as an alternate method call to the current
> setLoadStripWhitespace() API that would iterate through the entire
> structure of objects instead of the original XML stream. The only check
> that would be necessary is whether the XML markup itself sets the
> xml:space="preserve" attribute on an element, in which case the trim()
> would automatically be skipped for that content item. What is happening
> right now is that the existing flushText() method is mixing up XML markup
> and content; instead there needs to be a clear separation between the
> element angle brackets and attribute quotes on the one hand and the content
> itself on the other.
> Again, the caveat here may be that the current approach is intended to run
> before error checking on tokenized lists, to prevent failures there due to
> extra spaces. Even so, it is not cleanly enough separated, and it would be
> simpler to use the Java String trim() method within the tokenized
> evaluation itself, on just the string.
>
> Suggested Solution
> Refactor the current setLoadStripWhitespace() API to delay string
> manipulation on content until after the content and XML markup have been
> unpacked, instead of before, as currently happens. This makes for much
> simpler white space trim logic (it can simply use the Java String trim()
> method) that does not need to look for markup artifacts as well.
> We are not clear on who owns this particular feature in XMLBeans, or
> whether they are currently available to assist on this, but we would be
> prepared to work with the team to develop a better solution here.
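
The deferred trim proposed in the report above could look roughly like the
sketch below. It is not XMLBeans code and says nothing about how XMLBeans
would implement the change: it uses the JDK's own DOM parser as a stand-in,
and the class name DeferredTrim and its methods parse() and trimContent()
are hypothetical names chosen for illustration. The idea is only to show
the ordering the reporter asks for: parse the XML "as is", then trim the
extracted text with String.trim(), skipping anything under
xml:space="preserve".

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of XMLBeans.
final class DeferredTrim {

    // Step 1: parse the document untouched. No whitespace handling here,
    // so markup and entity references are resolved by the parser alone.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to look up xml:space by namespace
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Step 2: walk the parsed tree and trim the text of leaf elements.
    // 'preserve' carries the inherited xml:space setting down the tree.
    static void trimContent(Element element, boolean preserve) {
        String space = element.getAttributeNS(XMLConstants.XML_NS_URI, "space");
        if ("preserve".equals(space)) {
            preserve = true;
        } else if ("default".equals(space)) {
            preserve = false;
        }

        boolean hasChildElement = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                trimContent((Element) n, preserve);
            }
        }

        // Only text-only elements are trimmed; whitespace between child
        // elements is left alone in this sketch.
        if (!hasChildElement && !preserve) {
            element.setTextContent(element.getTextContent().trim());
        }
    }
}
```

A caller would run parse() on the incoming XML and then call
trimContent(doc.getDocumentElement(), false). Because the trim only ever
sees the already extracted string, characters such as & and : inside the
content need no special handling, which is the separation between markup
and content that the report argues for.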