[ http://issues.apache.org/jira/browse/XMLBEANS-274?page=all ]
Peter Rodgers updated XMLBEANS-274:
-----------------------------------

    Summary: Over zealous whitespace cropping after parsing entity like &amp;  (was: Over zelous whitespace cropping after parsing entity like &amp;)
    Description:

When white space stripping is specified the parser does not detect XML entities such as &amp; and strips the whitespace following each entity.

For example

<root>dog &amp; cat</root>

is parsed as

<root>dog &amp;cat</root>

The cause of the problem is the stripLeft() method in the org.apache.xmlbeans.impl.store.CharUtil

Below is a fixed version of the method that detects the ';' character after an entity which indicates that whitespace is significant and must be preserved. Note this code does not fix the case where the iteration is a for loop.

public Object stripLeft ( Object src, int off, int cch )
{
    assert isValid( src, off, cch );

    if (cch > 0)
    {
        if (src instanceof char[])
        {
            char[] chars = (char[]) src;

            while ( cch > 0 && isWhiteSpace( chars[ off ] ) && chars[off - 1]!=';' )  //Fix for &amp; etc
                { cch--; off++; }
        }
        else if (src instanceof String)
        {
            String s = (String) src;

            while ( cch > 0 && isWhiteSpace( s.charAt( off ) ) && s.charAt(off - 1)!=';' )  //Fix for &amp; etc
                { cch--; off++; }
        }
        else
        {
            int count = 0;

            for ( _charIter.init( src, off, cch ) ; _charIter.hasNext() ; count++ )
                if (!isWhiteSpace( _charIter.next() ))
                    break;

            _charIter.release();

            off += count;
        }
    }

    if (cch == 0)
    {
        _offSrc = 0;
        _cchSrc = 0;

        return null;
    }

    _offSrc = off;
    _cchSrc = cch;

    return src;
}

  was:

When white space stripping is specified the parser does not detect XML entities such as &amp; and strips the whitespace following each entity.

For example

<root>dog &amp; cat</root>

is parsed as

<root>doc &amp;cat</root>

The cause of the problem is the stripLeft() method in the org.apache.xmlbeans.impl.store.CharUtil

Below is a fixed version of the method that detects the ';' character after an entity which indicates that whitespace is significant and must be preserved.
Note this code does not fix the case where the iteration is a for loop.

public Object stripLeft ( Object src, int off, int cch )
{
    assert isValid( src, off, cch );

    if (cch > 0)
    {
        if (src instanceof char[])
        {
            char[] chars = (char[]) src;

            while ( cch > 0 && isWhiteSpace( chars[ off ] ) && chars[off - 1]!=';' )  //Fix for &amp; etc
                { cch--; off++; }
        }
        else if (src instanceof String)
        {
            String s = (String) src;

            while ( cch > 0 && isWhiteSpace( s.charAt( off ) ) && s.charAt(off - 1)!=';' )  //Fix for &amp; etc
                { cch--; off++; }
        }
        else
        {
            int count = 0;

            for ( _charIter.init( src, off, cch ) ; _charIter.hasNext() ; count++ )
                if (!isWhiteSpace( _charIter.next() ))
                    break;

            _charIter.release();

            off += count;
        }
    }

    if (cch == 0)
    {
        _offSrc = 0;
        _cchSrc = 0;

        return null;
    }

    _offSrc = off;
    _cchSrc = cch;

    return src;
}

> Over zealous whitespace cropping after parsing entity like &amp;
> ----------------------------------------------------------------
>
>          Key: XMLBEANS-274
>          URL: http://issues.apache.org/jira/browse/XMLBEANS-274
>      Project: XMLBeans
>         Type: Bug
>     Versions: Version 2.1
>  Environment: All
>     Reporter: Peter Rodgers
>
> When white space stripping is specified the parser does not detect XML
> entities such as &amp; and strips the whitespace following each entity.
> For example
> <root>dog &amp; cat</root>
> is parsed as
> <root>dog &amp;cat</root>
> The cause of the problem is the stripLeft() method in the
> org.apache.xmlbeans.impl.store.CharUtil
> Below is a fixed version of the method that detects the ';' character after
> an entity which indicates that whitespace is significant and must be
> preserved. Note this code does not fix the case where the iteration is a for
> loop.
> public Object stripLeft ( Object src, int off, int cch )
> {
>     assert isValid( src, off, cch );
>
>     if (cch > 0)
>     {
>         if (src instanceof char[])
>         {
>             char[] chars = (char[]) src;
>
>             while ( cch > 0 && isWhiteSpace( chars[ off ] ) && chars[off - 1]!=';' )  //Fix for &amp; etc
>                 { cch--; off++; }
>         }
>         else if (src instanceof String)
>         {
>             String s = (String) src;
>
>             while ( cch > 0 && isWhiteSpace( s.charAt( off ) ) && s.charAt(off - 1)!=';' )  //Fix for &amp; etc
>                 { cch--; off++; }
>         }
>         else
>         {
>             int count = 0;
>
>             for ( _charIter.init( src, off, cch ) ; _charIter.hasNext() ; count++ )
>                 if (!isWhiteSpace( _charIter.next() ))
>                     break;
>
>             _charIter.release();
>
>             off += count;
>         }
>     }
>
>     if (cch == 0)
>     {
>         _offSrc = 0;
>         _cchSrc = 0;
>
>         return null;
>     }
>
>     _offSrc = off;
>     _cchSrc = cch;
>
>     return src;
> }
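
The ';' check described in the report can be tried in isolation. What follows is a minimal sketch, not the XMLBeans implementation: the class StripLeftSketch, the helper stripLeftOffset and the simplified isWhiteSpace are hypothetical names introduced only for illustration, and the buffer is assumed to still hold the entity text ending in ';'. The posted loop tests chars[off - 1] on every pass, but only the first pass can see a ';' there, so the sketch hoists the test out of the loop. It also adds an off > 0 check, which is not in the posted method, so that chars[off - 1] is never read when the span starts at index 0.

```java
final class StripLeftSketch {

    // Simplified stand-in for CharUtil.isWhiteSpace (assumption: XML whitespace only).
    static boolean isWhiteSpace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
    }

    // Returns the offset of the first character kept after left-stripping
    // the span chars[off .. off + cch).
    static int stripLeftOffset(char[] chars, int off, int cch) {
        // Assumption: a ';' directly before the span marks the end of an
        // entity, so the leading whitespace is significant and is kept.
        // The off > 0 test is not in the posted method; it avoids reading
        // chars[-1] when the span starts at index 0.
        if (off > 0 && chars[off - 1] == ';') {
            return off;
        }
        while (cch > 0 && isWhiteSpace(chars[off])) {
            cch--;
            off++;
        }
        return off;
    }

    public static void main(String[] args) {
        char[] entity = "dog &amp; cat".toCharArray();
        System.out.println(stripLeftOffset(entity, 9, 4));  // prints 9: space after ';' is kept

        char[] plain = "dog  cat".toCharArray();
        System.out.println(stripLeftOffset(plain, 3, 5));   // prints 5: both spaces are stripped

        char[] leading = "  x".toCharArray();
        System.out.println(stripLeftOffset(leading, 0, 3)); // prints 2: no read before index 0
    }
}
```

Like the posted method, this treats any ';' before the span as the end of an entity, so ordinary text that ends in a semicolon would also keep its following whitespace. The String branch would need the same guard, and the _charIter branch is still not covered.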