Hoss Man created SOLR-11136:
-------------------------------
Summary: XMLResponseParser.readDocument makes dangerous
assumptions / fails when indent=true and [child] doc transformer
Key: SOLR-11136
URL: https://issues.apache.org/jira/browse/SOLR-11136
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man
Assignee: Hoss Man
XMLResponseParser.readDocument implicitly assumes that once it encounters a
nested START_ELEMENT 'doc' (which it parses recursively), the only XML stream
events that can follow are an END_ELEMENT or more 'doc' START_ELEMENTs...
{code}
protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
{
  if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
    throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
  }
  // ...
  while( true )
  {
    switch (parser.next()) {
    case XMLStreamConstants.START_ELEMENT:
      depth++;
      builder.setLength( 0 ); // reset the text
      type = KnownType.get( parser.getLocalName() );
      // ...
      // NOTE: nothing in this loop modifies 'type'
      // so the 'while' is totally inappropriate even if there was no bug
      while( type == KnownType.DOC) {
        doc.addChildDocument(readDocument(parser));
        int event = parser.next(); // PROBLEMATIC
        if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
          return doc;
        }
      }
      // ...
{code}
Because of how the server-side XML writer works, it's _currently_ true
that child documents always come "after" any other fields or
transformers, but depending on that is sketchy. Where this code actually
causes real problems is when the server/client uses {{indent=true}}: the
{{parser.next();}} call (labeled {{PROBLEMATIC}}) can then return
{{XMLStreamConstants.CHARACTERS}} (or {{XMLStreamConstants.SPACE}}) for the
whitespace between sibling child docs, or after the last child doc. That
event is not an END_ELEMENT, so the loop continues and the recursive call to
{{readDocument(parser)}} fails (because it expects to find the reader
positioned at a START_ELEMENT).
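
The behavior described above can be reproduced outside Solr with nothing but {{javax.xml.stream}}. The following is a minimal sketch, not the Solr code and not a proposed patch: the class {{ChildDocWhitespaceDemo}} and its methods ({{eventAfterFirstChild}}, {{countChildDocs}}, {{skipElement}}) are hypothetical names made up for illustration, and the sample XML is a simplified stand-in for a real response. It shows which event follows a nested element's END_ELEMENT with and without indentation, and one way a loop could ignore non-element events instead of assuming what comes next.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; none of these names exist in Solr.
final class ChildDocWhitespaceDemo {

  static XMLStreamReader open(String xml) throws XMLStreamException {
    return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
  }

  /** Starting on a START_ELEMENT, advances to its matching END_ELEMENT. */
  static void skipElement(XMLStreamReader parser) throws XMLStreamException {
    int depth = 1;
    while (depth > 0) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        depth++;
      } else if (event == XMLStreamConstants.END_ELEMENT) {
        depth--;
      }
    }
  }

  /** Returns the raw event that next() yields right after the first nested element ends. */
  static int eventAfterFirstChild(String xml) throws XMLStreamException {
    XMLStreamReader parser = open(xml);
    parser.nextTag(); // root START_ELEMENT
    parser.nextTag(); // first nested START_ELEMENT (nextTag skips whitespace)
    skipElement(parser);
    return parser.next(); // same position as the call labeled PROBLEMATIC
  }

  /** Counts direct 'doc' children of the root, ignoring every non-element event. */
  static int countChildDocs(String xml) throws XMLStreamException {
    XMLStreamReader parser = open(xml);
    parser.nextTag(); // root START_ELEMENT
    int children = 0;
    while (true) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        if ("doc".equals(parser.getLocalName())) {
          children++;
        }
        skipElement(parser);
      } else if (event == XMLStreamConstants.END_ELEMENT) {
        return children; // end of the root element
      }
      // CHARACTERS, SPACE, COMMENT, etc. fall through and are skipped
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    String compact =
        "<doc><doc><str name=\"id\">2</str></doc><doc><str name=\"id\">3</str></doc></doc>";
    String indented =
        "<doc>\n  <doc>\n    <str name=\"id\">2</str></doc>\n"
            + "  <doc>\n    <str name=\"id\">3</str></doc>\n</doc>";

    System.out.println("compact, event after first child:  " + eventAfterFirstChild(compact));
    System.out.println("indented, event after first child: " + eventAfterFirstChild(indented));
    System.out.println("compact child docs:  " + countChildDocs(compact));
    System.out.println("indented child docs: " + countChildDocs(indented));
  }
}
```

The printed event values are the integer constants from {{XMLStreamConstants}}. In the compact case the event after the first child is a START_ELEMENT, which is the only situation the quoted loop handles; in the indented case it is a text event, which is what sends the recursive call into its "must be start element" check.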