Hoss Man created SOLR-11136:
-------------------------------

             Summary: XMLResponseParser.readDocument makes dangerous 
assumptions / fails when indent=true and [child] doc transformer
                 Key: SOLR-11136
                 URL: https://issues.apache.org/jira/browse/SOLR-11136
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man
            Assignee: Hoss Man



Some buggy code in XMLResponseParser.readDocument causes it to indirectly 
assume that once it encounters a nested START_ELEMENT 'doc' (which it can 
recursively parse) the only other XML stream events it will find will either be 
an END_ELEMENT, or more 'doc' START_ELEMENTs...

{code}
protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
{
  if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
    throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
  }

  // ...

  while( true ) 
  {
    switch (parser.next()) {
    case XMLStreamConstants.START_ELEMENT:
      depth++;
      builder.setLength( 0 ); // reset the text
      type = KnownType.get( parser.getLocalName() );

      // ...
      
      // NOTE: nothing in this loop modifies 'type' 
      // so the 'while' is totally inappropriate even if there was no bug
      while( type == KnownType.DOC) {
        doc.addChildDocument(readDocument(parser));
        int event = parser.next();                                // PROBLEMATIC
        if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
          return doc;
        }
      }
      
      // ...
{code}

Because of how the server side XML Writer code works, it's _currently_ true 
that child documents should always come "after" any other fields or 
transformers -- but depending on that is sketchy.  Where this code actually 
causes real problems is if the server/client uses {{indent=true}}, because then 
the {{parser.next();}} call (labeled {{PROBLEMATIC}}) can return 
{{XMLStreamConstants.CHARACTERS}} (or {{XMLStreamConstants.SPACE}}) for the 
blank space between sibling child docs, or after the last child doc.  Since 
that event is not an END_ELEMENT and {{type}} is still {{KnownType.DOC}}, the 
loop goes around again and the recursive call to {{readDocument(parser)}} fails 
(because it expects to find the reader positioned at a START_ELEMENT).
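
For anyone who wants to see the event sequence outside of Solr, here is a 
minimal standalone sketch using only the JDK's StAX API.  The class name 
{{IndentEventDemo}}, its helper methods, and the two sample XML strings are 
hypothetical (they are not SolrJ code or real Solr output); it only shows which 
event {{next()}} reports right after the first child doc closes, and that 
{{nextTag()}} is one way to skip the whitespace.  It is not a proposed patch.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical demo class; not part of SolrJ. */
public class IndentEventDemo {

  // Hand-written sample: a parent doc with two child docs, no indentation.
  public static final String COMPACT =
      "<doc><str name=\"id\">p1</str>"
      + "<doc><str name=\"id\">c1</str></doc>"
      + "<doc><str name=\"id\">c2</str></doc></doc>";

  // The same structure with whitespace between elements, as with indent=true.
  public static final String INDENTED =
      "<doc>\n  <str name=\"id\">p1</str>\n"
      + "  <doc>\n    <str name=\"id\">c1</str>\n  </doc>\n"
      + "  <doc>\n    <str name=\"id\">c2</str>\n  </doc>\n</doc>";

  /** Advances a new reader to the END_ELEMENT of the first nested 'doc'. */
  private static XMLStreamReader atEndOfFirstChild(String xml) throws XMLStreamException {
    XMLStreamReader parser =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    int depth = 0;
    while (parser.hasNext()) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        depth++;
      } else if (event == XMLStreamConstants.END_ELEMENT) {
        if (depth == 2 && "doc".equals(parser.getLocalName())) {
          return parser; // positioned on the first child's closing tag
        }
        depth--;
      }
    }
    throw new IllegalStateException("no nested doc found");
  }

  /** Event reported by a plain next() after the first child doc ends. */
  public static int rawNext(String xml) {
    try {
      return atEndOfFirstChild(xml).next();
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Same position, but nextTag() skips whitespace up to the next tag. */
  public static int skippingNext(String xml) {
    try {
      return atEndOfFirstChild(xml).nextTag();
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  private static String name(int event) {
    switch (event) {
      case XMLStreamConstants.START_ELEMENT: return "START_ELEMENT";
      case XMLStreamConstants.END_ELEMENT:   return "END_ELEMENT";
      case XMLStreamConstants.CHARACTERS:    return "CHARACTERS";
      case XMLStreamConstants.SPACE:         return "SPACE";
      default:                               return "other(" + event + ")";
    }
  }

  public static void main(String[] args) {
    System.out.println("compact,  next():    " + name(rawNext(COMPACT)));
    System.out.println("indented, next():    " + name(rawNext(INDENTED)));
    System.out.println("indented, nextTag(): " + name(skippingNext(INDENTED)));
  }
}
```

In the indented case the plain {{next()}} lands on a whitespace event rather 
than a tag, which is the state the loop above hands to the recursive 
{{readDocument(parser)}} call.  Note that {{nextTag()}} throws if it hits 
non-whitespace text, so whether it is appropriate inside {{readDocument}} 
depends on what else can legally appear between child docs.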



