[jira] [Commented] (SOLR-11136) XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer

ASF subversion and git services (JIRA) Fri, 21 Jul 2017 17:35:08 -0700

    [ https://issues.apache.org/jira/browse/SOLR-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097031#comment-16097031 ]


ASF subversion and git services commented on SOLR-11136:
--------------------------------------------------------

Commit c4d85a5849b8ae09fc957af94ca042fd5ad225dd in lucene-solr's branch 
refs/heads/branch_7_0 from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c4d85a5 ]

SOLR-11136: Fix solrj XMLResponseParser when nested docs transformer is used 
with indented XML

(cherry picked from commit eae2efcbd9cffadc94aaf89691eea4ee453940e2)


> XMLResponseParser.readDocument makes dangerous assumptions / fails when 
> indent=true and [child] doc transformer
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11136
>                 URL: https://issues.apache.org/jira/browse/SOLR-11136
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 7.0, master (8.0), 7.1
>
>         Attachments: SOLR-11136.patch
>
>
> Some buggy code in XMLResponseParser.readDocument causes it to indirectly 
> assume that once it encounters a nested START_ELEMENT 'doc' (which it can 
> recursively parse) the only other XML stream events it will find will either 
> be an END_ELEMENT, or more 'doc' START_ELEMENTs...
> {code}
> protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
> {
>   if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
>     throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
>   }
>   // ...
>   while( true ) 
>   {
>     switch (parser.next()) {
>     case XMLStreamConstants.START_ELEMENT:
>       depth++;
>       builder.setLength( 0 ); // reset the text
>       type = KnownType.get( parser.getLocalName() );
>       // ...
>       
>       // NOTE: nothing in this loop modifies 'type' 
>       // so the 'while' is totally inappropriate even if there was no bug
>       while( type == KnownType.DOC) {
>         doc.addChildDocument(readDocument(parser));
>         int event = parser.next();                                // PROBLEMATIC
>         if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
>           return doc;
>         }
>       }
>       
>       // ...
> {code}
> Because of how the server-side XML Writer code works, it's _currently_ true 
> that child documents always come "after" any other fields or 
> transformers, but depending on that is sketchy.  Where this code actually 
> causes real problems is when the server/client uses {{indent=true}}, because 
> then the {{parser.next();}} call (labeled {{PROBLEMATIC}}) can return 
> {{XMLStreamConstants.CHARACTERS}} (or {{XMLStreamConstants.SPACE}}) for 
> the whitespace between sibling child docs, or after the last child doc. 
> Since {{type}} is still {{DOC}}, the loop then makes another recursive call 
> to {{readDocument(parser)}}, which fails because it expects to find the 
> reader positioned at a START_ELEMENT.
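For illustration, here is a minimal sketch of a reader that does not make either assumption. It is not the patch attached to this issue. {{ChildDocReader}}, {{readDoc}}, and the {{Doc}} class (a stand-in for {{SolrDocument}}) are hypothetical names, and fields are assumed to be simple elements like {{<str name="id">1</str>}}. The loop dispatches on every event returned by {{next()}}. It recurses only when actually positioned on a nested {{<doc>}} START_ELEMENT, and it ignores whitespace events, so indentation and the relative order of fields and child docs no longer matter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChildDocReader {

  // Hypothetical stand-in for SolrDocument: simple field values plus child docs.
  static final class Doc {
    final Map<String, String> fields = new LinkedHashMap<>();
    final List<Doc> children = new ArrayList<>();
  }

  // Expects the reader on a <doc> START_ELEMENT; returns with the reader
  // positioned on the matching </doc> END_ELEMENT.
  static Doc readDoc(XMLStreamReader parser) throws XMLStreamException {
    if (parser.getEventType() != XMLStreamConstants.START_ELEMENT
        || !"doc".equals(parser.getLocalName())) {
      throw new IllegalStateException("must be <doc> start element, not: " + parser.getEventType());
    }
    Doc doc = new Doc();
    while (true) {
      switch (parser.next()) {
        case XMLStreamConstants.START_ELEMENT:
          if ("doc".equals(parser.getLocalName())) {
            // Recurse only when the reader really is on a nested <doc>.
            doc.children.add(readDoc(parser));
          } else {
            // Simple field element; getElementText() leaves the reader on its
            // END_ELEMENT. Multi-valued <arr>/<lst> fields are not handled here.
            doc.fields.put(parser.getAttributeValue(null, "name"), parser.getElementText());
          }
          break;
        case XMLStreamConstants.END_ELEMENT:
          // Nested elements are fully consumed above, so this is our own </doc>.
          return doc;
        default:
          // CHARACTERS/SPACE from indentation, COMMENT, etc.: skip them.
          break;
      }
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    // Indented input, similar in shape to indent=true output.
    String xml =
        "<doc>\n"
            + "  <str name=\"id\">1</str>\n"
            + "  <doc>\n"
            + "    <str name=\"id\">1.1</str>\n"
            + "  </doc>\n"
            + "  <doc>\n"
            + "    <str name=\"id\">1.2</str>\n"
            + "  </doc>\n"
            + "</doc>\n";
    XMLStreamReader parser =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    parser.nextTag(); // advance to the root <doc>
    Doc root = readDoc(parser);
    System.out.println(root.fields.get("id") + " children=" + root.children.size());
    for (Doc child : root.children) {
      System.out.println("  child " + child.fields.get("id"));
    }
  }
}
```

Running {{main}} prints {{1 children=2}} followed by {{child 1.1}} and {{child 1.2}}. The whitespace between the sibling {{<doc>}} elements arrives as CHARACTERS events and falls through to the {{default}} branch instead of being handed to a recursive call.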



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
