[
https://issues.apache.org/jira/browse/SOLR-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097029#comment-16097029
]
ASF subversion and git services commented on SOLR-11136:
--------------------------------------------------------
Commit eae2efcbd9cffadc94aaf89691eea4ee453940e2 in lucene-solr's branch
refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eae2efcb ]
SOLR-11136: Fix solrj XMLResponseParser when nested docs transformer is used
with indented XML
> XMLResponseParser.readDocument makes dangerous assumptions / fails when
> indent=true and [child] doc transformer
> ---------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-11136
> URL: https://issues.apache.org/jira/browse/SOLR-11136
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Hoss Man
> Attachments: SOLR-11136.patch
>
>
> Some buggy code in XMLResponseParser.readDocument causes it to indirectly
> assume that once it encounters a nested START_ELEMENT 'doc' (which it can
> recursively parse) the only other XML stream events it will find will either
> be an END_ELEMENT, or more 'doc' START_ELEMENTs...
> {code}
> protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
> {
> if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
> throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
> }
> // ...
> while( true )
> {
> switch (parser.next()) {
> case XMLStreamConstants.START_ELEMENT:
> depth++;
> builder.setLength( 0 ); // reset the text
> type = KnownType.get( parser.getLocalName() );
> // ...
>
> // NOTE: nothing in this loop modifies 'type'
> // so the 'while' is totally inappropriate even if there was no bug
> while( type == KnownType.DOC) {
> doc.addChildDocument(readDocument(parser));
> int event = parser.next(); // PROBLEMATIC
> if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
> return doc;
> }
> }
>
> // ...
> {code}
> Because of how the server-side XML writer code works, it's _currently_ true
> that child documents always come "after" any other fields or transformers,
> but depending on that is sketchy. Where this code causes real problems is
> when the server/client uses {{indent=true}}: the {{parser.next();}} call
> (labeled {{PROBLEMATIC}}) can then return {{XMLStreamConstants.CHARACTERS}}
> (or {{XMLStreamConstants.SPACE}}) because of the whitespace in between
> sibling child docs, or after the last child doc. Since that event is not an
> END_ELEMENT, the loop continues and the next recursive call to
> {{readDocument(parser)}} fails, because it expects to find the reader
> positioned at a START_ELEMENT.
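
The failure mode described above can be reproduced, and avoided, outside of SolrJ with plain StAX. The following is a minimal sketch under stated assumptions, not the committed patch: `Doc` is a hypothetical stand-in for SolrDocument, and `readDoc`, `skipElement`, and `parse` are illustrative names that do not exist in XMLResponseParser. The idea it shows is to recurse only when the reader is actually positioned on a nested 'doc' START_ELEMENT, and to ignore any whitespace events that indentation introduces between siblings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class IndentedChildDocs {

  /** Hypothetical stand-in for SolrDocument; it only records nested docs. */
  static final class Doc {
    final List<Doc> children = new ArrayList<>();
  }

  /** Reads one doc element; the reader must be positioned on its START_ELEMENT. */
  static Doc readDoc(XMLStreamReader parser) throws XMLStreamException {
    if (parser.getEventType() != XMLStreamConstants.START_ELEMENT) {
      throw new IllegalStateException("must be start element, not: " + parser.getEventType());
    }
    Doc doc = new Doc();
    while (true) {
      switch (parser.next()) {
        case XMLStreamConstants.START_ELEMENT:
          if ("doc".equals(parser.getLocalName())) {
            // Recurse only when we are really on a nested doc START_ELEMENT.
            doc.children.add(readDoc(parser));
          } else {
            skipElement(parser); // field values are not needed for this sketch
          }
          break;
        case XMLStreamConstants.END_ELEMENT:
          return doc; // the END_ELEMENT that closes this doc
        default:
          break; // whitespace, comments, etc. between siblings are ignored
      }
    }
  }

  /** Consumes events up to the END_ELEMENT matching the current START_ELEMENT. */
  static void skipElement(XMLStreamReader parser) throws XMLStreamException {
    int depth = 1;
    while (depth > 0) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        depth++;
      } else if (event == XMLStreamConstants.END_ELEMENT) {
        depth--;
      }
    }
  }

  /** Parses a string whose root element is treated as the top-level doc. */
  static Doc parse(String xml) {
    try {
      XMLStreamReader parser =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      try {
        // Advance from START_DOCUMENT to the root element.
        while (parser.getEventType() != XMLStreamConstants.START_ELEMENT) {
          parser.next();
        }
        return readDoc(parser);
      } finally {
        parser.close();
      }
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Indented XML, roughly the shape produced with indent=true and a [child] transformer.
    String xml =
        "<doc>\n"
            + "  <str name=\"id\">parent</str>\n"
            + "  <doc>\n    <str name=\"id\">kid1</str>\n  </doc>\n"
            + "  <doc>\n    <str name=\"id\">kid2</str>\n  </doc>\n"
            + "</doc>\n";
    System.out.println("children: " + parse(xml).children.size());
  }
}
```

Because every event is dispatched through the same switch, the sketch does not depend on child docs arriving last or on the absence of whitespace between them.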