[jira] [Commented] (LUCENE-8666) NPE in o.a.l.codecs.perfield.PerFieldPostingsFormat

Jeremie Miserez (JIRA) Mon, 01 Apr 2019 08:35:08 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806916#comment-16806916
 ]


Jeremie Miserez commented on LUCENE-8666:
-----------------------------------------

Thanks, had the same issue. Two comments concerning the patch:

1) There is an additional case a few lines above that results in the same NPE 
(when there are no clauses at all and allSpanClauses.length == 0):
{code:java}
if (numNegatives == 0) {
  // The simple case - no negative elements in phrase
  return new SpanNearQuery(allSpanClauses, slopFactor, inOrder);
}{code}
which would also need to be fixed the same way:
{code:java}
if (numNegatives == 0) {
  // The simple case - no negative elements in phrase
  if (allSpanClauses.length == 0) {
    // Invent a positive clause out of thin air.
    return new SpanTermQuery(new Term(field,
        "Dummy clause because no terms found - must match nothing"));
  }
  return new SpanNearQuery(allSpanClauses, slopFactor, inOrder);
}{code}
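
To illustrate why the guard matters, here is a minimal sketch that does not use Lucene. The types SpanLike, TermSpan and NearSpan are hypothetical stand-ins that model only the field attribute, and the assumption (taken from the issue description) is that a near query built from zero clauses ends up with a null field:

```java
// Hypothetical stand-ins, NOT Lucene classes: they model only the field attribute.
class EmptyClauseGuardSketch {

    interface SpanLike {
        String field();
    }

    // Stand-in for a single-term span query on one field.
    static final class TermSpan implements SpanLike {
        private final String field;
        final String text;

        TermSpan(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String field() {
            return field;
        }
    }

    // Stand-in for a span-near query. Assumption: the field is taken from the
    // clauses, so an empty clause array leaves it null.
    static final class NearSpan implements SpanLike {
        private final SpanLike[] clauses;

        NearSpan(SpanLike[] clauses) {
            this.clauses = clauses;
        }

        @Override
        public String field() {
            return clauses.length == 0 ? null : clauses[0].field();
        }
    }

    // Unguarded construction, mirroring the first snippet above.
    static SpanLike buildUnguarded(SpanLike[] allSpanClauses) {
        return new NearSpan(allSpanClauses);
    }

    // Guarded construction, mirroring the second snippet above.
    static SpanLike buildGuarded(String field, SpanLike[] allSpanClauses) {
        if (allSpanClauses.length == 0) {
            // Fall back to a single dummy term so the field is never null.
            return new TermSpan(field,
                "Dummy clause because no terms found - must match nothing");
        }
        return new NearSpan(allSpanClauses);
    }

    public static void main(String[] args) {
        SpanLike[] none = new SpanLike[0];
        System.out.println(buildUnguarded(none).field());
        System.out.println(buildGuarded("genre", none).field());
    }
}
```

With no clauses, the unguarded variant yields a query whose field is null, while the guarded variant always carries the field it was given.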
2) You mention "a single synthetic clause that matches either everything or 
nothing". I tested this with phrase queries and it seems to make no difference. 
However, it does indeed make a difference in the case where stop-words or 
special characters are stripped away by the Analyzer during query 
parsing/rewrite. The QueryParserBase#getBooleanQuery() method has a comment to 
that effect:
{code:java}
protected Query getBooleanQuery(List<BooleanClause> clauses) throws ParseException {
  if (clauses.size()==0) {
    return null; // all clause words were filtered away by the analyzer.
  }
  // ...{code}
While returning a "*" wildcard query, a MatchAllDocsQuery, or similar would 
prevent the NPE, it would yield wrong results for phrase queries and other 
queries: searching for a special character or stop word would then match 
everything, which would be incorrect. So matching nothing, as in the proposed 
patch, is most likely the correct solution.
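
A toy sketch of that difference, using plain Java rather than Lucene. The stop-word list, the analyze method and the matchAllWhenEmpty flag are all made up for illustration; the point is only what happens once the analyzer has removed every term from the query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, NOT Lucene: a "document" is a string, a "query" is a list of terms.
class FilteredQuerySketch {

    // Hypothetical stop-word list for the example.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    // Lowercases, splits on non-word characters, drops stop words and empties.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A document matches when it contains every analyzed query term. When no
    // terms survive analysis, matchAllWhenEmpty selects the fallback behaviour.
    static List<String> search(List<String> docs, String query, boolean matchAllWhenEmpty) {
        List<String> terms = analyze(query);
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            boolean match = terms.isEmpty()
                ? matchAllWhenEmpty
                : analyze(doc).containsAll(terms);
            if (match) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The Matrix", "A Beautiful Mind");
        System.out.println(search(docs, "the", true));   // match-all fallback
        System.out.println(search(docs, "the", false));  // match-nothing fallback
    }
}
```

With the match-all fallback, a search for just "the" returns every document; with the match-nothing fallback it returns none, which is what a user searching for a stop word or special character would expect.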

> NPE in o.a.l.codecs.perfield.PerFieldPostingsFormat 
> ----------------------------------------------------
>
>                 Key: LUCENE-8666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8666
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 7.5, master (9.0)
>         Environment: Running on Unix, using a git checkout close to master.
> h2. Steps to reproduce
>  * Build commit ea2c8ba of Solr as described in the section below.
>  * Build the films collection as described below.
>  * Start the server using the command {{./bin/solr start -f -p 8983 -s 
> /tmp/home}}
>  * Request the URL above.
> h2. Compiling the server
> {noformat}
> git clone https://github.com/apache/lucene-solr
> cd lucene-solr
> git checkout ea2c8ba
> ant compile
> cd solr
> ant server
> {noformat}
> h2. Building the collection
> We followed Exercise 2 from the quick start tutorial 
> ([http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2]) - 
> for reference, I have attached a copy of the database.
> {noformat}
> mkdir -p /tmp/home
> echo '<?xml version="1.0" encoding="UTF-8" ?><solr></solr>' > /tmp/home/solr.xml
> {noformat}
> In one terminal start a Solr instance in foreground:
> {noformat}
> ./bin/solr start -f -p 8983 -s /tmp/home
> {noformat}
> In another terminal, create a collection of movies, with no shards and no 
> replication:
> {noformat}
> bin/solr create -c films
> curl -X POST -H 'Content-type:application/json' --data-binary \
>   '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' \
>   http://localhost:8983/solr/films/schema
> curl -X POST -H 'Content-type:application/json' --data-binary \
>   '{"add-copy-field" : {"source":"*","dest":"_text_"}}' \
>   http://localhost:8983/solr/films/schema
> ./bin/post -c films example/films/films.json
> {noformat}
>            Reporter: Johannes Kloos
>            Priority: Minor
>              Labels: diffblue, newdev, patch-available
>         Attachments: 0001-Fix-NullPointerException.patch, home.zip
>
>
> Requesting this URL in SOLR gives a 500 error with a stack trace pointing to 
> Lucene:
> {{http://localhost:8983/solr/films/select?q=\{!complexphrase}genre:"-om*"}}
> The stack trace is (cut down to the reasonably relevant part):
> {noformat}
> java.lang.NullPointerException
>   at java.util.TreeMap.getEntry(TreeMap.java:347)
>   at java.util.TreeMap.get(TreeMap.java:278)
>   at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:311)
>   at org.apache.lucene.index.CodecReader.terms(CodecReader.java:106)
>   at org.apache.lucene.index.FilterLeafReader.terms(FilterLeafReader.java:351)
>   at org.apache.lucene.index.ExitableDirectoryReader$ExitableFilterAtomicReader.terms(ExitableDirectoryReader.java:91)
>   at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:208)
>   at org.apache.lucene.search.spans.SpanNotQuery$SpanNotWeight.getSpans(SpanNotQuery.java:127)
>   at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:135)
>   at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:46)
>   at org.apache.lucene.search.Weight.bulkScorer(Weight.java:177)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:649)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
>   at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
>   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
> {noformat}
> The error is actually a bit deeper and can be traced back to the 
> o.a.l.queryparser.complexPhrase.ComplexPhraseQueryParser class.
> Handling this query involves constructing a SpanQuery, which happens in the 
> rewrite method of ComplexPhraseQueryParser. In particular, the expression is 
> decomposed into a BooleanQuery, which has exactly one clause, namely the 
> negative clause -genre:"om*". The rewrite method then further transforms this 
> into a SpanQuery; in this case, it goes into the path that handles complex 
> queries with both positive and negative clauses. It extracts the subset of 
> positive clauses - note that this set of clauses is empty for this query. The 
> positive clauses are then combined into a SpanNearQuery (around line 340), 
> which is then used to build a SpanNotQuery. Further down the line, the field 
> attribute of the SpanNearQuery is accessed and used as an index into a 
> TreeMap. But since we had an empty set of positive clauses, the SpanNearQuery 
> does not have its field attribute set, so we get a null here - this leads to 
> an exception. A possible fix would be to detect the situation where we have 
> an empty set of positive clauses and include a single synthetic clause that 
> matches either everything or nothing. See attached file 
> 0001-Fix-NullPointerException.patch.
> This bug was found using [Diffblue Microservices 
> Testing|http://www.diffblue.com/labs]. Find more information on this [test 
> campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results].
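
The top frames of the quoted stack trace (TreeMap.get, TreeMap.getEntry) can be reproduced in isolation with the JDK alone. This is only a sketch: the map contents and the lookupThrowsNpe helper are made up, and it assumes, as the description says, that the field name reaching the lookup is null. A TreeMap with natural ordering rejects null keys on lookup:

```java
import java.util.TreeMap;

// JDK-only sketch of the failing lookup; the map below is a made-up stand-in
// for a per-field table keyed by field name.
class NullFieldLookupSketch {

    // Returns true when looking up the given field name throws an NPE.
    static boolean lookupThrowsNpe(String field) {
        TreeMap<String, String> fields = new TreeMap<>();
        fields.put("genre", "postings for genre");
        try {
            fields.get(field);
            return false;
        } catch (NullPointerException e) {
            // TreeMap with natural ordering does not permit null keys.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupThrowsNpe("genre"));
        System.out.println(lookupThrowsNpe(null));
    }
}
```

A missing but non-null field name simply returns null from the lookup; only a null key triggers the exception, which is why making sure the query always carries a field avoids the NPE.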



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

