Re: SingleTerm vs MultiTerm in PhraseWildCardQuery class in the sandbox Lucene

Michael Froh Mon, 17 Feb 2020 12:01:43 -0800

Hi Baris,

The idea with PhraseWildcardQuery is that you can mix literal "exact" terms
with "MultiTerms" (i.e. any subclass of MultiTermQuery). Use addTerm for
exact terms and addMultiTerm for anything that may match a number of
possible terms in the given position.

If you want to search for term1 followed by any term that starts with a
given character, I would suggest using:

int maxMultiTermExpansions = ...; // Discussed below
PhraseWildcardQuery.Builder builder =
    new PhraseWildcardQuery.Builder("field", maxMultiTermExpansions);
builder.addTerm(new BytesRef("term1")); // Add fixed term in position 0
// Add multiterm in position 1
builder.addMultiTerm(new PrefixQuery(new Term("field", "term2FirstChar")));
Query q = builder.build();

The PrefixQuery effectively gets expanded into a bunch of possible terms,
based on the term dictionary on each index segment. To avoid expanding to
cover too many terms (say, if you added a bunch of WildcardQuery),
maxMultiTermExpansions serves as a guard rail, to put a rough bound on
memory consumption and query execution time. If you're interested in
details of how the maxMultiTermExpansions budget is distributed across
MultiTerms, check out PhraseWildcardQuery.createWeight. If you're just
running an experiment in your IDE, you could probably set
maxMultiTermExpansions to Integer.MAX_VALUE. (If you're running in a
production environment, it's likely a good idea to tune it down based on
your memory/latency constraints.)
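
To make the expansion idea concrete, here is a minimal, self-contained sketch in plain Java. It is not Lucene's implementation: the class PrefixExpansionSketch and the method expandPrefix are hypothetical names, and a TreeSet stands in for one segment's term dictionary. It only illustrates how a prefix can be expanded against a sorted term list, with a cap playing the role of maxMultiTermExpansions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; Lucene's real logic lives in
// PhraseWildcardQuery.createWeight and works on per-segment term dictionaries.
final class PrefixExpansionSketch {

    // Collects at most maxExpansions terms from a sorted term set that
    // start with the given prefix.
    static List<String> expandPrefix(NavigableSet<String> sortedTerms,
                                     String prefix,
                                     int maxExpansions) {
        List<String> expansions = new ArrayList<>();
        // tailSet(prefix, true) starts at the first term >= prefix.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // Sorted order: no later term can match the prefix.
            }
            if (expansions.size() >= maxExpansions) {
                break; // Expansion budget exhausted (the guard rail).
            }
            expansions.add(term);
        }
        return expansions;
    }

    public static void main(String[] args) {
        // Stand-in for the terms of one field in one segment.
        NavigableSet<String> segmentTerms = new TreeSet<>(
            Arrays.asList("main", "maple", "market", "mill", "oak"));

        // Effectively unbounded: prints [main, maple, market]
        System.out.println(expandPrefix(segmentTerms, "ma", Integer.MAX_VALUE));

        // Budget of 2: prints [main, maple]
        System.out.println(expandPrefix(segmentTerms, "ma", 2));
    }
}
```

With a small budget, some matching terms are simply never considered, which is the trade-off you accept in exchange for bounded memory use and query time.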

Incidentally, for tracking down the source code for anything in Lucene,
it's probably better to go to GitHub for the most up-to-date source:
https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java

Hope that helps,
Michael

On Thu, 13 Feb 2020 at 12:29, <baris.ka...@oracle.com> wrote:

> Hi,-
>
> i hope everyone is doing great.
>
> If I want to do the following search with PhraseWildCardQuery (thanks to
> this forum, especially David and Bruno, for letting me know about this
> class):
>
> term1 term2FirstChar*
>
> I see two ways to do it. (I found the source code at
>
> https://fossies.org/linux/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java
> )
>
> /*
>
> maxMultiTermExpansions - The maximum number of expansions across all
> multi-terms and across all segments. It counts expansions for each
> segment individually, which allows optimizations per segment, and unused
> expansions are credited to the next segments. This is different from
> MultiPhraseQuery and SpanMultiTermQueryWrapper which have an expansion
> limit per multi-term.
>
> segmentOptimizationEnabled - Whether to enable the segment optimization
> which consists in ignoring a segment for further analysis as soon as a
> term is not present inside it. This optimizes the query execution
> performance but changes the scoring. The result ranking is preserved.
>
> */
>
>
> 1st way:
>
> PhraseWildCardQuery.Builder builder = PhraseWildCardQuery.Builder(field,
> 2 /* I don't know what number to use here for maxMultiTermExpansions */,
> true /* boolean segmentOptimizationEnabled */)
>
> pwcqBuilder.addTerm(field, new Term(field, "term1"));
>
> pwcqBuilder.addTerm(field,new Term(field, "term2FirstChar"));
>
> PhraseWildCardQuery pwcq = pwcqBuilder.build();
>
> or
>
> 2nd way:
>
> pwcqBuilder.addMultiTerm(MultiTermQuery object here containing {field,
> "term1"} and {field, "term2FirstChar"});
>
> PhraseWildCardQuery pwcq = pwcqBuilder.build();
>
>
> Then this pwcq object will be passed to IndexSearcher as the query
> parameter.
>
>
> Now, it looks like the first way will not consider expansions (in
> other words, wildcards). Am I right?
>
> I also need to understand the maxMultiTermExpansions parameter better.
> For instance, if the first way is used, will maxMultiTermExpansions be
> meaningful?
>
>
> Thanks
>
>
