Re: SingleTerm vs MultiTerm in PhraseWildCardQuery class in the sandbox Lucene

baris . kazar Tue, 18 Feb 2020 13:53:00 -0800

Michael and Forum,
Thanks for the great explanations.

One question, please:

Why is PrefixQuery used instead of WildcardQuery in the snippet below?

Best regards

> On Feb 17, 2020, at 3:01 PM, Michael Froh <msf...@gmail.com> wrote:
> 
> Hi Baris,
> 
> The idea with PhraseWildcardQuery is that you can mix literal "exact" terms
> with "MultiTerms" (i.e. any subclass of MultiTermQuery). Using addTerm is
> for exact terms, while addMultiTerm is for things that may match a number
> of possible terms in the given position.
> 
> If you want to search for term1 followed by any term that starts with a
> given character, I would suggest using:
> 
> int maxMultiTermExpansions = ...; // Discussed below
> PhraseWildcardQuery.Builder builder =
>     new PhraseWildcardQuery.Builder("field", maxMultiTermExpansions);
> builder.addTerm(new BytesRef("term1")); // Add fixed term in position 0
> // Add multi-term in position 1
> builder.addMultiTerm(new PrefixQuery(new Term("field", "term2FirstChar")));
> Query q = builder.build();
> 
> The PrefixQuery effectively gets expanded into a bunch of possible terms,
> based on the term dictionary on each index segment. To avoid expanding to
> cover too many terms (say, if you added a bunch of WildcardQuery),
> maxMultiTermExpansions serves as a guard rail, to put a rough bound on
> memory consumption and query execution time. If you're interested in
> details of how the maxMultiTermExpansions budget is distributed across
> MultiTerms, check out PhraseWildcardQuery.createWeight. If you're just
> running an experiment in your IDE, you could probably set
> maxMultiTermExpansions to Integer.MAX_VALUE. (If you're running in a
> production environment, it's likely a good idea to tune it down based on
> your memory/latency constraints.)
> 
> Incidentally, for tracking down the source code for anything in Lucene,
> it's probably better to go to GitHub for the most up-to-date source:
> https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java
> 
> Hope that helps,
> Michael
> 
>> On Thu, 13 Feb 2020 at 12:29, <baris.ka...@oracle.com> wrote:
>> 
>> Hi,
>> 
>> I hope everyone is doing great.
>> 
>> I want to do the following search with PhraseWildcardQuery (thanks to
>> this forum, especially David and Bruno, for letting me know about this
>> class):
>> 
>> term1 term2FirstChar*
>> 
>> I see two ways to do this (I found the source code at
>> https://fossies.org/linux/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java):
>> 
>> /*
>> 
>> maxMultiTermExpansions - The maximum number of expansions across all
>> multi-terms and across all segments. It counts expansions for each
>> segments individually, that allows optimizations per segment and unused
>> expansions are credited to next segments. This is different from
>> MultiPhraseQuery and SpanMultiTermQueryWrapper which have an expansion
>> limit per multi-term.
>> 
>> segmentOptimizationEnabled - Whether to enable the segment optimization
>> which consists in ignoring a segment for further analysis as soon as a
>> term is not present inside it. This optimizes the query execution
>> performance but changes the scoring. The result ranking is preserved.
>> 
>> */
>> 
>> 
>> 1st way:
>> 
>> PhraseWildcardQuery.Builder pwcqBuilder = new PhraseWildcardQuery.Builder(field,
>>     2 /* I don't know what number to use here for maxMultiTermExpansions */,
>>     true /* boolean segmentOptimizationEnabled */);
>> 
>> pwcqBuilder.addTerm(new Term(field, "term1"));
>> 
>> pwcqBuilder.addTerm(new Term(field, "term2FirstChar"));
>> 
>> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>> 
>> or
>> 
>> 2nd way:
>> 
>> pwcqBuilder.addMultiTerm(MultiTermQuery object here containing {field,
>> "term1"} and {field, "term2FirstChar"});
>> 
>> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>> 
>> 
>> Then this pwcq object will be passed to IndexSearcher as the query
>> parameter.
>> 
>> 
>> Now, it looks like the first way will not consider expansions (in
>> other words, wildcards). Am I right?
>> 
>> I also need to understand the maxMultiTermExpansions parameter better.
>> For instance, if the first way is used, will maxMultiTermExpansions be
>> meaningful?
>> 
>> 
>> Thanks
>> 
>> 
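
The expansion Michael describes above (a prefix being expanded into concrete terms from a segment's term dictionary, with maxMultiTermExpansions as a guard rail) can be pictured with a toy model. The sketch below is only an illustration in plain Java and is not Lucene's implementation: the class PrefixExpansionSketch, the method expandPrefix, and the use of a TreeSet as a stand-in for one segment's sorted term dictionary are all assumptions made for the example. It also does not model how the real budget is shared across multi-terms and segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy model; none of these names exist in Lucene.
public class PrefixExpansionSketch {

    /**
     * Expands a prefix against a sorted term dictionary (assumed here to be
     * a TreeSet standing in for one index segment), collecting at most
     * maxExpansions matching terms.
     */
    public static List<String> expandPrefix(TreeSet<String> termDictionary,
                                            String prefix,
                                            int maxExpansions) {
        List<String> expanded = new ArrayList<>();
        // Seek to the first term >= prefix; matching terms are contiguous
        // from there because the dictionary is sorted.
        for (String term : termDictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term that can share the prefix
            }
            if (expanded.size() >= maxExpansions) {
                break; // expansion budget exhausted: stop collecting terms
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        TreeSet<String> segmentTerms = new TreeSet<>(
                Arrays.asList("tea", "team", "term1", "test", "tiger", "zebra"));

        // Effectively unbounded budget: every term starting with "te".
        System.out.println(expandPrefix(segmentTerms, "te", Integer.MAX_VALUE));
        // prints [tea, team, term1, test]

        // Budget of 2: the expansion is cut off after two terms.
        System.out.println(expandPrefix(segmentTerms, "te", 2));
        // prints [tea, team]
    }
}
```

In this toy model a small budget silently drops later terms, which is the trade-off behind tuning maxMultiTermExpansions down in production: less memory and time spent per query, at the cost of possibly not considering every matching term.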


