On Feb 18, 2020, at 8:33 PM, Michael Froh <msf...@gmail.com> wrote:
In your example, it looks like you wanted the second term to match based on
the first character, or prefix, of the term. While you could use a
WildcardQuery with a term value of "term2FirstChar*", PrefixQuery seemed like
the simpler approach. WildcardQuery can handle more general cases, such as
matching a pattern like "a*b*c".
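
The toy glob matcher below, in plain Java, makes the "more general cases" point concrete. It is a sketch, not Lucene's WildcardQuery implementation. Lucene compiles the pattern into an automaton and also supports `\` escapes, which are omitted here. It only shows that a trailing `*` reduces to a plain prefix test, while a pattern like "a*b*c" needs backtracking. The class name WildcardSketch is hypothetical, and it works on UTF-16 chars for simplicity.

```java
// Toy glob matcher for illustration only (not Lucene's WildcardQuery):
// '*' matches any run of chars (including none), '?' matches exactly one char.
final class WildcardSketch {

    static boolean matches(String pattern, String term) {
        int p = 0, t = 0;
        int starP = -1; // index of the most recent '*' in the pattern
        int starT = -1; // term index where that '*' started matching
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++; // try letting '*' match the empty string first
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++; // literal char or '?' consumes one char on both sides
                t++;
            } else if (starP >= 0) {
                p = starP + 1; // backtrack: the last '*' absorbs one more term char
                t = ++starT;
            } else {
                return false; // mismatch with no '*' to fall back on
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++; // trailing '*'s can match the empty string
        }
        return p == pattern.length();
    }

    // What a PrefixQuery asks for: a plain prefix test, no backtracking needed.
    static boolean prefixMatches(String prefix, String term) {
        return term.startsWith(prefix);
    }

    public static void main(String[] args) {
        String[] terms = {"term", "test", "apple", "axxbyyc", "abx"};
        for (String term : terms) {
            System.out.println(term
                + "  te*=" + matches("te*", term)
                + "  prefix(te)=" + prefixMatches("te", term)
                + "  a*b*c=" + matches("a*b*c", term));
        }
    }
}
```

For every sample term, "te*" and the plain prefix test agree. Only the "a*b*c" column needs the backtracking branch.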

Technically, the PrefixQuery compiles down to a slightly simpler automaton
(two states instead of three), but I only figured that out by writing a
simple unit test:
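import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.automaton.Automaton;

public void testAutomata() {
  // Build both automata with the queries' static toAutomaton helpers
  Automaton prefixAutomaton = PrefixQuery.toAutomaton(new BytesRef("a"));
  Automaton wildcardAutomaton = WildcardQuery.toAutomaton(new Term("foo", "a*"));
  // toDot() renders each automaton in Graphviz DOT format
  System.out.println("PrefixQuery(\"a\")");
  System.out.println(prefixAutomaton.toDot());
  System.out.println("WildcardQuery(\"a*\")");
  System.out.println(wildcardAutomaton.toDot());
}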
That produces the following output:
PrefixQuery("a")
digraph Automaton {
rankdir = LR
node [width=0.2, height=0.2, fontsize=8]
initial [shape=plaintext,label=""]
initial -> 0
0 [shape=circle,label="0"]
0 -> 1 [label="a"]
1 [shape=doublecircle,label="1"]
1 -> 1 [label="\\U00000000-\\U000000ff"]
}
WildcardQuery("a*")
digraph Automaton {
rankdir = LR
node [width=0.2, height=0.2, fontsize=8]
initial [shape=plaintext,label=""]
initial -> 0
0 [shape=circle,label="0"]
0 -> 1 [label="a"]
1 [shape=doublecircle,label="1"]
1 -> 2 [label="\\U00000000-\\U0010ffff"]
2 [shape=doublecircle,label="2"]
2 -> 2 [label="\\U00000000-\\U0010ffff"]
}
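
A small transition-table simulation can check the two graphs against each other. This is a hypothetical hand-copied re-encoding of the toDot() output above, not Lucene code. It assumes, as the label ranges suggest, that the PrefixQuery automaton consumes UTF-8 bytes (0x00-0xff) and the WildcardQuery automaton consumes Unicode code points (up to 0x10ffff). Both accept exactly the terms that start with "a"; the wildcard version simply uses one extra accepting state to get there.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

// Hand-copied re-encoding of the two toDot() graphs; illustration only.
final class AutomataSketch {

    // One edge "from -> to [label=min-max]" from the DOT output.
    static final class Edge {
        final int from, min, max, to;
        Edge(int from, int min, int max, int to) {
            this.from = from; this.min = min; this.max = max; this.to = to;
        }
    }

    // Deterministic automaton: start state 0 ("initial -> 0"), edges, accepting states.
    static final class Dfa {
        final List<Edge> edges;
        final Set<Integer> accepting;
        Dfa(List<Edge> edges, Set<Integer> accepting) {
            this.edges = edges; this.accepting = accepting;
        }

        boolean run(int[] symbols) {
            int state = 0;
            for (int symbol : symbols) {
                int next = -1;
                for (Edge e : edges) {
                    if (e.from == state && symbol >= e.min && symbol <= e.max) {
                        next = e.to;
                        break;
                    }
                }
                if (next < 0) {
                    return false; // no outgoing edge for this symbol: reject
                }
                state = next;
            }
            return accepting.contains(state); // "doublecircle" states accept
        }
    }

    // PrefixQuery("a"): 0 -a-> 1, then 1 loops on any byte 0x00-0xff.
    static final Dfa PREFIX = new Dfa(
        List.of(new Edge(0, 'a', 'a', 1), new Edge(1, 0x00, 0xff, 1)),
        Set.of(1));

    // WildcardQuery("a*"): 0 -a-> 1 -any-> 2, then 2 loops on any code point.
    static final Dfa WILDCARD = new Dfa(
        List.of(new Edge(0, 'a', 'a', 1),
                new Edge(1, 0x0, 0x10ffff, 2),
                new Edge(2, 0x0, 0x10ffff, 2)),
        Set.of(1, 2));

    // Assumption: the prefix automaton is fed the term's UTF-8 bytes.
    static boolean prefixAccepts(String term) {
        byte[] raw = term.getBytes(StandardCharsets.UTF_8);
        int[] bytes = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            bytes[i] = raw[i] & 0xff; // Java bytes are signed; widen to 0..255
        }
        return PREFIX.run(bytes);
    }

    // Assumption: the wildcard automaton is fed the term's Unicode code points.
    static boolean wildcardAccepts(String term) {
        return WILDCARD.run(term.codePoints().toArray());
    }

    public static void main(String[] args) {
        for (String term : new String[] {"a", "apple", "a\u4e2d", "banana", ""}) {
            System.out.println("\"" + term + "\" prefix=" + prefixAccepts(term)
                + " wildcard=" + wildcardAccepts(term));
        }
    }
}
```

The non-ASCII sample "a\u4e2d" shows why the byte-range loop and the code-point-range loop still agree. The prefix automaton sees four bytes, each within 0x00-0xff, while the wildcard automaton sees two code points.
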
On Tue, 18 Feb 2020 at 13:52, <baris.ka...@oracle.com> wrote:
Michael and Forum,
Thanks for the great explanations.
One question, please: why is PrefixQuery used instead of WildcardQuery in the
snippet below?
Best regards
> On Feb 17, 2020, at 3:01 PM, Michael Froh <msf...@gmail.com> wrote:
>
> Hi Baris,
>
> The idea with PhraseWildcardQuery is that you can mix literal "exact" terms
> with "MultiTerms" (i.e. any subclass of MultiTermQuery). Use addTerm for
> exact terms, and addMultiTerm for anything that may match a number of
> possible terms at the given position.
>
> If you want to search for term1 followed by any term that starts with a
> given character, I would suggest using:
>
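> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.PhraseWildcardQuery; // sandbox module
> import org.apache.lucene.search.PrefixQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.util.BytesRef;
>
> int maxMultiTermExpansions = ...; // Discussed below
> PhraseWildcardQuery.Builder builder =
>     new PhraseWildcardQuery.Builder("field", maxMultiTermExpansions);
> builder.addTerm(new BytesRef("term1")); // Add fixed term in position 0
> // Add multiterm in position 1
> builder.addMultiTerm(new PrefixQuery(new Term("field", "term2FirstChar")));
> Query q = builder.build();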
>
> The PrefixQuery effectively gets expanded into a bunch of possible terms,
> based on the term dictionary of each index segment. To avoid expanding to
> cover too many terms (say, if you added a bunch of WildcardQuery clauses),
> maxMultiTermExpansions serves as a guard rail that puts a rough bound on
> memory consumption and query execution time. If you're interested in the
> details of how the maxMultiTermExpansions budget is distributed across
> MultiTerms, check out PhraseWildcardQuery.createWeight. If you're just
> running an experiment in your IDE, you could probably set
> maxMultiTermExpansions to Integer.MAX_VALUE. (If you're running in a
> production environment, it's likely a good idea to tune it down based on
> your memory/latency constraints.)
>
> Incidentally, for tracking down the source code for anything in Lucene,
> it's probably better to go to GitHub for the most up-to-date source:
>
> https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java
>
> Hope that helps,
> Michael
>
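
The following toy model covers the two kinds of positions Michael describes. An exact term (as with addTerm) must match as-is. A prefix position (as with addMultiTerm(new PrefixQuery(...))) is first expanded against a sorted term dictionary, capped by an expansion budget. This is a sketch under simplifying assumptions, not PhraseWildcardQuery's implementation:

- PhraseSketch and its methods are hypothetical.
- The "dictionary" is just the distinct tokens of one document, and there are no segments.
- When the budget runs out, the toy simply stops expanding. That is its own policy, not a claim about what Lucene does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of "exact term, then prefix" phrase matching; not Lucene's code.
final class PhraseSketch {

    // Expand a prefix against a sorted dictionary, collecting at most maxExpansions terms.
    static List<String> expandPrefix(NavigableSet<String> dict, String prefix, int maxExpansions) {
        List<String> expanded = new ArrayList<>();
        for (String term : dict.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: the first non-matching term ends the prefix range
            }
            if (expanded.size() == maxExpansions) {
                break; // budget used up (this toy's policy)
            }
            expanded.add(term);
        }
        return expanded;
    }

    // True if exactTerm (an addTerm-style position) is immediately followed by
    // one of the prefix expansions (an addMultiTerm-style position).
    static boolean matches(List<String> tokens, String exactTerm, String prefix, int maxExpansions) {
        NavigableSet<String> dict = new TreeSet<>(tokens); // stand-in for a term dictionary
        List<String> expanded = expandPrefix(dict, prefix, maxExpansions);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(exactTerm) && expanded.contains(tokens.get(i + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("term1", "tablet", "foo", "term1", "bar");
        System.out.println(matches(doc, "term1", "t", 10)); // true: "term1 tablet"
        System.out.println(matches(doc, "term1", "f", 10)); // false: "foo" follows "tablet"
    }
}
```

In the second call, "foo" is a valid expansion of "f", but it does not sit in the position right after "term1", so the phrase does not match.
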
>> On Thu, 13 Feb 2020 at 12:29, <baris.ka...@oracle.com> wrote:
>>
>> Hi,
>>
>> I hope everyone is doing great.
>>
>> I want to do the following search with PhraseWildcardQuery (thanks to this
>> forum, especially David and Bruno, for letting me know about this class):
>>
>> term1 term2FirstChar*
>>
>> I see two ways to do it (I found the source code at
>> https://fossies.org/linux/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java):
>>
>> /*
>>
>> maxMultiTermExpansions - The maximum number of expansions across all
>> multi-terms and across all segments. It counts expansions for each
>> segments individually, that allows optimizations per segment and unused
>> expansions are credited to next segments. This is different from
>> MultiPhraseQuery and SpanMultiTermQueryWrapper which have an expansion
>> limit per multi-term.
>>
>> segmentOptimizationEnabled - Whether to enable the segment optimization
>> which consists in ignoring a segment for further analysis as soon as a
>> term is not present inside it. This optimizes the query execution
>> performance but changes the scoring. The result ranking is preserved.
>>
>> */
>>
>>
>> 1st way:
>>
>> PhraseWildcardQuery.Builder pwcqBuilder = new PhraseWildcardQuery.Builder(field,
>>     2 /* <<< I don't know what number to use here for maxMultiTermExpansions >>> */,
>>     true /* boolean segmentOptimizationEnabled */);
>>
>> pwcqBuilder.addTerm(new Term(field, "term1"));
>>
>> pwcqBuilder.addTerm(new Term(field, "term2FirstChar"));
>>
>> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>>
>> or
>>
>> 2nd way:
>>
>> pwcqBuilder.addMultiTerm(MultiTermQuery object here containing {field,
>> "term1"} and {field, "term2FirstChar"});
>>
>> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>>
>>
>> Then this pwcq object will be passed to IndexSearcher.search() as the
>> query parameter.
>>
>>
>> Now, it looks like the first way will not consider expansions, or in other
>> words, the wildcard. Am I right?
>>
>> I also need to understand the maxMultiTermExpansions parameter better. For
>> instance, if the first way is used, will maxMultiTermExpansions be
>> meaningful?
>>
>>
>> Thanks
>>
>>
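
The maxMultiTermExpansions description quoted in the question says the budget is shared across all multi-terms and all segments, with unused expansions credited to later segments. Below is a hypothetical allocation policy with that property: each remaining segment gets an even share of what is left. The real distribution logic lives in PhraseWildcardQuery.createWeight and may well differ. Going by Michael's reply, the budget only applies to positions added with addMultiTerm. With the 1st way (two addTerm calls), there is nothing for it to limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expansion-budget policy, for illustration only.
final class ExpansionBudgetSketch {

    // wanted.get(i): expansions segment i could use; returns how many each segment is granted.
    static List<Integer> allocate(List<Integer> wanted, int maxMultiTermExpansions) {
        List<Integer> granted = new ArrayList<>();
        int remaining = maxMultiTermExpansions;
        for (int i = 0; i < wanted.size(); i++) {
            int segmentsLeft = wanted.size() - i;
            int share = remaining / segmentsLeft;      // even share of what is left
            int used = Math.min(wanted.get(i), share); // never more than the segment needs
            granted.add(used);
            remaining -= used;                         // unused share stays for later segments
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(allocate(List.of(2, 10, 10), 12)); // [2, 5, 5]
    }
}
```

With wanted = [2, 10, 10] and a budget of 12, this grants [2, 5, 5]. The first segment's 2 unused expansions flow to the later segments, whereas a fixed cap of 4 per segment would grant only [2, 4, 4].
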
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org