Michael and Forum,

This is amazing, thanks. I will try both cases.
I can also have "term1 term2Char1term2Char2*", and so on with term2's next characters.

I hope the latest version of this class on GitHub works with Lucene 7.7.2.

Best regards

> On Feb 18, 2020, at 8:33 PM, Michael Froh <msf...@gmail.com> wrote:
>
> In your example, it looks like you wanted the second term to match based on
> the first character, or prefix, of the term.
>
> While you could use a WildcardQuery with a term value of "term2FirstChar*",
> PrefixQuery seemed like the simpler approach. WildcardQuery can handle more
> general cases, like if you want to match on something like "a*b*c".
>
> Technically, the PrefixQuery compiles down to a slightly simpler automaton,
> but I only figured that out by writing a simple unit test:
>
>   public void testAutomata() {
>     Automaton prefixAutomaton = PrefixQuery.toAutomaton(new BytesRef("a"));
>     Automaton wildcardAutomaton = WildcardQuery.toAutomaton(new Term("foo", "a*"));
>
>     System.out.println("PrefixQuery(\"a\")");
>     System.out.println(prefixAutomaton.toDot());
>     System.out.println("WildcardQuery(\"a*\")");
>     System.out.println(wildcardAutomaton.toDot());
>   }
>
> That produces the following output:
>
> PrefixQuery("a")
> digraph Automaton {
>   rankdir = LR
>   node [width=0.2, height=0.2, fontsize=8]
>   initial [shape=plaintext,label=""]
>   initial -> 0
>   0 [shape=circle,label="0"]
>   0 -> 1 [label="a"]
>   1 [shape=doublecircle,label="1"]
>   1 -> 1 [label="\\U00000000-\\U000000ff"]
> }
> WildcardQuery("a*")
> digraph Automaton {
>   rankdir = LR
>   node [width=0.2, height=0.2, fontsize=8]
>   initial [shape=plaintext,label=""]
>   initial -> 0
>   0 [shape=circle,label="0"]
>   0 -> 1 [label="a"]
>   1 [shape=doublecircle,label="1"]
>   1 -> 2 [label="\\U00000000-\\U0010ffff"]
>   2 [shape=doublecircle,label="2"]
>   2 -> 2 [label="\\U00000000-\\U0010ffff"]
> }
>
>
>
>> On Tue, 18 Feb 2020 at 13:52, <baris.ka...@oracle.com> wrote:
>> Michael and Forum,
>> Thanks for the great explanations.
>>
>> One question, please:
>>
>> Why is PrefixQuery used instead of WildcardQuery in the snippet below?
>>
>> Best regards
>>
>> > On Feb 17, 2020, at 3:01 PM, Michael Froh <msf...@gmail.com> wrote:
>> >
>> > Hi Baris,
>> >
>> > The idea with PhraseWildcardQuery is that you can mix literal "exact" terms
>> > with "MultiTerms" (i.e. any subclass of MultiTermQuery). Using addTerm is
>> > for exact terms, while addMultiTerm is for things that may match a number
>> > of possible terms in the given position.
>> >
>> > If you want to search for term1 followed by any term that starts with a
>> > given character, I would suggest using:
>> >
>> >   int maxMultiTermExpansions = ...; // Discussed below
>> >   PhraseWildcardQuery.Builder builder =
>> >       new PhraseWildcardQuery.Builder("field", maxMultiTermExpansions);
>> >   builder.addTerm(new BytesRef("term1")); // Add fixed term in position 0
>> >   // Add multi-term in position 1
>> >   builder.addMultiTerm(new PrefixQuery(new Term("field", "term2FirstChar")));
>> >   Query q = builder.build();
>> >
>> > The PrefixQuery effectively gets expanded into a bunch of possible terms,
>> > based on the term dictionary on each index segment. To avoid expanding to
>> > cover too many terms (say, if you added a bunch of WildcardQuery),
>> > maxMultiTermExpansions serves as a guard rail, to put a rough bound on
>> > memory consumption and query execution time. If you're interested in
>> > details of how the maxMultiTermExpansions budget is distributed across
>> > MultiTerms, check out PhraseWildcardQuery.createWeight. If you're just
>> > running an experiment in your IDE, you could probably set
>> > maxMultiTermExpansions to Integer.MAX_VALUE. (If you're running in a
>> > production environment, it's likely a good idea to tune it down based on
>> > your memory/latency constraints.)
>> >
>> > Incidentally, for tracking down the source code for anything in Lucene,
>> > it's probably better to go to GitHub for the most up-to-date source:
>> > https://github.com/apache/lucene-solr/blob/master/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java
>> >
>> > Hope that helps,
>> > Michael
>> >
>> >> On Thu, 13 Feb 2020 at 12:29, <baris.ka...@oracle.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I hope everyone is doing great.
>> >>
>> >> I want to run the following search with PhraseWildcardQuery (thanks to
>> >> this forum, especially David and Bruno, for letting me know about this
>> >> class):
>> >>
>> >>   term1 term2FirstChar*
>> >>
>> >> I see two ways to do it. (I found the source code at
>> >> https://fossies.org/linux/lucene/sandbox/src/java/org/apache/lucene/search/PhraseWildcardQuery.java
>> >> )
>> >>
>> >> /*
>> >>
>> >> maxMultiTermExpansions - The maximum number of expansions across all
>> >> multi-terms and across all segments. It counts expansions for each
>> >> segments individually, that allows optimizations per segment and unused
>> >> expansions are credited to next segments. This is different from
>> >> MultiPhraseQuery and SpanMultiTermQueryWrapper which have an expansion
>> >> limit per multi-term.
>> >>
>> >> segmentOptimizationEnabled - Whether to enable the segment optimization
>> >> which consists in ignoring a segment for further analysis as soon as a
>> >> term is not present inside it. This optimizes the query execution
>> >> performance but changes the scoring. The result ranking is preserved.
>> >>
>> >> */
>> >>
>> >>
>> >> 1st way:
>> >>
>> >> PhraseWildcardQuery.Builder pwcqBuilder = new PhraseWildcardQuery.Builder(field,
>> >>     2 /* <<< I don't know what number to use here for maxMultiTermExpansions >>> */,
>> >>     true /* boolean segmentOptimizationEnabled */);
>> >>
>> >> pwcqBuilder.addTerm(new Term(field, "term1"));
>> >>
>> >> pwcqBuilder.addTerm(new Term(field, "term2FirstChar"));
>> >>
>> >> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>> >>
>> >> or
>> >>
>> >> 2nd way:
>> >>
>> >> pwcqBuilder.addMultiTerm(MultiTermQuery object here containing {field,
>> >> "term1"} and {field, "term2FirstChar"});
>> >>
>> >> PhraseWildcardQuery pwcq = pwcqBuilder.build();
>> >>
>> >>
>> >> Then this pwcq object will be passed to IndexSearcher as the query
>> >> parameter.
>> >>
>> >>
>> >> Now, it looks like the first way will not consider expansions, or in
>> >> other words the wildcard. Am I right?
>> >>
>> >> I also need to understand this maxMultiTermExpansions parameter better.
>> >> For instance, if the first way is used, will maxMultiTermExpansions be
>> >> meaningful?
>> >>
>> >>
>> >> Thanks
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
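
For anyone reading this thread later, here is a rough sketch of the idea discussed above: a prefix gets expanded against a sorted term dictionary, and an expansion budget caps how many terms are collected. This is plain Java and is not Lucene's implementation. The class PrefixExpansionSketch, the method expandPrefix, and the sample terms are all made up for illustration, and a TreeSet stands in for a segment's term dictionary. It also ignores how PhraseWildcardQuery shares its budget across multi-terms and segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
public class PrefixExpansionSketch {

    /**
     * Collects the terms in a sorted dictionary that start with the given prefix,
     * stopping once maxExpansions terms have been collected.
     */
    public static List<String> expandPrefix(TreeSet<String> termDictionary,
                                            String prefix,
                                            int maxExpansions) {
        List<String> expanded = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous,
        // so start at the first term >= prefix and walk forward.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last term with this prefix
            }
            if (expanded.size() >= maxExpansions) {
                break; // budget exhausted: the guard rail kicks in
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Stand-in for the term dictionary of one index segment (assumed data).
        TreeSet<String> dict = new TreeSet<>(
            Arrays.asList("apple", "banana", "band", "bank", "bar", "cat"));

        // Effectively unlimited budget: every matching term is kept.
        System.out.println(expandPrefix(dict, "ban", Integer.MAX_VALUE)); // [banana, band, bank]

        // Budget of 2: the expansion is cut off after two terms.
        System.out.println(expandPrefix(dict, "ban", 2)); // [banana, band]
    }
}
```

A longer prefix (the "term2Char1term2Char2*" case) simply matches fewer terms, so it uses less of the budget. An exact term added with addTerm would correspond to no expansion at all.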