Alan,
Thanks again for the quick replies and assistance. As these things go, a clean
build got my query parser working properly. I had written some tests to check
the situation I reported; they confirmed it worked the way I thought it
should, and that prompted the clean build.
Now the numbers look like this:
"facet_queries":{
"one-two-A":0,
"one-two-AB":1,
"one-two-ABC":0,
"two-three-A":0,
"two-three-AB":0,
"two-three-ABC":0,
"one-two-three-A":0,
"one-two-three-AB":0,
"one-two-three-ABC":1},
Ahh, much better! Only the two that match have the same number of payloads as
clauses, and payloads that match those clauses.
My tests are below, at the Lucene level.
Erik
// I don’t think these tests add anything to TestPayloadCheckQuery, but helped
// me understand things better:
public void testTesting() throws Exception {
  Analyzer simplePayloadAnalyzer = new Analyzer() {
    @Override
    public TokenStreamComponents createComponents(String fieldName) {
      Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
      return new TokenStreamComponents(tokenizer,
          new DelimitedPayloadTokenFilter(tokenizer, '|', new IdentityEncoder()));
    }
  };
  directory = newDirectory();
  RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
      newIndexWriterConfig(simplePayloadAnalyzer)
          .setMaxBufferedDocs(TestUtil.nextInt(random(), 100, 1000))
          .setMergePolicy(newLogMergePolicy()));
  Document doc = new Document();
  doc.add(newTextField("field", "one|A two|B three|C", Field.Store.YES));
  writer.addDocument(doc);
  reader = writer.getReader();
  searcher = newSearcher(reader);
  writer.close();
  checkMatch("one two", new String[] {"A"}, false);
  checkMatch("one two", new String[] {"A", "B"}, true);
  checkMatch("one two", new String[] {"A", "B", "C"}, false);
  checkMatch("two three", new String[] {"A"}, false);
  checkMatch("two three", new String[] {"A", "B"}, false);
  checkMatch("two three", new String[] {"A", "B", "C"}, false);
  // extra check just to make sure we can match on “two three” with the right
  // payloads
  checkMatch("two three", new String[] {"B", "C"}, true);
  checkMatch("one two three", new String[] {"A"}, false);
  checkMatch("one two three", new String[] {"A", "B"}, false);
  checkMatch("one two three", new String[] {"A", "B", "C"}, true);
}

private void checkMatch(String phrase, String[] payloadArray, boolean willMatch)
    throws IOException {
  String[] terms = phrase.split(" ");
  List<SpanQuery> stqs = new ArrayList<SpanQuery>();
  for (String term : terms) {
    stqs.add(new SpanTermQuery(new Term("field", term)));
  }
  SpanNearQuery snq = new SpanNearQuery(stqs.toArray(new SpanQuery[stqs.size()]), 0, true);
  IdentityEncoder encoder = new IdentityEncoder();
  List<BytesRef> payloads = new ArrayList<>();
  for (String rawPayload : payloadArray) {
    payloads.add(encoder.encode(rawPayload.toCharArray()));
  }
  SpanPayloadCheckQuery spcq = new SpanPayloadCheckQuery(snq, payloads);
  System.out.println("spcq = " + spcq);
  checkHits(spcq, willMatch ? new int[] {0} : new int[] {});
}
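
For anyone following along without a Lucene checkout, the rule these tests
exercise can be sketched in plain Java. This is only a rough model of the
behavior described in this thread, not Lucene code: the class
PayloadCheckSketch and the method payloadsMatch are hypothetical names, and
payloads are modeled as plain strings rather than BytesRef values.

```java
import java.util.Arrays;
import java.util.List;

class PayloadCheckSketch {

  // Hypothetical helper, not a Lucene API. Models the rule from this thread:
  // a span matches only when the payloads collected from its terms, in order,
  // equal the expected payloads one for one (same count, same values).
  public static boolean payloadsMatch(List<String> spanPayloads, List<String> expected) {
    if (spanPayloads.size() != expected.size()) {
      return false;
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!spanPayloads.get(i).equals(expected.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Payloads carried by the span "one two" in "one|A two|B three|C".
    List<String> oneTwo = Arrays.asList("A", "B");
    System.out.println(payloadsMatch(oneTwo, Arrays.asList("A")));           // false
    System.out.println(payloadsMatch(oneTwo, Arrays.asList("A", "B")));      // true
    System.out.println(payloadsMatch(oneTwo, Arrays.asList("A", "B", "C"))); // false
  }
}
```

Under that model, a two-clause span needs exactly two payloads and a
three-clause span exactly three, which lines up with the facet counts above.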
> On Apr 25, 2017, at 5:44 AM, Alan Woodward <[email protected]> wrote:
>
> Hm, maybe - a quick look at the tests suggests that we don’t have anything
> that explicitly checks more than 2 clauses. Can you open an issue and add
> something to TestPayloadCheckQuery?
>
>
>> On 25 Apr 2017, at 10:23, Erik Hatcher <[email protected]> wrote:
>>
>> Alan - thanks for the reply. Given your explanation, is there an
>> off-by-one term issue? The matches I'm seeing would happen if the last term
>> weren't considered.
>>
>> Do you have an example of multiple payloads too?
>>
>> Erik
>>
>> On Apr 25, 2017, at 04:16, Alan Woodward <[email protected]> wrote:
>>
>>> The query will only match a particular span if all the payloads in that
>>> span match the passed-in array. So for example, in your first query, the
>>> inner spanNear query matches two terms (words_dps:one and words_dps:two),
>>> so it needs to have an array of two payloads to match.
>>>
>>> You can use it for part-of-speech tagging, for example:
>>> spanPayCheck(spanTerm(text:run), payloadRef:noun) would only match
>>> instances of ‘run’ that are tagged as a noun, rather than a verb.
>>>
>>> I can see a case for a separate query that only matches when all of a
>>> span’s payloads match a single payload value.
>>>
>>> Alan Woodward
>>> www.flax.co.uk <http://www.flax.co.uk/>
>>>
>>>
>>>> On 25 Apr 2017, at 01:40, Erik Hatcher <[email protected]> wrote:
>>>>
>>>> I’ve started a belated mission to leverage payloads from Solr (SOLR-1485),
>>>> mainly for float payload decoding for weighting in scoring, but while
>>>> digging in I’m exploring all that payloads now have to offer, including
>>>> SpanPayloadCheckQuery. However, I don’t yet understand how to use it
>>>> effectively, or what kinds of use cases it _really_ fits.
>>>>
>>>> I think it isn’t working as it should, or at least I’m not understanding
>>>> its behavior. Here’s what I’m indexing, by way of the
>>>> DelimitedPayloadTokenFilter:
>>>>
>>>> one|A two|B three|C
>>>>
>>>> and making the following queries (these translate to SpanNearQuery with
>>>> zero slop and inOrder=true):
>>>>
>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true),
>>>> payloadRef: A;)
>>>> *spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true),
>>>> payloadRef: A;B;)
>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true),
>>>> payloadRef: A;B;C;)
>>>> spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true),
>>>> payloadRef: A;)
>>>> *spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true),
>>>> payloadRef: A;B;)
>>>> spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true),
>>>> payloadRef: A;B;C;)
>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0,
>>>> true), payloadRef: A;)
>>>> *spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three],
>>>> 0, true), payloadRef: A;B;)
>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0,
>>>> true), payloadRef: A;B;C;)
>>>>
>>>> Only the ones marked (*), with the payloads array set to “A” and “B”,
>>>> matched; all the others failed to match. Is that expected? I’m confused
>>>> about how SpanPayloadCheckQuery uses this payloads array to further filter
>>>> the matches on the associated SpanQuery.
>>>>
>>>> Could/would someone explain how this query works and why these matches are
>>>> working as they are? Thanks!
>>>>
>>>> Here’s my test platform below:
>>>>
>>>> ——
>>>>
>>>> bin/post -c payloads -type text/csv -out yes -d $'id,words_dps\n1,one|A two|B three|C'
>>>> curl http://localhost:8983/solr/payloads/config/params -H 'Content-type:application/json' -d '{
>>>>   "set" : {
>>>>     "payload-checks": {
>>>>       "wt":"json",
>>>>       "indent":"on",
>>>>       "debug":"query",
>>>>       "echoParams":"all",
>>>>       "facet":"on",
>>>>       "facet.query": [
>>>>         "{!payload_check key=one-two-A f=words_dps payloads=\"A\"}one two",
>>>>         "{!payload_check key=one-two-AB f=words_dps payloads=\"A B\"}one two",
>>>>         "{!payload_check key=one-two-ABC f=words_dps payloads=\"A B C\"}one two",
>>>>         "{!payload_check key=two-three-A f=words_dps payloads=\"A\"}two three",
>>>>         "{!payload_check key=two-three-AB f=words_dps payloads=\"A B\"}two three",
>>>>         "{!payload_check key=two-three-ABC f=words_dps payloads=\"A B C\"}two three",
>>>>         "{!payload_check key=one-two-three-A f=words_dps payloads=\"A\"}one two three",
>>>>         "{!payload_check key=one-two-three-AB f=words_dps payloads=\"A B\"}one two three",
>>>>         "{!payload_check key=one-two-three-ABC f=words_dps payloads=\"A B C\"}one two three"
>>>>       ]
>>>>     }
>>>>   }
>>>> }'
>>>> curl "http://localhost:8983/solr/payloads/select?q=*:*&useParams=payload-checks"
>>>>
>>>> "facet_queries":{
>>>> "one-two-A":0,
>>>> "one-two-AB":1,
>>>> "one-two-ABC":0,
>>>> "two-three-A":0,
>>>> "two-three-AB":1,
>>>> "two-three-ABC":0,
>>>> "one-two-three-A":0,
>>>> "one-two-three-AB":1,
>>>> "one-two-three-ABC":0},
>>>>
>>>> —
>>>>
>>>> // not necessarily the latest code on SOLR-1485 - construction zone
>>>> public Query parse() throws SyntaxError {
>>>> String field = localParams.get(QueryParsing.F);
>>>> String value = localParams.get(QueryParsing.V);
>>>> String pStr = localParams.get("payloads","");
>>>>
>>>> IdentityEncoder encoder = new IdentityEncoder();
>>>> List<BytesRef> payloads = new ArrayList<>();
>>>> String[] rawPayloads = pStr.split(" ");
>>>> for (String rawPayload : rawPayloads) {
>>>> payloads.add(encoder.encode(rawPayload.toCharArray()));
>>>> }
>>>>
>>>> String[] terms = value.split(" ");
>>>> List<SpanQuery> stqs = new ArrayList<SpanQuery>();
>>>> for (String term : terms) {
>>>> stqs.add(new SpanTermQuery(new Term(field, term)));
>>>> }
>>>> SpanNearQuery snq = new SpanNearQuery(stqs.toArray(new SpanQuery[0]), 0, true);
>>>>
>>>> Query spcq = new SpanPayloadCheckQuery(snq, payloads);
>>>>
>>>> return spcq;
>>>> }
>>>> };
>>>>
>>>>
>>>
>