Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

Dominik Safaric Mon, 12 Feb 2018 13:04:54 -0800

Unfortunately, you've misunderstood my question. FuzzyQuery does not satisfy 
my requirements: it is based on Levenshtein distance, not Hamming distance. 
Hence the need to implement a custom Query.
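
To make the distinction concrete, here is a minimal sketch of the comparison I have in mind. It assumes the keywords are equal-length strings compared character by character; the class and method names are hypothetical and not part of Lucene. For example, "abcd" and "bcda" are 2 edits apart under Levenshtein but differ in all 4 positions under Hamming.

```java
// Hypothetical helper, not a Lucene API: Hamming distance between two
// keywords, assuming both values have the same length.
final class HammingDistance {

    private HammingDistance() {}

    /** Counts the positions at which the two keywords differ. */
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException(
                "Hamming distance requires values of equal length");
        }
        int differing = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differing++;
            }
        }
        return differing;
    }

    /** True if the stored keyword is within the threshold of the query value. */
    static boolean matches(String stored, String query, int threshold) {
        return distance(stored, query) <= threshold;
    }
}
```

The first phase would call something like matches(...) on the coarse_grained value, and the scoring phase would call distance(...) on each fine_grained value.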


As asked, how does Lucene internally store multi-valued fields, and is it 
possible to retrieve the values in the same order in which they were stored? 
In particular, I'd like to retrieve a multi-valued keyword field this way. 
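
For context on why order matters here: SortedSetDocValues exposes a document's values deduplicated and in sorted order, so insertion order cannot be recovered from it. One possible workaround, sketched below under assumptions, is to pack the ordered values into a single byte array that could then be stored as one binary value per document (for instance in a BinaryDocValuesField). The OrderedKeywords class and its length-prefixed encoding are hypothetical, not something Lucene provides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene API: packs an ordered list of keywords
// into one byte array so that insertion order and duplicates are preserved.
final class OrderedKeywords {

    private OrderedKeywords() {}

    /** Writes the value count, then each keyword in insertion order. */
    static byte[] encode(List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value); // length-prefixed, modified UTF-8
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the keywords back in the order in which they were written. */
    static List<String> decode(byte[] packed) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(packed))) {
            int count = in.readInt();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                values.add(in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off of this sketch is that the packed value is opaque to Lucene: it can be read back per document during scoring, but it cannot be searched or sorted on directly.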
 
Kind regards,
Dominik

> On 12 Feb 2018, at 19:34, Adrien Grand <[email protected]> wrote:
> 
> Filtering by one query and scoring by a different query is easy: just put
> the filter in a FILTER clause of a BooleanQuery and the scoring query in a
> SHOULD clause. Documents that do not match the SHOULD clause will have a
> score of zero.
> 
> I wonder whether you are looking for something like this:
> 
> Query q = new BooleanQuery.Builder()
>     .add(new FuzzyQuery(new Term("coarse_grained", "search_term")), Occur.FILTER)
>     .add(new FuzzyQuery(new Term("fine_grained", "search_term")), Occur.SHOULD)
>     .build();
> 
> It's not clear to me why you need to retain order: the order of your values
> should not matter?
> 
> On Mon, 12 Feb 2018 at 11:23, Dominik Safaric <[email protected]> wrote:
> 
>> In particular, I have a document schema as follows:
>> 
>> {
>> "images": [{
>> "image_id": 1,
>> "features": {
>> "coarse_grained": <keyword>,
>> "fine_grained": [*<keyword>*]
>> }
>> }]
>> }
>> 
>> In the first run, using a custom Query instance, I'd like to hit documents
>> by matching the *coarse_grained* field. A document is said to match
>> if the Hamming distance between the value of the document's
>> *coarse_grained* field and the value passed through the REST API is less
>> than or equal to a set threshold. I'd then like to score the hit documents
>> using the *fine_grained* field values, which are an array of keywords. The
>> same approach, with Hamming distance as the similarity measure, applies in
>> this case as well.
>> 
>> What I'm concerned with is the following: in the second (scoring) phase
>> I'd like to score documents using all values of the *fine_grained* array of
>> keywords. How can I efficiently retrieve these values for each document,
>> in the same order in which they were inserted?
>> 
>> Thanks in advance,
>> Dominik
>> 
>> 2018-02-12 8:56 GMT+01:00 Adrien Grand <[email protected]>:
>> 
>>> Whether this is doable is going to depend on what you mean by "match[ing]
>>> documents according to criteria X". Can you give an example?
>>> 
>>> On Fri, 9 Feb 2018 at 14:47, Dominik Safaric <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I intend to implement a custom Query using Lucene 6.x and, due to the
>>>> lack of documentation on this particular topic, I have the
>>>> following questions.
>>>> 
>>>> The query is expected to implement a two-phase search, in the sense that
>>>> during the first run it matches documents according to criteria X,
>>>> whereas during the second it matches according to criteria Y of another
>>>> document field. Can this be accomplished by using the TwoPhaseIterator?
>>>> 
>>>> Secondly, the query as expressed through the API will not specify a
>>>> specific query field, but instead a field that stores an array of
>>>> objects. From an implementation point of view, can I use the LeafReader
>>>> to retrieve an object that would map to a Java Map, which I can later
>>>> use for accessing a certain field within the object? Or is it perhaps
>>>> more advisable to get the document instance using the LeafReader's
>>>> getDocument(int docID) function, and then load the particular fields?
>>>> I'm afraid that might hurt overall performance because the documents
>>>> would need to be loaded from disk.
>>>> 
>>>> Thanks in advance,
>>>> Dominik
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>> 
>>>> 
>>> 
>> 

