Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextField)

Ivan Pavlukhin Sat, 14 Mar 2020 20:16:00 -0700

Yuriy,

> Let me summarize the approaches:
I agree with your reasoning; option 2 sounds like the best one to me as well.


I will look into the merge-sort strategy some time later.

Best regards,
Ivan Pavlukhin

On Fri, 13 Mar 2020 at 19:23, Yuriy Shuliga <shul...@gmail.com> wrote:
>
> Ivan,
>
> I have made changes in the fork that reflect the merge-sort strategy: the
> query future iterator now unblocks as soon as the first pages from all
> nodes are delivered, then waits for the next portions of pages, and so on.
> https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
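The blocking merge described above can be sketched as follows. This is a minimal single-file illustration, not the code from the linked commit: each node is assumed to push its pages, already sorted by descending score, into its own BlockingQueue, and an empty page is used as a hypothetical end-of-stream marker. ScoredHit and BlockingMergeIterator are illustrative names. The first hasNext() call blocks until every node has delivered its first page; after that, only the node whose hit was just returned is waited on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;

/** Hypothetical hit: a cache key plus the Lucene document score. */
final class ScoredHit {
    final String key;
    final float score;
    ScoredHit(String key, float score) { this.key = key; this.score = score; }
}

/** Merges per-node page streams, each sorted by descending score. */
final class BlockingMergeIterator implements Iterator<ScoredHit> {
    private final List<BlockingQueue<List<ScoredHit>>> queues;
    private final List<Iterator<ScoredHit>> pages = new ArrayList<>();
    private final ScoredHit[] heads; // best not-yet-returned hit per node; null once drained
    private boolean initialized;

    BlockingMergeIterator(List<BlockingQueue<List<ScoredHit>>> queues) {
        this.queues = queues;
        this.heads = new ScoredHit[queues.size()];
        for (int i = 0; i < queues.size(); i++)
            pages.add(Collections.<ScoredHit>emptyIterator());
    }

    /** Next hit of node i; blocks until that node delivers a page. Returns null at end of stream. */
    private ScoredHit advance(int i) {
        try {
            while (!pages.get(i).hasNext()) {
                List<ScoredHit> page = queues.get(i).take(); // blocks until the node sends a page
                if (page.isEmpty())
                    return null;                              // assumed end-of-stream marker
                pages.set(i, page.iterator());
            }
            return pages.get(i).next();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a page", e);
        }
    }

    @Override public boolean hasNext() {
        if (!initialized) {
            // Unblocks only after every node has delivered its first page.
            for (int i = 0; i < heads.length; i++)
                heads[i] = advance(i);
            initialized = true;
        }
        for (ScoredHit h : heads)
            if (h != null) return true;
        return false;
    }

    @Override public ScoredHit next() {
        if (!hasNext()) throw new NoSuchElementException();
        int best = -1;
        for (int i = 0; i < heads.length; i++)
            if (heads[i] != null && (best < 0 || heads[i].score > heads[best].score))
                best = i;
        ScoredHit result = heads[best];
        heads[best] = advance(best); // may block until this node's next page arrives
        return result;
    }
}
```

Picking the best head with a linear scan is fine for a handful of nodes; with many nodes a heap over the heads would be the usual choice.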
>
> Please validate the design if you wish.
>
> Regarding the rank field in the entity:
>
> In the search domain, entities for text queries are usually treated as
> documents with some metadata: an id, issue/expiry dates, and the document
> score returned for a given query.
> It is common to include such fields in the entity design.
>
> As for your question about omitting @QueryRankField:
> - the response records will then simply come in arbitrary order. This
> should not fail TextQuery execution.
>
> Another point, about rank values across different indices:
> - ranks are meant for comparing entities within a particular query
> response; they are not intended to be absolute across the system.
>
> Let me summarize the approaches:
> 1. Subclassing from Ranked.class.
>    pros: the simplest and most Ignite-natural approach
>    cons: implicit nature, limits entity inheritance
>
> 2. Explicitly introducing a dedicated field annotated with @QueryRankField.
>    pros: Ignite-natural approach, easy to introduce, explicitly controlled
>    by the developer
>    cons: adds extra metadata to the entity
>
> 3. Wrapping the entity response with rank data used for merge sort,
>    without exposing it to the client.
>    pros: leaves the entity design clean
>    cons: rank is not available to the client; development will require a
>    complex change in the query execution / entity marshaling mechanisms
>
> I'd stay with option 2 as the most balanced of these solutions.
> What do you think?
>
> BR,
> Yuriy Shuliha
>
>
>
>
> On Wed, 11 Mar 2020 at 01:14, Ivan Pavlukhin <vololo...@gmail.com> wrote:
>
> > Igniters,
> >
> > The discussion unintentionally continued outside of the dev list. I am
> > bringing it back; you can find it below. Do not hesitate to join if you
> > have thoughts on the questions raised. Maybe you have ideas on how to
> > enrich text query results with score/rank information.
> >
> > On Tue, 10 Mar 2020 at 09:11, Yuriy Shuliga <shul...@gmail.com> wrote:
> >
> > > Yes, please do.
> > >
> > > On Tue, 10 Mar 2020 at 02:26, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >
> > >> Yuriy,
> > >>
> > >> I noticed that at some point our discussion moved off the Ignite dev
> > >> list. Would you mind if I bring it back to the dev list?
> > >>
> > >> Best regards,
> > >> Ivan Pavlukhin
> > >>
> > >> On Tue, 10 Mar 2020 at 03:25, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> >
> > >> > > PS As far as I can see, there is no chance to catch the 2.8 release train.
> > >> > > What is the next version/date we can aim for with this update?
> > >> >
> > >> > Yes, 2.8 is already available and the community is working on
> > >> > finalizing activities (e.g. publishing documentation). I do not have any
> > >> > reliable expectations about the next releases. I suppose that there could
> > >> > be a couple of maintenance releases like 2.8.1, as several problems were
> > >> > already discovered. I do not know whether the next more significant
> > >> > release is going to be 2.9 or even the major release 3.0. It sounds
> > >> > realistic to aim for 2.9, because there are already several "almost ready"
> > >> > features in master. In my mind it is a good idea to start a discussion
> > >> > about the next releases on the dev list.
> > >> >
> > >> > Best regards,
> > >> > Ivan Pavlukhin
> > >> >
> > >> > On Tue, 10 Mar 2020 at 00:58, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > >
> > >> > > Hi Yuriy,
> > >> > >
> > >> > > Sorry for the late response.
> > >> > >
> > >> > > > A suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity.
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance).
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back
> > >> > > >    to the initiating node.
> > >> > > > 4. Possibly we still need to proxy the entity, adding the Comparable
> > >> > > >    interface.
> > >> > > > 5. Perform merge sort on the initiating node.
> > >> > >
> > >> > > Possibly I missed it, but one point is not clear to me. What will
> > >> > > happen if an entity class does not have a field annotated with
> > >> > > @QueryRankField?
> > >> > >
> > >> > > And I am still not sure that it is a proper (enough) approach. What
> > >> > > bothers me is the transient and dynamic nature of the "rank" field.
> > >> > > It does not belong to the entity: it can have different values for the
> > >> > > same entity (e.g. when different indices are used).
> > >> > >
> > >> > > I would like to experiment with the code a little bit, but most likely
> > >> > > I will only have a chance at the end of this week.
> > >> > >
> > >> > > Best regards,
> > >> > > Ivan Pavlukhin
> > >> > >
> > >> > > On Mon, 2 Mar 2020 at 20:09, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >
> > >> > > > Hi Ivan,
> > >> > > >
> > >> > > > I have concerns about the entity annotation variant.
> > >> > > > Wrapping the entity into a dynamic proxy for passing it back would be
> > >> > > > quite complex and would require changes in IgniteCacheObjectProcessor
> > >> > > > and entity marshaling.
> > >> > > >
> > >> > > > A suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity.
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance).
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back
> > >> > > >    to the initiating node.
> > >> > > > 4. Possibly we still need to proxy the entity, adding the Comparable
> > >> > > >    interface.
> > >> > > > 5. Perform merge sort on the initiating node.
> > >> > > >
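Steps 1, 3 and 5 of the list above could look roughly like the sketch below. It rests on assumptions: @QueryRankField does not exist in Ignite and is declared here only for illustration, RankFieldSupport and Article are hypothetical, and an external Comparator stands in for the Comparable proxy from step 4. An entity without an annotated field gets rank 0 for every entry, so the merge imposes no order instead of failing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Comparator;

/** Hypothetical marker for the float field that receives the Lucene document score. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface QueryRankField {}

/** Hypothetical helper: finds, fills and reads the rank field via reflection. */
final class RankFieldSupport {
    private RankFieldSupport() {}

    /** Returns the annotated float field (searching superclasses too), or null if none exists. */
    static Field rankField(Class<?> cls) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass())
            for (Field f : c.getDeclaredFields())
                if (f.isAnnotationPresent(QueryRankField.class) && f.getType() == float.class) {
                    f.setAccessible(true);
                    return f;
                }
        return null;
    }

    /** Step 3: called where the Lucene query ran. Returns false if the entity has no rank field. */
    static boolean fillRank(Object entity, float docScore) {
        Field f = rankField(entity.getClass());
        if (f == null) return false;
        try {
            f.setFloat(entity, docScore);
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rank of an entity; 0 when no field is annotated, so all such entries tie. */
    static float rankOf(Object entity) {
        Field f = rankField(entity.getClass());
        if (f == null) return 0f;
        try {
            return f.getFloat(entity);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 5: ordering for the merge on the initiating node, best score first. */
    static Comparator<Object> byRankDescending() {
        return (a, b) -> Float.compare(rankOf(b), rankOf(a));
    }
}

/** Hypothetical entity used only for illustration. */
final class Article {
    String title;
    @QueryRankField float score;
    Article(String title) { this.title = title; }
}
```

In practice the Field lookup would be cached per class rather than repeated on every comparison.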
> > >> > > > Would you consider this approach, or should we go back to using the
> > >> > > > Ranked superclass?
> > >> > > >
> > >> > > > Regarding your proposal to implement merge sort: definitely yes.
> > >> > > > I will implement it.
> > >> > > > Sorry, I didn't understand you earlier :)
> > >> > > >
> > >> > > > BR,
> > >> > > > Yuriy Shuliha
> > >> > > >
> > >> > > > PS As far as I can see, there is no chance to catch the 2.8 release train.
> > >> > > > What is the next version/date we can aim for with this update?
> > >> > > >
> > >> > > >
> > >> > > > On Fri, 28 Feb 2020 at 23:08, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> Hi Yuriy,
> > >> > > >>
> > >> > > >> Sorry for the late response, and thank you for your comments.
> > >> > > >>
> > >> > > >> The approach with a @Ranked annotation looks cleaner to me from the API
> > >> > > >> point of view.
> > >> > > >>
> > >> > > >> Regarding merging responses from multiple nodes, I suppose that a good
> > >> > > >> enough solution is possible:
> > >> > > >> 1. Request one page of entries from each node.
> > >> > > >> 2. Return one page to the user (as there is definitely a page of the
> > >> > > >>    best results already).
> > >> > > >> 3. Request the next result pages from the nodes whose pages we exposed
> > >> > > >>    to the user (actually, nodes having less than one page of pending
> > >> > > >>    results). Repeat from step 2.
> > >> > > >>
> > >> > > >> Some kind of sort merge plus backpressure. The backpressure part might
> > >> > > >> be left as an improvement.
> > >> > > >>
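A single-threaded simulation of the three steps above, under stated assumptions: NodeSource is a hypothetical stand-in for a remote node that serves its locally sorted scores one page at a time and reports when it has no more data. After each page is handed to the user, only nodes holding less than one page of pending results are asked for more, which is the backpressure part.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a remote node holding scores sorted in descending order. */
final class NodeSource {
    private final float[] scores;
    private int pos;
    int pagesServed; // number of page requests this node received

    NodeSource(float... scores) { this.scores = scores; }

    List<Float> nextPage(int pageSize) {
        List<Float> page = new ArrayList<>();
        while (page.size() < pageSize && pos < scores.length) page.add(scores[pos++]);
        pagesServed++;
        return page;
    }

    boolean exhausted() { return pos >= scores.length; }
}

/** Paged sort-merge over several nodes. */
final class PagedMerge {
    private final List<NodeSource> nodes;
    private final List<Deque<Float>> buffers = new ArrayList<>();
    private final int pageSize;

    PagedMerge(List<NodeSource> nodes, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.nodes = nodes;
        this.pageSize = pageSize;
        for (NodeSource n : nodes)
            buffers.add(new ArrayDeque<>(n.nextPage(pageSize))); // step 1: one page per node
    }

    /** Step 2: the next page of globally best scores; empty once all nodes are drained. */
    List<Float> nextPage() {
        List<Float> out = new ArrayList<>();
        while (out.size() < pageSize) {
            int best = -1;
            for (int i = 0; i < buffers.size(); i++) {
                Float head = buffers.get(i).peekFirst();
                if (head != null && (best < 0 || head > buffers.get(best).peekFirst())) best = i;
            }
            if (best < 0) break; // nothing buffered anywhere and every node is exhausted
            out.add(buffers.get(best).pollFirst());
        }
        // Step 3 (backpressure): refill only nodes with less than one page pending.
        for (int i = 0; i < nodes.size(); i++)
            if (buffers.get(i).size() < pageSize && !nodes.get(i).exhausted())
                buffers.get(i).addAll(nodes.get(i).nextPage(pageSize));
        return out;
    }
}
```

The merge stays correct because every node that still has data keeps at least one full page buffered before a page is assembled, so its buffer cannot run dry in the middle of a page while unseen results remain on that node.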
> > >> > > >> What do you think?
> > >> > > >>
> > >> > > >> Best regards,
> > >> > > >> Ivan Pavlukhin
> > >> > > >>
> > >> > > >> On Tue, 18 Feb 2020 at 18:27, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> >
> > >> > > >> > Hi Ivan,
> > >> > > >> >
> > >> > > >> > Thank you for keeping an eye on the topic!
> > >> > > >> >
> > >> > > >> > Here are the answers to your questions:
> > >> > > >> > 1. A TextQuery response is always ordered by documentScore, and this number is also frequently used when processing the results.
> > >> > > >> > We have analyzed the current entity flow under the hood of query processing and found that the cleanest approach to get a response with ordered entities is to extend the entity itself.
> > >> > > >> > The only drawback is the necessity to extend from Ranked in our case; still, it is very common to use documentScore (rank) when working with TextQuery.
> > >> > > >> > Another approach I see is to use reflection to create a proxy with the Ranked interface. In this case we would still need to mark our intention to have an ordered response, e.g. with some @Ranked annotation.
> > >> > > >> > Please advise what would fit Ignite better.
> > >> > > >> >
> > >> > > >> > 2. Yes, you are right. Using a PriorityQueue may lead to unwanted memory consumption.
> > >> > > >> > To get a correct response we still need to retrieve data from all of the nodes, as any of them may contain a value that falls into the limited range (this is because of the float ranking score).
> > >> > > >> > This can be fixed by using Guava's MinMaxPriorityQueue, which has a maximum size limit. Technically it will be equivalent to merging sorted responses, as each element will require comparison against the whole queue.
> > >> > > >> >
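The bounded-queue idea can be sketched with the standard library alone. The snippet below uses java.util.PriorityQueue as a min-heap capped at k entries rather than Guava's MinMaxPriorityQueue; TopK is a hypothetical name. Memory stays proportional to k, and each offered score costs O(log k) heap operations rather than a scan of the whole queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps only the k highest scores seen so far. */
final class TopK {
    private final int k;
    // Min-heap: the root is the weakest score currently kept.
    private final PriorityQueue<Float> heap = new PriorityQueue<>();

    TopK(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    void offer(float score) {
        if (heap.size() < k) {
            heap.add(score);
        } else if (score > heap.peek()) {
            heap.poll();     // evict the weakest kept score
            heap.add(score);
        }
    }

    /** Kept scores, best first. */
    List<Float> best() {
        List<Float> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```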
> > >> > > >> > BR,
> > >> > > >> > Yuriy Shuliha
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > On Thu, 13 Feb 2020 at 13:53, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >>
> > >> > > >> >> Hi Yuriy,
> > >> > > >> >>
> > >> > > >> >> Sorry for the delay. I went through the proposed solution and I have
> > >> > > >> >> some questions. Currently I am a little bit far from the context of
> > >> > > >> >> TEXT queries, so correct me or redirect me to some previous discussion
> > >> > > >> >> if I got something wrong:
> > >> > > >> >> 1. What is the justification for using inheritance from Ranked in order
> > >> > > >> >>    to keep order? Why can't we mix the rank/score into the entries
> > >> > > >> >>    transferred inside GridCacheQueryResponse?
> > >> > > >> >> 2. Collecting all entries in a PriorityQueue can lead to unnecessary
> > >> > > >> >>    heap memory consumption. I think that merging several sorted runs
> > >> > > >> >>    (responses from different nodes) will be a better option.
> > >> > > >> >>
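Both points above can be combined in one sketch: the score travels next to the value in an envelope instead of inside the entity, and the sorted runs from different nodes are merged with a heap that holds one cursor per run, so the merge structure grows with the number of nodes rather than the number of entries. ScoredEntry and SortedRunMerge are hypothetical names, not Ignite classes; how such an envelope would be carried inside GridCacheQueryResponse is not shown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical envelope: the score is carried alongside the value, not inside it. */
final class ScoredEntry<V> {
    final float score;
    final V value;
    ScoredEntry(float score, V value) { this.score = score; this.value = value; }
}

final class SortedRunMerge {
    private SortedRunMerge() {}

    /** Merges runs that are each sorted by descending score (k-way merge). */
    static <V> List<V> merge(List<List<ScoredEntry<V>>> runs) {
        // Max-heap by score over the current head of each run.
        PriorityQueue<Cursor<V>> heap =
            new PriorityQueue<Cursor<V>>((a, b) -> Float.compare(b.head.score, a.head.score));
        for (List<ScoredEntry<V>> run : runs) {
            Iterator<ScoredEntry<V>> it = run.iterator();
            if (it.hasNext()) heap.add(new Cursor<>(it.next(), it));
        }
        List<V> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<V> c = heap.poll();
            out.add(c.head.value);
            if (c.rest.hasNext()) heap.add(new Cursor<>(c.rest.next(), c.rest)); // advance that run
        }
        return out;
    }

    /** Current head of one run plus the iterator over the remainder. */
    private static final class Cursor<V> {
        final ScoredEntry<V> head;
        final Iterator<ScoredEntry<V>> rest;
        Cursor(ScoredEntry<V> head, Iterator<ScoredEntry<V>> rest) { this.head = head; this.rest = rest; }
    }
}
```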
> > >> > > >> >> Best regards,
> > >> > > >> >> Ivan Pavlukhin
> > >> > > >> >>
> > >> > > >> >> On Mon, 10 Feb 2020 at 18:32, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >
> > >> > > >> >> > Hi Ivan,
> > >> > > >> >> >
> > >> > > >> >> > Did you have a chance to look through the proposed solution?
> > >> > > >> >> > We definitely need this validation in order to proceed further and provide the changes officially.
> > >> > > >> >> >
> > >> > > >> >> > BR,
> > >> > > >> >> > Yuriy Shuliha
> > >> > > >> >> >
> > >> > > >> >> > On Tue, 28 Jan 2020 at 17:30, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>
> > >> > > >> >> >> Hello,
> > >> > > >> >> >>
> > >> > > >> >> >> Please see the proposed TextQuery ordering solution here:
> > >> > > >> >> >> https://github.com/apache/ignite/compare/master...shuliga:feature/rank_score
> > >> > > >> >> >>
> > >> > > >> >> >> Y.
> > >> > > >> >> >>
> > >> > > >> >> >> On Fri, 24 Jan 2020 at 09:50, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>>
> > >> > > >> >> >>> Yuriy,
> > >> > > >> >> >>>
> > >> > > >> >> >>> Good to know that the story continues! Yes, it would be really nice to
> > >> > > >> >> >>> see the code of your solution; formal requirements can of course be
> > >> > > >> >> >>> omitted, as the solution design is of most interest so far. And it
> > >> > > >> >> >>> would definitely be great to merge it into the Apache Ignite codebase
> > >> > > >> >> >>> eventually.
> > >> > > >> >> >>>
> > >> > > >> >> >>> On Thu, 23 Jan 2020 at 16:47, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Hi Ivan,
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Actually, I have engaged another developer to help bring TextQueries to a correctly working state.
> > >> > > >> >> >>> > For now we have a solution that adds ordering functionality to distributed TextQueries.
> > >> > > >> >> >>> > It has been developed and tested locally. I can share the details here; then we can discuss and decide whether to create a corresponding ticket.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > The starting point is that, by nature, Lucene's documents are always ordered by docScore:float.
> > >> > > >> >> >>> > So we created an abstract class Ranked, implementing Comparable<Ranked> and Serializable and containing a float rank value.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Each entity expected to be ordered on TextQuery merge should be derived from this class.
> > >> > > >> >> >>> > All subsequent actions are done automatically under the hood by the new CacheQueryFutureRankedDecorator,
> > >> > > >> >> >>> > which contains a special BlockingIterator used for correct merging of distributed responses.
> > >> > > >> >> >>> > Text queries with Ranked entities are automatically wrapped with this new decorator.
> > >> > > >> >> >>> >
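The Ranked base class described above might look like the minimal sketch below. It covers only what the message states (a float rank, Comparable<Ranked>, Serializable) and orders higher ranks first to match Lucene's descending-score order; the accessor names and the Doc entity are assumptions for illustration, and CacheQueryFutureRankedDecorator is not reproduced.

```java
import java.io.Serializable;

/** Sketch of the proposed base class for entities returned in rank order. */
abstract class Ranked implements Comparable<Ranked>, Serializable {
    private static final long serialVersionUID = 1L;

    /** Lucene document score, assigned on the node that executed the query. */
    private float rank;

    public float rank() { return rank; }

    public void rank(float rank) { this.rank = rank; }

    /** Higher rank sorts first, matching Lucene's descending-score order. */
    @Override public int compareTo(Ranked other) {
        return Float.compare(other.rank, rank);
    }
}

/** Hypothetical entity: it must extend Ranked, which is the inheritance limitation discussed in the thread. */
class Doc extends Ranked {
    private static final long serialVersionUID = 1L;
    final String title;
    Doc(String title) { this.title = title; }
}
```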
> > >> > > >> >> >>> > This is an outline of the solution. Please ask if you have any questions.
> > >> > > >> >> >>> > Or I can create a ticket and link to it a PR with the already tested (so far only locally) solution for a detailed review.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > BR,
> > >> > > >> >> >>> > Yuriy
> > >> > > >> >> >>> >
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > On Tue, 21 Jan 2020 at 07:29, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> Hi Yuriy,
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> I would just like to understand the current state. Are you still
> > >> > > >> >> >>> >> working on Ignite text queries? If not, are you going to continue
> > >> > > >> >> >>> >> with it?
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> On Fri, 13 Dec 2019 at 11:52, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Yuriy,
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Sure, I will be glad to help.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > > - incorrect node/partition selection during querying?
> > >> > > >> >> >>> >> > Apparently this is the problem. If you find it really complicated to
> > >> > > >> >> >>> >> > understand and debug, I can dig deeper and share my vision of how
> > >> > > >> >> >>> >> > the problem can be fixed.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > On Wed, 11 Dec 2019 at 18:46, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > I will look into the MOVING partition issue,
> > >> > > >> >> >>> >> > > but I also need some guidance there.
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > Ivan, would you mind being that person?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > The question is whether we have an issue with:
> > >> > > >> >> >>> >> > > - wrong storage targets during indexing, OR
> > >> > > >> >> >>> >> > > - incorrect node/partition selection during querying?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > BR,
> > >> > > >> >> >>> >> > > Yuriy Shuliha
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > --
> > >> > > >> >> >>> >> > Best regards,
> > >> > > >> >> >>> >> > Ivan Pavlukhin
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> --
> > >> > > >> >> >>> >> Best regards,
> > >> > > >> >> >>> >> Ivan Pavlukhin
> > >> > > >> >> >>>
> > >> > > >> >> >>>
> > >> > > >> >> >>>
> > >> > > >> >> >>> --
> > >> > > >> >> >>> Best regards,
> > >> > > >> >> >>> Ivan Pavlukhin
> > >>
> > >
> >
