Yuriy,

> Let me summarize the approaches:

I agree with your reasoning; option 2 sounds like the best one to me as well.
I will look into the merge-sort strategy some time later.

Best regards,
Ivan Pavlukhin

Fri, 13 Mar 2020 at 19:23, Yuriy Shuliga <shul...@gmail.com> wrote:
>
> Ivan,
>
> I have made changes in the fork that reflect the merge-sort strategy, and now the query future iterator unblocks as soon as all first pages are delivered from nodes; then it waits for the next portions of pages and so on.
> https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
>
> Please validate the design if you wish.
>
> Regarding the ranking field in the entity.
>
> Entities for text queries in the search domain are usually treated as documents with some metadata.
> This can be an id, issued/expired date, and the document score returned for a given query.
> It is common to include such fields in entity design.
>
> Answer to your question about omitting QueryRankField:
> - Then the response records will just come in arbitrary order. This should not fail TextQuery execution.
>
> Another point about rank value among different indices.
> - Ranks are to be used for comparison between entities in a particular query response; they are not intended to be absolute over the system.
>
> Let me summarize the approaches:
> 1. Subclassing from Ranked.class.
> pros: the simplest and ignite-natural approach
> cons: implicit nature, limits entity inheritance
>
> 2. Explicitly introducing a dedicated field annotated with @QueryRankField
> pros: ignite-natural approach, easy to introduce, explicitly controlled by developer
> cons: adds extra metadata to entity
>
> 3. Wrapping the entity response with rank data, used for merge sort, not exposing it to the client.
> pros: leaves entity design clean
> cons: rank is not available for client, development will require complex change in query execution / entity marshaling mechanisms
>
> I'd stay with option 2 as the most balanced solution of these.
> What do you think?
>
> BR,
> Yuriy Shuliha
>
> Wed, 11 Mar 2020 at 01:14 Ivan Pavlukhin <vololo...@gmail.com> wrote:
>
> > Igniters,
> >
> > The discussion unintentionally continued outside of the dev list. I am bringing it back. You can find it below. Do not hesitate to join if you have some thoughts on the raised questions. Maybe you have ideas on how to enrich text query results with score/rank information.
> >
> > Tue, 10 Mar 2020 at 09:11, Yuriy Shuliga <shul...@gmail.com> wrote:
> >
> > > Yes, please do.
> > >
> > > Tue, 10 Mar 2020, 02:26 Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >
> > >> Yuriy,
> > >>
> > >> I noticed that from some point our discussion moved out of the Ignite dev list. Would you mind if I return it back to the dev list?
> > >>
> > >> Best regards,
> > >> Ivan Pavlukhin
> > >>
> > >> Tue, 10 Mar 2020 at 03:25, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> >
> > >> > > PS As far as I see, there is no chance to get on the 2.8 release train. What will be the next version/date we can aim for with this update?
> > >> >
> > >> > Yes, 2.8 is already available and the community is working on finalizing activities (e.g. publishing documentation). I do not have any reliable expectations about next releases. I suppose that there could be a couple of maintenance releases like 2.8.1 as several problems were already discovered. I do not know whether the next more significant release is going to be 2.9 or even the major release 3.0. It sounds realistic to facilitate 2.9 because there are already several "almost ready" features in master. In my mind it is a good idea to start a discussion about next releases on the dev list.
> > >> >
> > >> > Best regards,
> > >> > Ivan Pavlukhin
> > >> >
> > >> > Tue, 10 Mar 2020 at 00:58, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > >
> > >> > > Hi Yuriy,
> > >> > >
> > >> > > Sorry for a late response.
> > >> > >
> > >> > > > Suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance)
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex, pass back to the initiating node
> > >> > > > 4. Possibly still need to proxify the entity by adding the Comparable interface.
> > >> > > > 5. Perform merge sort on the initiating node
> > >> > >
> > >> > > Possibly I missed it, but one point is not clear to me. What will happen if an entity class does not have a field annotated with QueryRankField?
> > >> > >
> > >> > > And I am still not sure that it is a proper (enough) approach. The thing which bothers me is a transient and dynamic nature of "rank" field. It does belong to entity, it can have different values for the same entity (e.g. different indices are used).
> > >> > >
> > >> > > I would like to experiment with the code a little bit. But most likely I will have a chance only at the end of this week.
> > >> > >
> > >> > > Best regards,
> > >> > > Ivan Pavlukhin
> > >> > >
> > >> > > Mon, 2 Mar 2020 at 20:09, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >
> > >> > > > Hi Ivan,
> > >> > > >
> > >> > > > I have concerns about the entity annotation variant.
> > >> > > > Wrapping into a dynamic proxy for passing back will be quite a complex thing that requires changes in IgniteCacheObjectProcessor and entity marshaling.
> > >> > > >
> > >> > > > Suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance)
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex, pass back to the initiating node
> > >> > > > 4. Possibly still need to proxify the entity by adding the Comparable interface.
> > >> > > > 5. Perform merge sort on the initiating node
> > >> > > >
> > >> > > > Would you consider this approach or return back to using the Ranked superclass?
> > >> > > >
> > >> > > > Regarding your proposal to implement merge sort - definitely yes.
> > >> > > > I will implement this.
> > >> > > > Sorry, didn't understand you earlier )
> > >> > > >
> > >> > > > BR,
> > >> > > > Yuriy Shuliha
> > >> > > >
> > >> > > > PS As far as I see, there is no chance to get on the 2.8 release train. What will be the next version/date we can aim for with this update?
> > >> > > >
> > >> > > > Fri, 28 Feb 2020 at 23:08 Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> Hi Yuriy,
> > >> > > >>
> > >> > > >> Sorry for a late response and thank you for your comments.
> > >> > > >>
> > >> > > >> The approach with the @Ranked annotation looks cleaner to me from an API point of view.
> > >> > > >>
> > >> > > >> Regarding merging responses from multiple nodes, I suppose that a good enough solution is possible:
> > >> > > >> 1. Request one page of entries from each node.
> > >> > > >> 2. Return one page to a user (as there is definitely a page of the best results already).
> > >> > > >> 3. Request next result pages from nodes corresponding to pages we exposed to the user (actually nodes having less than 1 page of pending results). Repeat from step 2.
> > >> > > >>
> > >> > > >> Some kind of sort merge plus backpressure. The backpressure part might be left as an improvement.
> > >> > > >>
> > >> > > >> What do you think?
> > >> > > >>
> > >> > > >> Best regards,
> > >> > > >> Ivan Pavlukhin
> > >> > > >>
> > >> > > >> Tue, 18 Feb 2020 at 18:27, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >
> > >> > > >> > Hi Ivan,
> > >> > > >> >
> > >> > > >> > Thank you for keeping an eye on the topic!
> > >> > > >> >
> > >> > > >> > Here are the answers to your questions:
> > >> > > >> > 1. TextQuery response is always ordered by documentScore, and this number is also frequently used when processing the results.
> > >> > > >> > We have analyzed the current entity flow under the hood of query processing and found out that the cleanest approach to get a response with ordered entities is to extend the entity itself.
> > >> > > >> > The only drawback will be the necessity to extend from Ranked in our case. And it is very common to utilize documentScore (rank) when working with TextQuery.
> > >> > > >> > Another approach I see is to play with reflection to create a proxy with the Ranked interface. In this case we will still need to mark our intention to have an ordered response and add some annotation, e.g. @Ranked.
> > >> > > >> > Please advise what would fit Ignite better.
> > >> > > >> >
> > >> > > >> > 2. Yes, you are right. Using PriorityQueue may lead to unwanted memory consumption.
> > >> > > >> > In order to get a correct response we still need to retrieve data from all of the nodes, as any of them may contain a value that may fall into the limited range (this is because of the float ranking score).
> > >> > > >> > This can be fixed by using Guava's MinMaxPriorityQueue, which has a maximum size limitation. Technically it will be equivalent to merging the sorted responses, as each element will require comparison against the whole queue.
> > >> > > >> >
> > >> > > >> > BR,
> > >> > > >> > Yuriy Shuliha
> > >> > > >> >
> > >> > > >> > Thu, 13 Feb 2020 at 13:53 Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >>
> > >> > > >> >> Hi Yuriy,
> > >> > > >> >>
> > >> > > >> >> Sorry for a delay. I went through the proposed solution and I have some questions. Currently I am a little bit far from the context of TEXT queries, so correct me or redirect to some previous discussion if I got something wrong:
> > >> > > >> >> 1. What is the justification for using inheritance from Ranked in order to keep order? Why can't we mix rank/score into the entries transferred inside GridCacheQueryResponse?
> > >> > > >> >> 2. Collecting all entries in a PriorityQueue can lead to unnecessary heap memory consumption. I think that merging several sorted runs (responses from different nodes) will be a better option.
> > >> > > >> >>
> > >> > > >> >> Best regards,
> > >> > > >> >> Ivan Pavlukhin
> > >> > > >> >>
> > >> > > >> >> Mon, 10 Feb 2020 at 18:32, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >
> > >> > > >> >> > Hi Ivan,
> > >> > > >> >> >
> > >> > > >> >> > Did you have a chance to look through the proposed solution?
> > >> > > >> >> > We definitely need this validation in order to proceed further and provide the changes officially.
> > >> > > >> >> >
> > >> > > >> >> > BR,
> > >> > > >> >> > Yuriy Shuliha
> > >> > > >> >> >
> > >> > > >> >> > Tue, 28 Jan 2020 at 17:30 Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>
> > >> > > >> >> >> Hello,
> > >> > > >> >> >>
> > >> > > >> >> >> please see the proposed TextQuery ordering solution here:
> > >> > > >> >> >> https://github.com/apache/ignite/compare/master...shuliga:feature/rank_score
> > >> > > >> >> >>
> > >> > > >> >> >> Y.
> > >> > > >> >> >>
> > >> > > >> >> >> Fri, 24 Jan 2020 at 09:50 Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>>
> > >> > > >> >> >>> Yuriy,
> > >> > > >> >> >>>
> > >> > > >> >> >>> Good to know that the story continues! Yes, it would be really nice to see the code of your solution. Of course formal requirements can be omitted; the solution design is of the most interest so far. And it definitely would be great to merge it into the Apache Ignite codebase eventually.
> > >> > > >> >> >>>
> > >> > > >> >> >>> Thu, 23 Jan 2020 at 16:47, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Hi Ivan,
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Actually I have engaged another developer to help bring TextQueries to a correctly working state.
> > >> > > >> >> >>> > For now we have a solution that adds ordering functionality to distributed TextQueries.
> > >> > > >> >> >>> > This is developed and tested locally. I can share details here, then we can discuss and decide whether to create a corresponding ticket.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > The starting point is that by nature Lucene's documents are always ordered by docScore:float;
> > >> > > >> >> >>> > So we created an abstract class Ranked, implementing Comparable<Ranked> and Serializable, and containing a float rank value;
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Each entity expected to be ordered on TextQuery merge should be derived from this class.
> > >> > > >> >> >>> > All subsequent actions will be done under the hood automatically by the new CacheQueryFutureRankedDecorator, which contains a special BlockingIterator used for correct merging of distributed responses.
> > >> > > >> >> >>> > Text queries with Ranked entities are automatically wrapped with this new decorator.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > This is an outline of the solution. Please ask if you have any questions.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Or I can create a ticket and link a PR with the already tested (so far only locally) solution to it for detailed review.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > BR,
> > >> > > >> >> >>> > Yuriy
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Tue, 21 Jan 2020 at 07:29 Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> Hi Yuriy,
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> I would just like to understand the current state. Are you still working on Ignite text queries? If not, are you going to continue with it?
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> Fri, 13 Dec 2019 at 11:52, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Yuriy,
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Sure, I will be glad to help.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > > - incorrect nodes/partition selection during querying?
> > >> > > >> >> >>> >> > Apparently this is the problem. If you find it really complicated to understand and debug, then I can dig deeper and share my vision of how the problem can be fixed.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Wed, 11 Dec 2019 at 18:46, Yuriy Shuliga <shul...@gmail.com> wrote:
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > I will look into the MOVING partition issue.
> > >> > > >> >> >>> >> > > But I also need guidance there.
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > Ivan, would you mind being that person?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > The question is whether we have an issue with:
> > >> > > >> >> >>> >> > > - wrong storing targets during indexing OR
> > >> > > >> >> >>> >> > > - incorrect nodes/partition selection during querying?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > BR,
> > >> > > >> >> >>> >> > > Yuriy Shuliha
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > --
> > >> > > >> >> >>> >> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > --
> > >> > > >> >> >>> >> > Best regards,
> > >> > > >> >> >>> >> > Ivan Pavlukhin
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> --
> > >> > > >> >> >>> >> Best regards,
> > >> > > >> >> >>> >> Ivan Pavlukhin
> > >> > > >> >> >>>
> > >> > > >> >> >>> --
> > >> > > >> >> >>> Best regards,
> > >> > > >> >> >>> Ivan Pavlukhin
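
The merge of sorted runs discussed in this thread (each node returns entries already ordered by descending rank, and the initiating node merges them) can be illustrated with a minimal sketch. It is not the code from the linked fork. `RankedMergeSketch`, `Scored`, `Cursor` and `merge` are hypothetical names, in-memory lists stand in for node responses, and a real implementation would presumably request the next page from a node at the point where the sketch calls `rest.next()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-node results sorted by descending rank. */
final class RankedMergeSketch {
    /** Stand-in for an entity that carries a Lucene document score. */
    static final class Scored {
        final String id;
        final float rank;

        Scored(String id, float rank) {
            this.id = id;
            this.rank = rank;
        }
    }

    /** Current head of one node's sorted run plus the rest of that run. */
    private static final class Cursor {
        Scored head;
        final Iterator<Scored> rest;

        Cursor(Iterator<Scored> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    /** Returns up to {@code limit} entries with the highest rank across all runs. */
    static List<Scored> merge(List<? extends List<Scored>> perNodeRuns, int limit) {
        // The heap holds one cursor per node, ordered by the rank of its head (highest first).
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b.head.rank, a.head.rank));

        for (List<Scored> run : perNodeRuns) {
            Iterator<Scored> it = run.iterator();
            if (it.hasNext())
                heap.add(new Cursor(it));
        }

        List<Scored> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Cursor best = heap.poll();
            out.add(best.head);
            // Advance within the same node's run; exhausted runs simply drop out of the heap.
            if (best.rest.hasNext()) {
                best.head = best.rest.next();
                heap.add(best);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Scored> node1 = Arrays.asList(new Scored("a", 0.9f), new Scored("d", 0.2f));
        List<Scored> node2 = Arrays.asList(new Scored("b", 0.7f), new Scored("c", 0.5f));

        for (Scored s : merge(Arrays.asList(node1, node2), 3))
            System.out.println(s.id + " " + s.rank); // prints a, b, c in that order
    }
}
```

In this sketch the heap never holds more than one entry per node, which is the property that makes merging sorted runs cheaper in heap memory than collecting every entry into a single PriorityQueue.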