Hi Simon:

This might be specific to us; it makes sense not to make such core changes if they are not needed.

Here is our use case anyway: We first sort the index in time order, so docids can be used as a proxy for time. In the VoIP world, we are using Lucene to stitch call flows, which is similar to the APM/tracing use case. Knowing the first and last docid of a term gives us the range of a transaction directly, without the need to traverse the posting list.

It would be ideal for us not to have to modify Lucene, so it would be great to understand how getting the AttributeSource helps with this case. Let me spend some time learning about it.

Thank you for the suggestion!

-John

On Fri, Jan 8, 2021 at 11:19 PM Simon Willnauer <simon.willna...@gmail.com> wrote:

> John, can you explain what the usecase for such a new API is? I don't
> see a user of the API in your code. Is there a query you can optimize
> with this or what is the reasoning behind this change? I personally
> think it's quite invasive to add this information and there must be a
> good reason to add this to the TermsEnum? I also don't think we should
> have an option on the field for this if we add it but if we don't do
> that it's quite a heavy change so I am on the fence if we should even
> consider this?
> I wonder if you can use the TermsEnum#getAttributeSource() API instead
> and add this as a dedicated attribute which is present if the info is
> stored. That way you can build your own PostingsFormat that does store
> this information?
>
> simon
>
> On Wed, Jan 6, 2021 at 8:06 PM John Wang <john.w...@gmail.com> wrote:
> >
> > Thank you, Martin!
> >
> > You can apply the patch to the 8.7 build by just ignoring the changes to Lucene90xxx. Appreciate the help and guidance!
> >
> > -John
> >
> >
> > On Wed, Jan 6, 2021 at 10:36 AM Martin Gainty <mgai...@hotmail.com> wrote:
> >>
> >> appears you are targeting 9.0 for your code
> >> lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90FieldInfosFormat.java
> >> (Lucene90FIeldInfosFormat.java is not contained in either 8.4 or 8.7 distros)
> >>
> >> <RANT>
> >> someone had the bright idea to nuke ant 8.x build.xml without consulting anyone
> >> not a fan of ant but the execution model of gradle is woefully inflexible in comparison to maven
> >> </RANT>
> >>
> >> i will try with 90 distro to get the codecs/lucene90/Lucene90FieldInfosFormat and recompile and hopefully your TestLucene84PostingsFormat will run w/o fail or error
> >>
> >> Thx
> >> martin-
> >>
> >> ________________________________
> >> From: John Wang <john.w...@gmail.com>
> >> Sent: Wednesday, January 6, 2021 10:15 AM
> >> To: dev@lucene.apache.org <dev@lucene.apache.org>
> >> Subject: Re: additional term meta data
> >>
> >> Hey Martin:
> >>
> >> There is a test case in the PR we created on our own fork: https://github.com/dashbase/lucene-solr/pull/1, which also contains some example code on how to access in the PR description.
> >>
> >> Here is the link to the beginning of the tests: https://github.com/dashbase/lucene-solr/blob/posting-last-docid/lucene/core/src/test/org/apache/lucene/codecs/lucene84/TestLucene84PostingsFormat.java#L142
> >>
> >> I am not sure which version this should be applied to, currently, it was based on master as of a few days ago. We intend to patch 8.7 for our own environment.
> >>
> >> Any advice or feedback is much appreciated.
> >>
> >> Thank you!
> >>
> >> -John
> >>
> >> On Wed, Jan 6, 2021 at 3:28 AM Martin Gainty <mgai...@hotmail.com> wrote:
> >>
> >> how to access first and last?
> >> which version will you be merging
> >>
> >> ________________________________
> >> From: John Wang <john.w...@gmail.com>
> >> Sent: Tuesday, January 5, 2021 8:19 PM
> >> To: dev@lucene.apache.org <dev@lucene.apache.org>
> >> Subject: additional term meta data
> >>
> >> Hi folks:
> >>
> >> We like to propose a feature to add additional per-term metadata to the term diction.
> >>
> >> Currently, the TermsEnum API returns docFreq as its only meta-data. We needed a way to quickly get the first and last doc id in the postings without having to scan through the entire postings list.
> >>
> >> We have created a PR on our own fork and we would like to contribute this back to the community. Please let us know if this is something that's useful and/or fits Lucene's roadmap, we would be happy to submit a patch.
> >>
> >> https://github.com/dashbase/lucene-solr/pull/1
> >>
> >> Thank you
> >>
> >> -John
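
For readers following the thread, here is a minimal sketch of the idea behind the use case at the top: when the index is sorted by time, a term's first and last doc id bound the span of one transaction, and storing them next to docFreq avoids walking the postings. This is a toy model in plain Java and does not use any Lucene class or API. The names TermDocRangeSketch, TermMeta, buildMeta, lastDocIdByScan and the "call-42" term are hypothetical and are not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: plain Java, no Lucene classes or APIs are used. */
public class TermDocRangeSketch {

    /** Hypothetical per-term metadata: docFreq plus first and last doc id. */
    static final class TermMeta {
        final int docFreq;
        final int firstDocId;
        final int lastDocId;

        TermMeta(int docFreq, int firstDocId, int lastDocId) {
            this.docFreq = docFreq;
            this.firstDocId = firstDocId;
            this.lastDocId = lastDocId;
        }
    }

    /** Computed once, at "index time", from a postings list in ascending doc id order. */
    static TermMeta buildMeta(int[] postings) {
        if (postings.length == 0) {
            throw new IllegalArgumentException("empty postings list");
        }
        return new TermMeta(postings.length, postings[0], postings[postings.length - 1]);
    }

    /** Baseline without the metadata: visit every posting to reach the last doc id. */
    static int lastDocIdByScan(int[] postings) {
        int last = -1;
        for (int doc : postings) {
            last = doc;
        }
        return last;
    }

    public static void main(String[] args) {
        // Hypothetical data: one term per call id, postings are the doc ids of its events.
        // Ascending doc id order stands in for time order, as in a time-sorted index.
        Map<String, int[]> postingsByTerm = new HashMap<>();
        postingsByTerm.put("call-42", new int[] {17, 18, 25, 40, 93});

        Map<String, TermMeta> metaByTerm = new HashMap<>();
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            metaByTerm.put(e.getKey(), buildMeta(e.getValue()));
        }

        // Query time: the range comes from the stored metadata, no postings traversal.
        TermMeta meta = metaByTerm.get("call-42");
        System.out.println("call-42 spans docs [" + meta.firstDocId + ", " + meta.lastDocId + "]");
    }
}
```

The scan-based method is included only as the baseline the proposal wants to avoid; in the sketch both approaches return the same last doc id, and the difference is the work done per lookup.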