John, can you explain what the use case for such a new API is? I don't see a user of the API in your code. Is there a query you can optimize with this, or what is the reasoning behind this change? I personally think it's quite invasive to add this information, so there must be a good reason to add it to the TermsEnum. I also don't think we should have an option on the field for this if we add it, but without such an option it's quite a heavy change, so I am on the fence about whether we should even consider this. I wonder if you could use the TermsEnum#getAttributeSource() API instead and add this as a dedicated attribute that is present only if the info is stored. That way you could build your own PostingsFormat that stores this information.
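Simon's alternative keeps the TermsEnum contract unchanged and turns the extra data into an optional, typed attribute that only a custom PostingsFormat would register. The plain-Java sketch below models that design choice; it is not Lucene code. `Attributes`, `SimpleTermsEnum` and `DocIdBoundsAttribute` are hypothetical stand-ins for Lucene's AttributeSource, TermsEnum and a custom attribute, and the real Lucene classes have different signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AttributeSketch {

    // Hypothetical optional capability. It is only registered when the
    // (hypothetical) postings format actually stored the per-term bounds.
    public interface DocIdBoundsAttribute {
        int firstDocId();
        int lastDocId();
    }

    // Minimal registry keyed by interface type. It plays the role Lucene's
    // AttributeSource plays, but it is NOT Lucene's API.
    public static final class Attributes {
        private final Map<Class<?>, Object> byType = new HashMap<>();

        public <A> void add(Class<A> type, A impl) {
            byType.put(type, impl);
        }

        public <A> Optional<A> get(Class<A> type) {
            return Optional.ofNullable(type.cast(byType.get(type)));
        }
    }

    // Stand-in for a terms enum: the core API (docFreq) stays unchanged,
    // and extra per-term data is discovered through attributes().
    public static final class SimpleTermsEnum {
        private final int docFreq;
        private final Attributes attributes = new Attributes();

        public SimpleTermsEnum(int docFreq) {
            this.docFreq = docFreq;
        }

        public int docFreq() {
            return docFreq;
        }

        public Attributes attributes() {
            return attributes;
        }
    }

    // Callers must handle the case where the codec did not store the bounds.
    public static String describe(SimpleTermsEnum te) {
        return te.attributes().get(DocIdBoundsAttribute.class)
                .map(b -> "docFreq=" + te.docFreq()
                        + " range=[" + b.firstDocId() + "," + b.lastDocId() + "]")
                .orElse("docFreq=" + te.docFreq() + " (no bounds stored)");
    }

    public static void main(String[] args) {
        SimpleTermsEnum plain = new SimpleTermsEnum(3);

        SimpleTermsEnum withBounds = new SimpleTermsEnum(3);
        withBounds.attributes().add(DocIdBoundsAttribute.class, new DocIdBoundsAttribute() {
            @Override public int firstDocId() { return 4; }
            @Override public int lastDocId() { return 17; }
        });

        System.out.println(describe(plain));      // docFreq=3 (no bounds stored)
        System.out.println(describe(withBounds)); // docFreq=3 range=[4,17]
    }
}
```

The point of this shape is that the absent case is part of the contract: codecs that never store the bounds pay nothing, and the core enum API does not grow a method that most formats cannot answer.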
simon

On Wed, Jan 6, 2021 at 8:06 PM John Wang <john.w...@gmail.com> wrote:
>
> Thank you, Martin!
>
> You can apply the patch to the 8.7 build by just ignoring the changes to
> Lucene90xxx. Appreciate the help and guidance!
>
> -John
>
>
> On Wed, Jan 6, 2021 at 10:36 AM Martin Gainty <mgai...@hotmail.com> wrote:
>>
>> It appears you are targeting 9.0 for your code:
>> lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90FieldInfosFormat.java
>> (Lucene90FieldInfosFormat.java is not contained in either the 8.4 or the 8.7 distro)
>>
>> <RANT>
>> someone had the bright idea to nuke ant 8.x build.xml without consulting
>> anyone
>> not a fan of ant, but the execution model of gradle is woefully inflexible in
>> comparison to maven
>> </RANT>
>>
>> I will try with the 90 distro to get the
>> codecs/lucene90/Lucene90FieldInfosFormat and recompile, and hopefully your
>> TestLucene84PostingsFormat will run w/o failure or error
>>
>> Thx
>> martin-
>>
>> ________________________________
>> From: John Wang <john.w...@gmail.com>
>> Sent: Wednesday, January 6, 2021 10:15 AM
>> To: dev@lucene.apache.org <dev@lucene.apache.org>
>> Subject: Re: additional term meta data
>>
>> Hey Martin:
>>
>> There is a test case in the PR we created on our own fork:
>> https://github.com/dashbase/lucene-solr/pull/1, and the PR description also
>> contains some example code showing how to access the new metadata.
>>
>> Here is the link to the beginning of the tests:
>> https://github.com/dashbase/lucene-solr/blob/posting-last-docid/lucene/core/src/test/org/apache/lucene/codecs/lucene84/TestLucene84PostingsFormat.java#L142
>>
>> I am not sure which version this should be applied to; currently, it is
>> based on master as of a few days ago. We intend to patch 8.7 for our own
>> environment.
>>
>> Any advice or feedback is much appreciated.
>>
>> Thank you!
>>
>> -John
>>
>> On Wed, Jan 6, 2021 at 3:28 AM Martin Gainty <mgai...@hotmail.com> wrote:
>>
>> how to access first and last?
>> which version will you be merging?
>>
>> ________________________________
>> From: John Wang <john.w...@gmail.com>
>> Sent: Tuesday, January 5, 2021 8:19 PM
>> To: dev@lucene.apache.org <dev@lucene.apache.org>
>> Subject: additional term meta data
>>
>> Hi folks:
>>
>> We'd like to propose a feature to add additional per-term metadata to the term
>> dictionary.
>>
>> Currently, the TermsEnum API returns docFreq as its only metadata. We
>> needed a way to quickly get the first and last doc ID in the postings
>> without having to scan through the entire postings list.
>>
>> We have created a PR on our own fork and we would like to contribute this
>> back to the community. Please let us know if this is something that's useful
>> and/or fits Lucene's roadmap; we would be happy to submit a patch.
>>
>> https://github.com/dashbase/lucene-solr/pull/1
>>
>> Thank you
>>
>> -John
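John's proposal rests on one property: postings lists are written in increasing doc ID order, so a term's first and last doc IDs are known at write time without any extra pass. The plain-Java sketch below shows that write-side capture and the kind of constant-time read-side check it enables, such as skipping a term whose doc ID range cannot overlap a target range. It is not the code from the dashbase PR; `TermMeta`, `summarize` and `mayMatchRange` are hypothetical names, and which query would actually use the check is the open question Simon raises.

```java
import java.util.Map;
import java.util.TreeMap;

public class TermBoundsSketch {

    // Hypothetical per-term metadata: docFreq plus the first and last doc ID.
    public static final class TermMeta {
        public final int docFreq;
        public final int firstDocId;
        public final int lastDocId;

        public TermMeta(int docFreq, int firstDocId, int lastDocId) {
            this.docFreq = docFreq;
            this.firstDocId = firstDocId;
            this.lastDocId = lastDocId;
        }
    }

    // Write side: postings are sorted by ascending doc ID, so the bounds are
    // simply the first and last entries of the list.
    public static TermMeta summarize(int[] sortedPostings) {
        if (sortedPostings.length == 0) {
            throw new IllegalArgumentException("a term must have at least one posting");
        }
        return new TermMeta(sortedPostings.length,
                sortedPostings[0],
                sortedPostings[sortedPostings.length - 1]);
    }

    // Read side: an O(1) check of whether the term *may* have a doc in
    // [lo, hi], without opening or scanning its postings list.
    public static boolean mayMatchRange(TermMeta meta, int lo, int hi) {
        return meta.lastDocId >= lo && meta.firstDocId <= hi;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("error", new int[] {2, 5, 9, 40});
        postings.put("timeout", new int[] {61, 70});

        // Build the hypothetical per-term entries once, at write time.
        Map<String, TermMeta> dictionary = new TreeMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            dictionary.put(e.getKey(), summarize(e.getValue()));
        }

        for (Map.Entry<String, TermMeta> e : dictionary.entrySet()) {
            System.out.println(e.getKey() + " may match [50, 80]: "
                    + mayMatchRange(e.getValue(), 50, 80));
        }
        // error may match [50, 80]: false
        // timeout may match [50, 80]: true
    }
}
```

The check is only conservative: postings {2, 90} against [50, 80] return true even though no doc falls in that range, so a positive answer still has to be confirmed against the postings themselves (in Lucene, for example, by advancing the postings iterator to the lower bound).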