John, can you explain what the use case for this new API is? I don't
see a user of the API in your code. Is there a query you can optimize
with this, or what is the reasoning behind the change? I personally
think it's quite invasive to add this information, so there must be a
good reason to add it to TermsEnum. I also don't think we should have
an option on the field for this if we add it, but without such an
option it's quite a heavy change, so I am on the fence about whether
we should even consider it.

I wonder if you can use the TermsEnum#getAttributeSource() API instead
and add this as a dedicated attribute that is present only if the info
is stored. That way you can build your own PostingsFormat that stores
this information.
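
To make the idea concrete, here is a rough plain-Java sketch of the
pattern. None of these types are Lucene classes: DocIdRange,
Attributes, writeTerm and lastDocId are made-up names that only model
"an optional per-term attribute that exists when the format stored it".
A real version would hang the attribute off the TermsEnum and fill it
from a custom PostingsFormat, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of the pattern only; none of these types are Lucene classes.
final class DocIdRangeSketch {

  /** Hypothetical attribute: first and last doc ID in the current term's postings. */
  static final class DocIdRange {
    final int first;
    final int last;

    DocIdRange(int first, int last) {
      this.first = first;
      this.last = last;
    }
  }

  /** Stand-in for an attribute source: a type-keyed bag of optional per-term extras. */
  static final class Attributes {
    private final Map<Class<?>, Object> byType = new HashMap<>();

    <A> void put(Class<A> type, A value) {
      byType.put(type, value);
    }

    <A> Optional<A> get(Class<A> type) {
      return Optional.ofNullable(type.cast(byType.get(type)));
    }
  }

  /** Writer side: record the range once, while the (sorted) postings are written. */
  static Attributes writeTerm(int[] sortedDocIds, boolean storeRange) {
    Attributes attrs = new Attributes();
    if (storeRange && sortedDocIds.length > 0) {
      int last = sortedDocIds[sortedDocIds.length - 1];
      attrs.put(DocIdRange.class, new DocIdRange(sortedDocIds[0], last));
    }
    return attrs;
  }

  /** Reader side: use the attribute if present, otherwise scan the postings. */
  static int lastDocId(Attributes attrs, int[] sortedDocIds) {
    Optional<DocIdRange> range = attrs.get(DocIdRange.class);
    if (range.isPresent()) {
      return range.get().last; // no scan needed
    }
    int last = -1; // -1 means "no documents"
    for (int doc : sortedDocIds) {
      last = doc;
    }
    return last;
  }

  public static void main(String[] args) {
    int[] postings = {3, 7, 42};
    Attributes stored = writeTerm(postings, true);
    Attributes plain = writeTerm(postings, false);
    System.out.println(stored.get(DocIdRange.class).isPresent());
    System.out.println(plain.get(DocIdRange.class).isPresent());
    System.out.println(lastDocId(stored, postings));
    System.out.println(lastDocId(plain, postings));
  }
}
```

The point of the shape is that callers which don't know about the
attribute are unaffected, and callers which do can check for it and
fall back to a scan when the segment was written without it.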

simon

On Wed, Jan 6, 2021 at 8:06 PM John Wang <john.w...@gmail.com> wrote:
>
> Thank you, Martin!
>
> You can apply the patch to the 8.7 build by just ignoring the changes to 
> Lucene90xxx. Appreciate the help and guidance!
>
> -John
>
>
> On Wed, Jan 6, 2021 at 10:36 AM Martin Gainty <mgai...@hotmail.com> wrote:
>>
>> appears you are targeting 9.0 for your code
>> lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90FieldInfosFormat.java
>> (Lucene90FieldInfosFormat.java is not contained in either 8.4 or 8.7 distros)
>>
>> <RANT>
>> someone had the bright idea to nuke ant 8.x build.xml without consulting 
>> anyone
>> not a fan of ant but the execution model of gradle is woefully inflexible in 
>> comparison to maven
>> </RANT>
>>
>> i will try with 90 distro to get the 
>> codecs/lucene90/Lucene90FieldInfosFormat and recompile and hopefully your 
>> TestLucene84PostingsFormat will run w/o fail or error
>>
>> Thx
>> martin-
>>
>> ________________________________
>> From: John Wang <john.w...@gmail.com>
>> Sent: Wednesday, January 6, 2021 10:15 AM
>> To: dev@lucene.apache.org <dev@lucene.apache.org>
>> Subject: Re: additional term meta data
>>
>> Hey Martin:
>>
>> There is a test case in the PR we created on our own fork: 
>> https://github.com/dashbase/lucene-solr/pull/1. The PR description also 
>> contains some example code showing how to access the values.
>>
>> Here is the link to the beginning of the tests: 
>> https://github.com/dashbase/lucene-solr/blob/posting-last-docid/lucene/core/src/test/org/apache/lucene/codecs/lucene84/TestLucene84PostingsFormat.java#L142
>>
>> I am not sure which version this should be applied to. Currently, it is 
>> based on master as of a few days ago. We intend to patch 8.7 for our own 
>> environment.
>>
>> Any advice or feedback is much appreciated.
>>
>> Thank you!
>>
>> -John
>>
>> On Wed, Jan 6, 2021 at 3:28 AM Martin Gainty <mgai...@hotmail.com> wrote:
>>
>> how to access first and last?
>> which version will you be merging
>>
>> ________________________________
>> From: John Wang <john.w...@gmail.com>
>> Sent: Tuesday, January 5, 2021 8:19 PM
>> To: dev@lucene.apache.org <dev@lucene.apache.org>
>> Subject: additional term meta data
>>
>> Hi folks:
>>
>> We would like to propose a feature that adds additional per-term metadata 
>> to the term dictionary.
>>
>> Currently, the TermsEnum API returns docFreq as its only metadata. We 
>> needed a way to quickly get the first and last doc id in the postings 
>> without having to scan through the entire postings list.
>>
>> We have created a PR on our own fork and we would like to contribute this 
>> back to the community. Please let us know if this is something that's useful 
>> and/or fits Lucene's roadmap; we would be happy to submit a patch.
>>
>> https://github.com/dashbase/lucene-solr/pull/1
>>
>> Thank you
>>
>> -John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
