In the actual tool, this is reported as the "number of tokens". This
*IS* actually the number of tokens that you have.

On Fri, Sep 3, 2021 at 3:45 PM Ankur Goel wrote:
>
> Hello Folks,
> In Amazon product search we have a use case to override the term-frequency
> to hold a custom scoring signal for a small subset of fields in a document.
> These fields do not have positions enabled. The support for this was added
> to Lucene in https://issues.apache.org/jira/browse/LUCENE-7854.
>
> Following this change the CheckIndex tool no longer reports the total token
> counts correctly on our index. We have a simple 1-line change in our
> internal branch to increment total positions count by 1 (instead of
> term-frequency) if a field does not have positions.
>
> Current: 
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1609
>
> Proposed: (hasPositions ? freq : 1);
>
> If the community feels this is useful and something that should be changed
> in Lucene then I am happy to open a JIRA and contribute a patch with
> suitable unit test(s).
>
> Thanks
> -Ankur
>
>
>
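
For anyone following the thread, here is a rough sketch of the two counting
rules under discussion. It does not use Lucene at all: the class name, the
method names and the sample frequencies are made up for illustration, and the
"current" rule is only what the quoted message describes (add each posting's
term frequency), not a copy of the CheckIndex source.

```java
// Hypothetical, Lucene-free sketch of the two counting rules in this thread.
class TokenCountSketch {

    // Rule described as current behaviour: add each posting's term frequency.
    static long totalCurrent(int[] freqs) {
        long total = 0;
        for (int freq : freqs) {
            total += freq;
        }
        return total;
    }

    // Proposed rule: (hasPositions ? freq : 1) per posting.
    static long totalProposed(int[] freqs, boolean hasPositions) {
        long total = 0;
        for (int freq : freqs) {
            total += hasPositions ? freq : 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up frequencies standing in for custom scoring signals
        // stored in a field that has no positions.
        int[] freqs = {1000, 500, 2};
        System.out.println("current:  " + totalCurrent(freqs));
        System.out.println("proposed: " + totalProposed(freqs, false));
    }
}
```

With positions enabled the two rules give the same total; they only diverge
for fields indexed without positions, which is the case the proposal targets.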

