In the actual tool, this is reported as the "number of tokens". This *IS* actually the number of tokens that you have.
On Fri, Sep 3, 2021 at 3:45 PM Ankur Goel <[email protected]> wrote:
>
> Hello Folks,
> In Amazon product search we have a use case to override the
> term-frequency to hold a custom scoring signal for a small subset of
> fields in a document. These fields do not have positions enabled.
> The support for this was added to Lucene in
> https://issues.apache.org/jira/browse/LUCENE-7854.
>
> Following this change the CheckIndex tool no longer reports the total
> token counts correctly on our index.
> We have a simple 1-line change in our internal branch to increment total
> positions count by 1 (instead of term-frequency)
> if a field does not have positions.
>
> Current:
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1609
>
> Proposed: (hasPositions ? freq : 1);
>
> If the community feels this is useful and something that should be
> changed in Lucene then I am happy to open a JIRA and contribute a patch
> with suitable unit test(s).
>
> Thanks
> -Ankur
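
For readers following the thread, here is a minimal, self-contained sketch of how the two counting rules in the quoted message differ. It is not the CheckIndex code: the class name, method names, and sample frequencies are all hypothetical, and the only piece taken from the thread is the expression (hasPositions ? freq : 1).

```java
// Illustrative sketch only; this is NOT Lucene's CheckIndex implementation.
// All names and sample values here are hypothetical.
public class TokenCountSketch {

    // Rule described as "current" in the thread: every posting adds its
    // term frequency to the running total.
    static long currentIncrement(int freq) {
        return freq;
    }

    // Rule proposed in the thread: add the term frequency only when the
    // field indexes positions, otherwise count the posting once.
    static long proposedIncrement(boolean hasPositions, int freq) {
        return hasPositions ? freq : 1;
    }

    // Sums the per-posting increments under the "current" rule.
    static long totalCurrent(int[] freqs) {
        long total = 0;
        for (int freq : freqs) {
            total += currentIncrement(freq);
        }
        return total;
    }

    // Sums the per-posting increments under the "proposed" rule.
    static long totalProposed(boolean hasPositions, int[] freqs) {
        long total = 0;
        for (int freq : freqs) {
            total += proposedIncrement(hasPositions, freq);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical postings whose term frequency carries a custom
        // scoring signal rather than a real occurrence count.
        int[] freqs = {7, 120, 3};

        System.out.println("current rule:              " + totalCurrent(freqs));         // 130
        System.out.println("proposed, no positions:    " + totalProposed(false, freqs)); // 3
        System.out.println("proposed, with positions:  " + totalProposed(true, freqs));  // 130
    }
}
```

With positions enabled the two rules agree; they diverge only for fields without positions, which is exactly the case the thread disagrees about.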
