hossman commented on issue #15540:
URL: https://github.com/apache/lucene/issues/15540#issuecomment-3820706190
> Yeah +1. It's kinda the vector equivalent of the empty string term, which
> is indeed a valid term that you can index in Lucene if your tokenizer produces
> it.
Except that having an empty string term doesn't trip any assertions during
segment merge or when running `CheckIndex`.
That, to me, seems like the biggest problem here.
One of two things needs to be true:
1. This index is "valid", **_in spite of_** the vector value+similarity
   combo being invalid, and neither segment merging nor `CheckIndex` should care
   - even if this document will never (or always) be found via a vector
     search query, that is none of the business of the index/merge/check logic
   - analogous to indexing an empty string term
2. The index is "corrupt" _**because**_ the vector value+similarity combo
   is invalid, and some code path should have stopped this document from ever
   being added to the index in the first place.
   - either `new KnnByteVectorField(...)` or
     `IndexWriter.addDocument(...)` should have thrown an exception
   - analogous to trying to index a negative position increment in a token
     stream
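
To make the second option concrete, here is a minimal sketch of the kind of up-front check it implies. Everything in it is hypothetical: `VectorGuard`, its `Similarity` enum, and `requireValid` are illustration-only names, not Lucene APIs, and the sketch *assumes* the invalid combo is an all-zero byte vector paired with cosine similarity, which this comment does not actually spell out.

```java
import java.util.Objects;

/** Hypothetical pre-indexing guard for illustration; not part of Lucene. */
final class VectorGuard {

  /** Stand-in for a field's similarity setting; only the names matter here. */
  enum Similarity { EUCLIDEAN, DOT_PRODUCT, COSINE }

  private VectorGuard() {}

  /** Returns true if every component is zero, i.e. the vector has zero magnitude. */
  static boolean isZeroVector(byte[] vector) {
    for (byte b : vector) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }

  /**
   * Rejects an invalid vector/similarity combination before it can reach the index.
   * ASSUMPTION: "invalid" means a zero-magnitude vector used with COSINE.
   * Returns the same array so the call can wrap a field constructor argument.
   */
  static byte[] requireValid(byte[] vector, Similarity similarity) {
    Objects.requireNonNull(vector, "vector");
    Objects.requireNonNull(similarity, "similarity");
    if (similarity == Similarity.COSINE && isZeroVector(vector)) {
      throw new IllegalArgumentException(
          "zero-magnitude vector cannot be used with COSINE similarity");
    }
    return vector;
  }
}
```

Under the second option, a check along these lines would run where the field is constructed or the document is added, so the failure surfaces as an exception at index time rather than as an assertion during merge or `CheckIndex`.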