ldematte opened a new pull request, #16053: URL: https://github.com/apache/lucene/pull/16053
`Lucene99FlatVectorsWriter` hard-codes the per-field storage for vector values to an on-heap `ArrayList<T>` (`ArrayList<float[]>` or `ArrayList<byte[]>`), managed by a private nested class `Lucene99FlatVectorsWriter.FieldWriter<T>`. Different users of `Lucene99FlatVectorsFormat` may want to change how vectors are stored in memory; this PR proposes a simple change to decouple the external interface and the write pipeline implemented by `Lucene99FlatVectorsWriter` from the vectors' in-memory storage.

In particular, this PR adds a public constructor overload to `Lucene99FlatVectorsWriter` that accepts a `FlatFieldVectorsWriter` factory: when the factory is supplied, `addField(FieldInfo)` uses it to obtain the per-field storage; when the existing two-arg constructor is used, the current behavior is preserved exactly via a default hard-coded `fieldWriterFactory`.

The `Lucene99FlatVectorsWriter` write pipeline is unchanged in shape, but is refactored internally to read its `FieldWriter`'s state through accessor methods rather than direct field reads. This makes it possible to introduce a `Delegating` variant of `FieldWriter` that forwards to the injected strategy, without the need to open up the class visibility. `FieldWriter<T>` remains a private nested class. No public API surface of `FlatFieldVectorsWriter<T>` (or any other class) is changed.

A new `TestKnnVectorsFormatCustomWriter` is introduced; this test exercises the new constructor against the full `BaseKnnVectorsFormatTestCase` suite, using a paged storage strategy as a concrete example of a non-default `FlatFieldVectorsWriter`. The purpose of the test is twofold: 1) ensure that the use of a different, custom strategy does not break any existing invariants, and 2) showcase how a different `FlatFieldVectorsWriter` can work.
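To illustrate the constructor-overload idea described above, here is a minimal, Lucene-free sketch. It is not the code in this PR: `VectorStore`, `HeapStore`, `Writer`, and the `Function<String, VectorStore>` factory type are all hypothetical stand-ins, since the description does not spell out the real factory signature. It only shows the general shape: a no-arg constructor that delegates to a default factory, and a pipeline that reaches storage through accessors only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FactoryInjectionSketch {

  /** Hypothetical per-field storage; a stand-in, not Lucene's FlatFieldVectorsWriter. */
  interface VectorStore {
    void add(float[] vector);

    List<float[]> vectors();
  }

  /** Default strategy: an on-heap ArrayList, analogous to the preserved default behavior. */
  static final class HeapStore implements VectorStore {
    private final List<float[]> vectors = new ArrayList<>();

    @Override
    public void add(float[] vector) {
      vectors.add(vector.clone());
    }

    @Override
    public List<float[]> vectors() {
      return vectors;
    }
  }

  /** Hypothetical writer: it never touches a store's fields, only its accessor methods. */
  static final class Writer {
    private final Function<String, VectorStore> storeFactory;
    private final List<VectorStore> fields = new ArrayList<>();

    /** Existing-style constructor: falls back to the default heap-backed factory. */
    Writer() {
      this(fieldName -> new HeapStore());
    }

    /** New overload: the caller decides how each field's vectors are held in memory. */
    Writer(Function<String, VectorStore> storeFactory) {
      this.storeFactory = storeFactory;
    }

    VectorStore addField(String fieldName) {
      VectorStore store = storeFactory.apply(fieldName);
      fields.add(store);
      return store;
    }

    /** Stand-in for the write pipeline: reads state through vectors() only. */
    int totalVectors() {
      int total = 0;
      for (VectorStore store : fields) {
        total += store.vectors().size();
      }
      return total;
    }
  }

  public static void main(String[] args) {
    Writer defaults = new Writer();
    defaults.addField("a").add(new float[] {1f, 2f});

    List<String> requested = new ArrayList<>();
    Writer custom = new Writer(name -> {
      requested.add(name); // record that the injected factory was consulted
      return new HeapStore();
    });
    custom.addField("b").add(new float[] {3f, 4f});

    System.out.println(defaults.totalVectors() + " " + custom.totalVectors() + " " + requested);
  }
}
```

Because the pipeline in this sketch depends only on `vectors()`, swapping the storage strategy does not require changing or exposing the writer's internals, which is the property the PR description attributes to the accessor-based refactoring.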
The test `FlatFieldVectorsWriter` stores vectors in a `List<ByteBuffer>` of fixed-size pages, and exposes them back through `FlatFieldVectorsWriter#getVectors()` via an `AbstractList` adapter that materializes a heap array per access. The on-disk format produced is identical to the default configuration; only the in-memory accumulation differs.
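The paged storage idea can be sketched without any Lucene dependency. The class below is an illustration only, not the test code from this PR: `PagedFloatVectors`, `vectorsPerPage`, `pageCount()`, and `asList()` are hypothetical names, and the sketch assumes float vectors of a fixed dimension. It appends vectors into fixed-size `ByteBuffer` pages and exposes them as a read-only `List<float[]>` that copies each vector into a fresh heap array on every `get`.

```java
import java.nio.ByteBuffer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class PagedFloatVectors {
  private final int dim;
  private final int vectorsPerPage;
  private final List<ByteBuffer> pages = new ArrayList<>();
  private int count;

  PagedFloatVectors(int dim, int vectorsPerPage) {
    if (dim <= 0 || vectorsPerPage <= 0) {
      throw new IllegalArgumentException("dim and vectorsPerPage must be positive");
    }
    this.dim = dim;
    this.vectorsPerPage = vectorsPerPage;
  }

  /** Appends a vector, allocating a new fixed-size page when the current one is full. */
  void add(float[] vector) {
    if (vector.length != dim) {
      throw new IllegalArgumentException("expected dimension " + dim);
    }
    if (count % vectorsPerPage == 0) {
      pages.add(ByteBuffer.allocate(vectorsPerPage * dim * Float.BYTES));
    }
    ByteBuffer page = pages.get(pages.size() - 1);
    int base = (count % vectorsPerPage) * dim * Float.BYTES;
    for (int i = 0; i < dim; i++) {
      page.putFloat(base + i * Float.BYTES, vector[i]); // absolute put, position untouched
    }
    count++;
  }

  int pageCount() {
    return pages.size();
  }

  /** Read-only view; each get() materializes a new float[] copied out of the page. */
  List<float[]> asList() {
    return new AbstractList<float[]>() {
      @Override
      public float[] get(int index) {
        if (index < 0 || index >= count) {
          throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
        ByteBuffer page = pages.get(index / vectorsPerPage);
        int base = (index % vectorsPerPage) * dim * Float.BYTES;
        float[] out = new float[dim];
        for (int i = 0; i < dim; i++) {
          out[i] = page.getFloat(base + i * Float.BYTES);
        }
        return out;
      }

      @Override
      public int size() {
        return count;
      }
    };
  }

  public static void main(String[] args) {
    PagedFloatVectors store = new PagedFloatVectors(3, 2);
    for (int v = 0; v < 5; v++) {
      store.add(new float[] {v, v + 0.5f, v + 1f});
    }
    List<float[]> view = store.asList();
    System.out.println(view.size() + " vectors in " + store.pageCount() + " pages");
  }
}
```

One consequence of the per-access copy is that callers cannot mutate stored data through the returned arrays, at the cost of an allocation per `get`; whether that trade-off suits a given workload is a separate question from the format compatibility the PR describes.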
