[
https://issues.apache.org/jira/browse/LUCENE-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029967#comment-17029967
]
Robert Muir commented on LUCENE-9205:
-------------------------------------
I agree the idea can work, but I'm not sure how much fun this would be to
debug. If the bug is an encoding corner-case, it might not be that horrible,
but if its trickier such as a misuse of some api, the additional indirection
might not be fun?
On the other hand if I look at BaseDocValuesFormatTestCase, I don't feel like
BinaryDV is severely under-represented? There are a lot of tests for binary dv
here... we should add more if we can think of interesting corner cases, because
these simple unit tests are the easiest to debug. I think accumulating these
has the highest value.
IMO if we want to wrap a codec with anything, the easiest win would be to fix
AssertingCodec to work with more than just the default codec. This thing tries
to be simple to debug and will find api misuses and so on. Unfortunately today
Asserting* is wired to only wrap the default codec and not anything else. And
if you override some methods (such as docvalues) to test a different
implementation, it returns that different implementation bare (without
asserts), and just the rest of the index with asserts.
The tests in {{org.apache.lucene.codecs.asserting}} work around this situation,
and test the default codec all over again, just with this wrapper enabled on
what they are testing. But maybe we should delete these tests, and just always
wrap all index structures with asserts, so that more than just the default
codec are getting the checks?
> Add a mock/test codec that redirects all doc values formats to BINARY
> ---------------------------------------------------------------------
>
> Key: LUCENE-9205
> URL: https://issues.apache.org/jira/browse/LUCENE-9205
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Michael McCandless
> Priority: Major
>
> We have some exciting proposed changes that will use `BINARY` doc values,
> e.g. [using block compression in the default
> codec|https://github.com/apache/lucene-solr/pull/1234#pullrequestreview-353073188],
> supporting approximate nearest-neighbor search using
> [HNSW|https://issues.apache.org/jira/browse/LUCENE-9004] or
> [IVFFLAT|https://issues.apache.org/jira/browse/LUCENE-9136], but I think our
> testing of `BINARY` doc values could be improved.
> A possibly simple way to do that would be to make a mock/test codec that
> wraps the default codec but implements all doc values formats (`NUMERIC`,
> `SORTED`, etc.) on top of the wrapped codec's `BINARY` format. This way the
> wrapped codec only ever sees `BINARY`, and then all Lucene tests that test
> the other doc values codecs, would be testing `BINARY`.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]