magibney commented on pull request #15: URL: https://github.com/apache/lucene/pull/15#issuecomment-842630574
Quick status update: comments are formatted properly, precommit check passes (apologies again, and thanks to all who weighed in on that). >Looks to me like removing the old transform impl/tests would really simplify the process too. I removed the old transform impl. Removal does indeed simplify the process, at the expense of having a useful way to cross-check offset correction against a completely different offset impl. I think this is fine, but we probably can/should compensate by filling in `expectedOffsets` where possible for any tests with non-randomized input. Some of the explicit input is manually constructed and already tests offsets specifically; but many of the (`testBespoke*`) tests are edge cases derived from randomly generated data, and as such "expected" offsets are difficult to determine _a priori_. For these cases I would propose initially bootstrapping expected offsets based on _actual_ offsets, and clearly marking "expected" values so-generated as basically documenting existing behavior so that we have more "canary" coverage to detect unintended changes. This isn't ideal, but I think it's a good tradeoff, given how difficult it would be to determine canonical "correct" offset correction for r andomly derived (though static) input. It's worth reiterating to be clear about the status of this PR: "AnyTransliterator" is not yet supported. Once the already-implemented core functionality for "non-dynamic" (non-Any) transliterators is squared away, it should be straightforward to use that as a base to incorporate support for "AnyTransliterator", but I don't want to do that until further feedback/review of the core functionality as currently implemented. In summary: this should be ready for initial review, unless there's a preference to first incorporate synthetic "expected" offset values (as proposed above). Please let me know what's preferred wrt synthetic "expected" offset values -- I should be able to turn that around quickly if it's desired. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org