magibney commented on pull request #15:
URL: https://github.com/apache/lucene/pull/15#issuecomment-842630574


   Quick status update: comments are formatted properly, precommit check passes 
(apologies again, and thanks to all who weighed in on that).
   
   >Looks to me like removing the old transform impl/tests would really 
simplify the process too.
   
   I removed the old transform impl. Removal does indeed simplify the process, 
at the expense of having a useful way to cross-check offset correction against 
a completely different offset impl. I think this is fine, but we probably 
can/should compensate by filling in `expectedOffsets` where possible for any 
tests with non-randomized input. Some of the explicit input is manually 
constructed and already tests offsets specifically; but many of the 
(`testBespoke*`) tests are edge cases derived from randomly generated data, and 
as such "expected" offsets are difficult to determine _a priori_. For these 
cases I would propose initially bootstrapping expected offsets based on 
_actual_ offsets, and clearly marking "expected" values so-generated as 
basically documenting existing behavior so that we have more "canary" coverage 
to detect unintended changes. This isn't ideal, but I think it's a good 
tradeoff, given how difficult it would be to determine canonical "correct" 
offset correction for r
 andomly derived (though static) input.
   
   It's worth reiterating to be clear about the status of this PR: 
"AnyTransliterator" is not yet supported. Once the already-implemented core 
functionality for "non-dynamic" (non-Any) transliterators is squared away, it 
should be straightforward to use that as a base to incorporate support for 
"AnyTransliterator", but I don't want to do that until further feedback/review 
of the core functionality as currently implemented.
   
   In summary: this should be ready for initial review, unless there's a 
preference to first incorporate synthetic "expected" offset values (as proposed 
above). Please let me know what's preferred wrt synthetic "expected" offset 
values -- I should be able to turn that around quickly if it's desired.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to