tlrx commented on PR #16086:
URL: https://github.com/apache/lucene/pull/16086#issuecomment-4497386359

   Thanks @romseygeek  and @mikemccand for your feedback!
   
   Before addressing code review comments, I'll try to reply to the 
higher-level questions:
   
   > I'm not convinced we understand the root cause here, so I think we may be 
fixing a "not actually a problem". At Amazon we've also seen some evidence that 
merge abort was insanely slow, but somehow never followed up on it / got to 
root cause. Can we first spend some time back on the issue doing that?
   
   > It's hard for me to believe `checkIntegrity` can take minutes on any 
Lucene index file unless the storage system holding the index is insanely slow. 
Or, the index is insanely massive. Both are possible :)
   
   Yes, the storage systems were indeed insanely slow when it happened. I've 
seen this in two situations: disks saturated with IO (caused I think by many 
concurrent force merges running on non top notch SSDs) and storage backed by 
object storage like S3 with high read latency hiccups, as David describes in 
https://github.com/apache/lucene/issues/13354#issuecomment-4490313048. In those 
situations, the Lucene index needed to be closed for system maintenance purpose 
or to be reopened on a more performant machine.
   
   > For example, is there maybe a bug that's turned off the entire abort 
checking best effort system?
   
   I don't think there is a bug in the existing abort checking system itself. 
The existing `merge.checkAborted()` calls
   work correctly at the points where they are checked (there are 2 or 3 of 
such calls before and in `IndexWriter.mergeMiddle()`), but once the merge 
thread starts the integrity checks on the segment readers to merge, then it is 
committed to finishing reading all the files associated to all reader to merge 
before it can observe the abort flag again.
   
   Note that this change adds the `checkAborted()` method to `MergeState`, so 
we could maybe add additional checks on the aborted flag before checking the 
integrity of a reader. That would be less intrusive and already an improvement 
in my opinion, since the merge will be dropped later anyway.
   
   > do any of our unit tests check for prompt merge aborts?
   
   I found some tests that abort merges but they don't check that the abort 
happens promptly. Specifically, there is no
   test that verifies how "quickly" a merge reacts to the abort signal after it 
has entered `checkIntegrity` / `checksumEntireFile`.
   
   > I'm also not a fan of only doing stored fields -- other Lucene index files 
can be ginormous (vectors) -- can we impl for those as well?
   
   Yes I mentioned in the PR description that other files could benefit from 
this but wanted to keep the radius of changes low at first.
   
   For vectors, I wonder if more `checkAborted()` calls could be added too when 
building large graphs during merges.
   
   > For merges, `IndexWriter` already wraps the provided `Directory` to track 
which files are created for Codec file-tracking purposes, I think? If so, we 
could in theory wrap that to also instrument every `IndexInput` with this 
"check every N seconds", maybe.
   
   I explored the `MergeScheduler#wrapForMerge` path but noticed that many 
readers were not reopening the files/inputs for integrity checks but instead 
use clones of already opened inputs, making it hard to wrap the index inputs.  
The typical pattern seems to be reader -> reader merge instance > clone inputs 
-> seek to zero (then skip by reading to compute the checksum).
   
   I also explored wrapping the readers (`StoredFieldsReader`, 
`DocValuesProducer etc`) using `MergePolicy.OneMerge#wrapForMerge(CodecReader)` 
to override the checkIntegry methods but noticed that it would disable some 
bulk merges optimizations down the road.
   
   > Can we subclass `FilterIndexInput` instead [...]
   > since things can vary drastically depending on the env, let's use 
time/seconds as the "check every", not bytes? We can hardwire this to something 
reasonable that's surely in the noise of overall merge cost.
   
   If we're only accounting for time spent reading bytes for checksumming 
purposes without tying it to the `OneMerge` aborted flag, then I agree we can 
subclass `FilterIndexInput` directly in `checksumEntireFile`.
   
   So I think this leaves us with different options:
   
   1. Implement a FilterIndexInput in CodecUtil.checksumEntireFile() that 
periodically checks elapsed time
   2. Use the checkAborted() method added to MergeState in this pull request 
before each reader.checkIntegrity() call in the various writer/consumer merge 
methods.
   3. Explore wrapping the merge Directory via MergeScheduler.wrapForMerge to 
return wrapped IndexInputs that periodically checks checkAborted (may require 
to reopen files instead of relying on clones)
   4. Continue with the current proposal as-is.
   
   I think we could start with option 2 as a simple first improvement in this 
PR (doesn't interrupt in-progress checksums but avoids starting new ones on an 
already-aborted merge) and follow up with option 1 or 3 to interrupt 
long-running checksums.
   
   Does that sounds reasonable? If so I can rework the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to