ableegoldman opened a new pull request #8962:
URL: https://github.com/apache/kafka/pull/8962


   Two more edge cases I found that produce extra TaskCorruptedExceptions while 
playing around with the failing eos-beta upgrade test (sadly these are 
unrelated problems, as the test still fails with these fixes in place).
   
   1. We need to write the checkpoint when recycling a standby: although we do 
preserve the changelog offsets when recycling a task, and should therefore 
write the offsets when the new task is itself closed, we do NOT write the 
checkpoint for uninitialized tasks. So if the new task is ultimately closed 
before it gets out of the CREATED state, the offsets will not be written and we 
can get a TaskCorruptedException.
   2. With the change in task locking to address some Windows-related nonsense 
(am I remembering that correctly?), we don't delete entire task directories but 
just clear the inner state. With EOS, during initialization we check whether 
the state directory is non-empty and the checkpoint is missing, and throw a 
TaskCorruptedException if so. But just opening a RocksDB store creates a 
`rocksdb` base dir in the task directory, so the `taskDirIsEmpty` check always 
fails and we always throw a TaskCorruptedException even if there's nothing there.
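
   To make the first case concrete, here is a toy model of the sequence. It is 
only a sketch: `RecycleCheckpointSketch`, `Task`, `recycle`, and the 
`checkpointFile` field are hypothetical names invented for illustration and are 
not the actual Streams classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

public class RecycleCheckpointSketch {
    enum State { CREATED, RUNNING, CLOSED }

    // Hypothetical stand-in for the on-disk checkpoint; null means "no checkpoint written".
    static Map<String, Long> checkpointFile = null;

    static class Task {
        State state = State.CREATED;
        final Map<String, Long> changelogOffsets;

        Task(Map<String, Long> offsets) {
            this.changelogOffsets = new HashMap<>(offsets);
        }

        void initialize() {
            state = State.RUNNING;
        }

        // Models the behaviour described above: a task still in CREATED
        // skips the checkpoint write when it is closed.
        void close() {
            if (state != State.CREATED) {
                checkpointFile = new HashMap<>(changelogOffsets);
            }
            state = State.CLOSED;
        }
    }

    // Recycles a standby into a new task that inherits the changelog offsets.
    // With checkpointOnRecycle = true the offsets are written immediately, so
    // they survive even if the new task never leaves CREATED.
    static Task recycle(Task standby, boolean checkpointOnRecycle) {
        if (checkpointOnRecycle) {
            checkpointFile = new HashMap<>(standby.changelogOffsets);
        }
        standby.state = State.CLOSED;
        return new Task(standby.changelogOffsets); // new task starts in CREATED
    }
}
```

   In this model, recycling without the checkpoint write and then closing the 
new task while it is still in CREATED leaves `checkpointFile` as null, which 
corresponds to the missing-checkpoint situation described in 1.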
   
   We can fix case 2 for RocksDB specifically, but this might still cause a 
headache for users of custom stores. Note that it's not a correctness issue, 
just an annoyance, so my take is that we should avoid large last-minute changes 
and just fix it for RocksDB in 2.6. Then we can consider a more holistic fix 
going forward.
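
   As a rough illustration of what a RocksDB-specific emptiness check could 
look like, here is a minimal sketch. It assumes the only thing to ignore is an 
empty subdirectory named `rocksdb`; the class name, the constant, and the helper 
are hypothetical, and this is not necessarily how the actual change is written.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TaskDirCheck {
    // Assumed name of the base directory created when a RocksDB store is opened.
    static final String ROCKSDB_BASE_DIR = "rocksdb";

    // Returns true if the directory has no entries at all.
    static boolean isEmptyDir(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    // Treats the task directory as empty if it contains nothing, or only an
    // empty "rocksdb" base directory. Any other entry counts as state.
    static boolean taskDirIsEmpty(Path taskDir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(taskDir)) {
            for (Path entry : entries) {
                boolean emptyRocksDbDir =
                    entry.getFileName().toString().equals(ROCKSDB_BASE_DIR)
                        && Files.isDirectory(entry)
                        && isEmptyDir(entry);
                if (!emptyRocksDbDir) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

   A custom store that creates its own base directory would still make this 
check return false, which is the remaining annoyance mentioned above.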

