wchevreuil commented on PR #4407:
URL: https://github.com/apache/hbase/pull/4407#issuecomment-1118618553

   > As I said on the jira, please hold on merging this 'simple' PR.
   > 
   > We need to discuss more here.
   > 
   > Thanks~
   
   Thanks for your comments, Duo. Just pasting the jira discussion here:
   
   > The reason to change the order is that, when writing the WAL, we need to 
make sure that the compaction has succeeded. With SFT enabled there are extra 
IOs, so we need to make sure we have successfully updated the SFT record 
before we write the compaction marker to the WAL. This is very important; 
otherwise, when recovering, we may find a compaction marker in the WAL 
indicating that the compaction succeeded but, while loading the store file 
list, we will not load the newly generated files. This may cause serious bugs 
too, now or in the future.
   
   Makes sense. I guess both orderings are problematic. But maybe we are more 
likely to fail at WAL marker writing time?
   
   > I still stand by my point that the actual problem here is that we still 
allow the dead RS to change the hfiles on HDFS.
   
   
   Agreed, that would be ideal. It's very challenging, though, to guarantee 
that all threads changing file system state are interrupted once we detect 
that the RS is aborting. The store may be closing separately from the 
compaction. There's also the compacted files discharger chore running in the 
background.
   
   As an alternative for this immediate problem, could we keep the current 
order (update SFT, update SFM, write the WAL marker) but roll back the SFM 
update if writing the WAL marker fails?
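   
   A minimal sketch of that ordering, just to make the idea concrete. All 
names here (`CompactionCommitSketch`, `WalWriter`, `trackedFiles`, 
`managerFiles`) are hypothetical stand-ins and not the actual HBase classes; 
the two lists only model the SFT record and the SFM in-memory view. It rolls 
back only the SFM side, as described above, and leaves the SFT record 
untouched on failure:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit order; not real HBase code.
class CompactionCommitSketch {
  // Hypothetical stand-in for whatever appends the compaction marker to the WAL.
  interface WalWriter {
    void writeCompactionMarker(List<String> newFiles) throws IOException;
  }

  // Models the persisted SFT record (assumption: a plain list of file names).
  final List<String> trackedFiles = new ArrayList<>();
  // Models the SFM in-memory view of the store files.
  List<String> managerFiles = new ArrayList<>();

  void commit(List<String> compactedAway, List<String> newFiles, WalWriter wal)
      throws IOException {
    // 1. Update the SFT record first.
    trackedFiles.removeAll(compactedAway);
    trackedFiles.addAll(newFiles);

    // 2. Update the SFM view, keeping a snapshot so it can be restored.
    List<String> snapshot = new ArrayList<>(managerFiles);
    managerFiles.removeAll(compactedAway);
    managerFiles.addAll(newFiles);

    try {
      // 3. Write the compaction marker to the WAL.
      wal.writeCompactionMarker(newFiles);
    } catch (IOException e) {
      // 4. Marker write failed: restore the SFM view and rethrow.
      managerFiles = snapshot;
      throw e;
    }
  }
}
```

   Whether the SFT record would also need to be reverted in that failure 
case is not covered by this sketch and would still need discussion.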

