wchevreuil commented on PR #4407:
URL: https://github.com/apache/hbase/pull/4407#issuecomment-1118618553
> As I said on the jira, please hold off on merging this 'simple' PR.
>
> We need to discuss more here.
>
> Thanks~

Thanks for your comments, Duo. Just pasting the jira discussion here:

> The reason to change the order is that, when writing the WAL, we need to make sure the compaction has succeeded. And with SFT enabled, since there are extra IOs, we need to make sure we have successfully updated the SFT record before we write the compaction marker to the WAL. This is very, very important. Otherwise, when recovering, we may find a compaction marker in the WAL indicating that the compaction succeeded, but loading the store file list will not pick up the newly generated files. This may cause serious bugs too, now or in the future.

Makes sense. I guess both ways are problematic, but maybe we are more likely to fail at WAL marker writing time?

> I still stand by my point that the actual problem here is that we still allow the dead RS to change the hfiles on HDFS.

Agreed, that would be ideal. It's very challenging, though, to guarantee that all threads changing file system state are interrupted once we detect that the RS is aborting. The store may be closing independently of the compaction, and there's also the compacted files discharger chore running in the background. As an alternative for this immediate problem, we could keep the current order (update SFT, update SFM, write WAL marker) but roll back the SFM update in case of errors writing the WAL marker?
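The alternative ordering proposed in the last paragraph (update SFT, update SFM, write the WAL marker, roll back the SFM update if the marker write fails) could look roughly like the Java sketch below. It is a hypothetical model, not HBase code: `Tracker`, `FileManager`, `Wal` and `commitCompaction` are illustrative stand-ins for the roles of the SFT, the SFM and the WAL, and store files are modeled as plain strings.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed compaction commit ordering.
 * None of these types are HBase classes; they only model the roles.
 */
public class CompactionCommitSketch {

  /** Stand-in for the store file tracker (SFT): durably records the file list. */
  interface Tracker {
    void replace(List<String> compacted, List<String> results) throws IOException;
  }

  /** Stand-in for the WAL: appends a compaction marker. */
  interface Wal {
    void appendCompactionMarker(List<String> compacted, List<String> results) throws IOException;
  }

  /** Stand-in for the in-memory store file manager (SFM). */
  static final class FileManager {
    private final Set<String> files = new LinkedHashSet<>();

    FileManager(List<String> initial) {
      files.addAll(initial);
    }

    /** Removes the given files and adds the new ones to the in-memory view. */
    void swap(List<String> remove, List<String> add) {
      files.removeAll(remove);
      files.addAll(add);
    }

    Set<String> files() {
      return files;
    }
  }

  static void commitCompaction(Tracker tracker, FileManager sfm, Wal wal,
      List<String> compacted, List<String> results) throws IOException {
    // 1. Update the SFT record first, so a WAL marker never refers to untracked files.
    tracker.replace(compacted, results);
    // 2. Update the in-memory SFM so readers see the compacted files.
    sfm.swap(compacted, results);
    try {
      // 3. Only now write the compaction marker to the WAL.
      wal.appendCompactionMarker(compacted, results);
    } catch (IOException e) {
      // Proposed rollback: restore the previous SFM view, then propagate the error.
      // Note: the SFT record written in step 1 is NOT undone here.
      sfm.swap(results, compacted);
      throw e;
    }
  }

  public static void main(String[] args) {
    FileManager sfm = new FileManager(List.of("a", "b"));
    Tracker tracker = (c, r) -> { };
    Wal failingWal = (c, r) -> {
      throw new IOException("WAL append failed");
    };
    try {
      commitCompaction(tracker, sfm, failingWal, List.of("a", "b"), List.of("c"));
    } catch (IOException e) {
      System.out.println("commit failed: " + e.getMessage());
    }
    // After the rollback the in-memory view again holds the pre-compaction files.
    System.out.println(sfm.files());
  }
}
```

As in the proposal, only the in-memory SFM update is reverted; the sketch does not attempt to undo the SFT record written in step 1, nor does it model the discharger chore or a concurrent store close.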
