[ 
https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834652#action_12834652
 ] 

Todd Lipcon commented on HBASE-2231:
------------------------------------

Essentially, we're relying on the following sequence of events in the case of a 
pause:

# The RS starts a compaction
# GC or whatever kicks in, causing the RS's ZK session to expire
# Master notices this and opens the RS's latest HLog for append. This steals 
the write lease on the HLog file, bumps generation stamps, etc.
# RS comes back to life, finishes the compaction, writes "Compacted Region" to 
HLog, and calls hflush()
# hflush() fails since the original writer's lease is no longer valid. The 
region server therefore aborts and does not futz with the region's on-HDFS 
data, so it does not interfere with the new server of this region.

This same sequence occurs when the RS tries to hflush edits from RPCs that 
were in flight before the pause.
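
To make the fencing argument concrete, here is a minimal in-memory sketch of 
the sequence above. It is a toy model, not HDFS or HBase code: ToyLog, 
recoverLease, appendAndFlush and finishCompaction are all hypothetical names 
made up for illustration, and the "lease" is just a string naming the current 
holder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy in-memory model of the fencing sequence; not HDFS or HBase code.
class LeaseFencingSketch {
    /** Stand-in for an HLog file whose write lease has one current holder. */
    static class ToyLog {
        private String leaseHolder;
        private final List<String> entries = new ArrayList<>();

        ToyLog(String initialHolder) {
            this.leaseHolder = initialHolder;
        }

        /** Models the master opening the log for append, which steals the lease. */
        void recoverLease(String newHolder) {
            leaseHolder = newHolder;
        }

        /** Models append followed by hflush(): only the lease holder succeeds. */
        void appendAndFlush(String writer, String entry) {
            if (!writer.equals(leaseHolder)) {
                throw new IllegalStateException("lease lost by " + writer);
            }
            entries.add(entry);
        }

        List<String> entries() {
            return entries;
        }
    }

    /**
     * Last two steps of a compaction. Returns true if the old files were
     * deleted, false if the server had to abort without touching them.
     */
    static boolean finishCompaction(ToyLog log, String rs, List<String> oldFiles) {
        try {
            log.appendAndFlush(rs, "Compacted Region");
        } catch (IllegalStateException leaseLost) {
            return false; // abort: leave the on-disk data alone
        }
        oldFiles.clear(); // safe: the log entry went through
        return true;
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog("rs1");
        List<String> oldFiles = new ArrayList<>(Arrays.asList("old-a", "old-b"));
        log.recoverLease("master"); // master fences the paused RS
        boolean deleted = finishCompaction(log, "rs1", oldFiles);
        System.out.println("deleted=" + deleted + " oldFiles=" + oldFiles);
    }
}
```

The point of the ordering is that the delete only happens after the log write 
has succeeded, so a fenced server never gets as far as removing files.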

There was some question about a problem where the regionserver also rolls its 
HLog right after the pause. In this case, the master might not catch the rolled 
HLog file, and the RS wouldn't be properly interrupted. Perhaps one solution 
for this is that, when rolling an HLog, you must open the new HLog first, then 
append an "HLog rolled" message to the old HLog, then start writing into the 
new one. There can then be a protocol on the HMaster to determine that it has 
really stolen the lease for the latest HLog.
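
One possible shape for that protocol, again as a toy in-memory sketch rather 
than real HBase code. The marker format, ToyFs, roll and fenceLatest are 
assumptions made up for illustration; in particular, having the master follow 
the "HLog rolled" marker to the next file is just one way to read the idea 
above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposed roll protocol; all names are hypothetical.
class LogRollSketch {
    static final String ROLLED_PREFIX = "HLog rolled -> ";

    static class ToyLog {
        final String name;
        String leaseHolder;
        final List<String> entries = new ArrayList<>();

        ToyLog(String name, String holder) {
            this.name = name;
            this.leaseHolder = holder;
        }

        /** Append that fails once someone else holds the lease. */
        void append(String writer, String entry) {
            if (!writer.equals(leaseHolder)) {
                throw new IllegalStateException(writer + " lost lease on " + name);
            }
            entries.add(entry);
        }
    }

    static class ToyFs {
        final Map<String, ToyLog> logs = new HashMap<>();

        ToyLog create(String name, String holder) {
            ToyLog log = new ToyLog(name, holder);
            logs.put(name, log);
            return log;
        }
    }

    /** RS side: open the new log, mark the old one, then switch. */
    static ToyLog roll(ToyFs fs, ToyLog current, String rs, String newName) {
        ToyLog next = fs.create(newName, rs);        // 1. open the new log first
        current.append(rs, ROLLED_PREFIX + newName); // 2. throws if already fenced
        return next;                                 // 3. caller writes here now
    }

    /** Master side: steal leases, following roll markers to the latest log. */
    static ToyLog fenceLatest(ToyFs fs, String startName, String master) {
        ToyLog log = fs.logs.get(startName);
        while (true) {
            log.leaseHolder = master; // steal the lease on this log
            String last = log.entries.isEmpty()
                    ? "" : log.entries.get(log.entries.size() - 1);
            if (!last.startsWith(ROLLED_PREFIX)) {
                return log; // no marker: the RS cannot roll past this log
            }
            log = fs.logs.get(last.substring(ROLLED_PREFIX.length()));
        }
    }

    public static void main(String[] args) {
        ToyFs fs = new ToyFs();
        ToyLog first = fs.create("hlog.1", "rs1");
        roll(fs, first, "rs1", "hlog.2"); // RS rolled before the master noticed
        ToyLog latest = fenceLatest(fs, "hlog.1", "master");
        System.out.println("master fenced " + latest.name);
    }
}
```

Either the marker lands in the old log before the master steals its lease, in 
which case the master sees it and moves on to the new file, or the marker 
append fails and the RS finds out it has been fenced.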

Regardless of the above question, the HMaster will also need to be sure to take 
the lease on the last HLog immediately, rather than starting to roll forward 
from the first HLog without looking at the latest ones.
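
The ordering constraint is small enough to write down directly. This sketch 
only builds a list of step names; recoveryPlan and the step strings are 
hypothetical and do not correspond to actual HMaster methods.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the ordering only; not actual HMaster code.
class RecoveryOrderSketch {
    /** Fence the newest log first, then replay every log oldest-first. */
    static List<String> recoveryPlan(List<String> logsOldestFirst) {
        List<String> plan = new ArrayList<>();
        if (logsOldestFirst.isEmpty()) {
            return plan;
        }
        String latest = logsOldestFirst.get(logsOldestFirst.size() - 1);
        plan.add("take lease on " + latest); // stop the old RS right away
        for (String log : logsOldestFirst) {
            plan.add("replay " + log);
        }
        return plan;
    }
}
```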

> Compaction events should be written to HLog
> -------------------------------------------
>
>                 Key: HBASE-2231
>                 URL: https://issues.apache.org/jira/browse/HBASE-2231
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>
> The sequence for a compaction should look like this:
> # Compact region to "new" files
> # Write a "Compacted Region" entry to the HLog
> # Delete "old" files
> This deals with a case where the RS has paused between step 1 and 2 and the 
> regions have since been reassigned.
