[ 
https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647143#comment-13647143
 ] 

Enis Soztutar commented on HBASE-2231:
--------------------------------------

Although very unlikely, it seems the following interleaving can still happen:
HStore Compaction:
1. write compaction file
2. Append Compaction to WAL (new sequenceId)
3. store.writeLock.lock()
4. modify store files
5. store.writeLock.unlock() 
6. delete files

HRegion flush:
a. startSeqId = wal.startCacheFlush()     -> this has to come after 2
b. for each store
c.   store.writeLock.lock()
d.   add store file
e.   store.writeLock.unlock()
f. update completeSequenceId   

Suppose compaction steps 1 and 2 have completed, and a flush then runs steps 
a..f to completion before the compaction reaches step 6 and deletes the old 
files. Because startSeqId is taken after step 2, the flush's 
completeSequenceId is higher than the sequence id of the compaction entry, so 
if the region server dies at that point the compaction entry might be skipped 
on replay. It seems we cannot easily extend the HStore write lock to guard 
against this. I was not able to find a solution for this that is not ugly. 
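
To make the interleaving concrete, here is a toy Java model of it. This is only 
a sketch, not HBase code: the class and method names are made up for 
illustration, and it assumes that replay skips every WAL entry whose sequence 
id is at or below completeSequenceId.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the interleaving; none of these names are real HBase classes.
final class CompactionReplayRace {
    private final List<Long> compactionMarkerSeqIds = new ArrayList<>();
    private long nextSeqId = 1;
    private long completeSequenceId = 0; // highest sequence id covered by a flush
    private boolean oldFilesDeleted = false;

    // Compaction steps 1-2: the new file exists and the marker is in the WAL.
    void compactionWriteAndLog() {
        compactionMarkerSeqIds.add(nextSeqId++);
    }

    // Compaction steps 3-6: swap store files under the lock, then delete old ones.
    void compactionSwapAndDelete() {
        oldFilesDeleted = true;
    }

    // Flush steps a-f: take a fresh sequence id (a), then publish it (f).
    void flush() {
        long startSeqId = nextSeqId++;
        completeSequenceId = startSeqId;
    }

    // ASSUMPTION: replay skips every WAL entry whose id is <= completeSequenceId.
    int markersSeenOnReplay() {
        int seen = 0;
        for (long seqId : compactionMarkerSeqIds) {
            if (seqId > completeSequenceId) {
                seen++;
            }
        }
        return seen;
    }

    boolean oldFilesStillPresent() {
        return !oldFilesDeleted;
    }

    public static void main(String[] args) {
        CompactionReplayRace region = new CompactionReplayRace();
        region.compactionWriteAndLog(); // steps 1-2
        region.flush();                 // steps a-f run to completion
        // Simulated crash here: steps 3-6 never run.
        System.out.println("markers replayed: " + region.markersSeenOnReplay());
        System.out.println("old files still present: " + region.oldFilesStillPresent());
    }
}
```

In this model the run prints 0 markers replayed while the old files are still 
present, which is the bad case above. If flush() runs before 
compactionWriteAndLog() instead, the marker gets the higher sequence id and is 
seen on replay.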

                
> Compaction events should be written to HLog
> -------------------------------------------
>
>                 Key: HBASE-2231
>                 URL: https://issues.apache.org/jira/browse/HBASE-2231
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>              Labels: moved_from_0_20_5
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 
> 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, 
> hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, 
> hbase-2231_v6.patch, hbase-2231_v7-0.95.patch, hbase-2231_v7.patch, 
> hbase-2231_v7.patch
>
>
> The sequence for a compaction should look like this:
> # Compact region to "new" files
> # Write a "Compacted Region" entry to the HLog
> # Delete "old" files
> This deals with a case where the RS has paused between step 1 and 2 and the 
> regions have since been reassigned.
