[ 
https://issues.apache.org/jira/browse/HBASE-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690114#action_12690114
 ] 

Jim Kellerman commented on HBASE-7:
-----------------------------------

There are (at least) three areas where we are still vulnerable:

1. Incomplete table deletion (see above).
2. Incomplete cache flush (region server dies during the flush); see below.
3. Inability to recover the write-ahead log (HLog) if the server dies. This 
depends on HADOOP-4379.

HBase protects itself from incomplete compactions by performing the operation 
in a temporary directory. If the compaction does not complete successfully, 
another compaction request will be generated and the partially completed 
compaction data is erased.

We should do something similar for a cache flush: write the flush to a 
temporary directory and move the new store file into place only if the flush 
completes successfully. Any subsequent cache flush will erase data left in the 
temporary flush directory. Recovery will happen when the HLog is replayed by 
the new server for the region.
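
As a rough illustration of the flush scheme described above, here is a minimal 
sketch in Java. It is not HBase code: it uses java.nio.file on a local 
filesystem as a stand-in for HDFS, and the class name TempDirFlush, the ".tmp" 
directory name, and the flush() signature are all hypothetical names chosen 
for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: local-filesystem stand-in for a store directory on HDFS.
class TempDirFlush {
    // Hypothetical name for the temporary flush directory.
    static final String TMP_DIR = ".tmp";

    // Remove the temporary directory and everything in it, if present.
    static void clearTmp(Path tmp) throws IOException {
        if (!Files.exists(tmp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(tmp)) {
            // Reverse order so files are deleted before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Write the flushed entries to the temporary directory, then move the
    // finished file into the store directory. A crash before the move leaves
    // only data under TMP_DIR, which the next flush erases.
    static Path flush(Path storeDir, String fileName, List<String> entries)
            throws IOException {
        Path tmp = storeDir.resolve(TMP_DIR);
        clearTmp(tmp); // erase leftovers from any earlier, incomplete flush
        Files.createDirectories(tmp);

        Path tmpFile = tmp.resolve(fileName);
        Files.write(tmpFile, entries, StandardCharsets.UTF_8);

        // The store file becomes visible only once it is complete.
        Path finalFile = storeDir.resolve(fileName);
        return Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path store = Files.createTempDirectory("store");
        Path out = flush(store, "storefile-1", Arrays.asList("row1=a", "row2=b"));
        System.out.println("flushed to " + out);
    }
}
```

The sketch covers only the write-then-move step. Recovering edits that were 
lost with an incomplete flush would still rely on replaying the HLog, as noted 
above.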

Without HADOOP-4379, we cannot guarantee that we can recover the most recent 
HLog file. Although Dhruba is looking at the issue, he would probably accept 
help from someone else. Getting HADOOP-4379 integrated into Hadoop is the most 
important thing we can do to ensure data integrity.

The second most important thing to do is to put cache flushes into a temporary 
directory.

That would leave hbasefsck to handle incomplete deletes (and perhaps other 
inconsistencies in the HBase file structure).


> [hbase] Provide a HBase checker and repair tool similar to fsck
> ---------------------------------------------------------------
>
>                 Key: HBASE-7
>                 URL: https://issues.apache.org/jira/browse/HBASE-7
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>            Reporter: Jim Kellerman
>             Fix For: 0.20.0
>
>         Attachments: patch.txt
>
>
> We need a tool to verify (and repair) HBase much like fsck

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
