[ https://issues.apache.org/jira/browse/HBASE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yunfan Zhong updated HBASE-10466:
---------------------------------

    Description: 
During region close, two flushes are performed to ensure that no data remains 
only in memory. When there is data in the current memstore only, one flush is 
sufficient. When there is also data in a memstore's snapshot, both flushes are 
essential; otherwise we have data loss. However, we recently found two bugs 
that cause at least one flush to be skipped, resulting in data loss.

Bug 1: Wrong calculation of HRegion.memstoreSize
When a flush fails, the data to be flushed is kept in each MemStore's snapshot 
and waits for the next flush attempt to continue with it. But when that next 
flush succeeds, the counter of total memstore size in HRegion is always 
decremented by the sum of the current memstore sizes, rather than by the size 
of the snapshots left over from the failed flush. As a result, almost every 
failed flush causes HRegion.memstoreSize to be reduced by a wrong value. If 
region flushes cannot proceed for a few cycles, the current memstores can grow 
much larger than the snapshots, which drifts memstoreSize far below its true 
value. In the extreme case, if the accumulated error exceeds the HRegion's 
memstore size limit, every further flush is skipped, because a flush does 
nothing when memstoreSize is not greater than 0.
When the region is closing, if both flushes are skipped and data is left in 
the current memstores and/or snapshots, we can lose data up to the region's 
memstore size limit.
The fix is to decrement memstoreSize by the exact size of the data that is 
about to be flushed.
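
The drift can be reproduced with a toy model of the counter (all names here 
are illustrative, not the actual HBase-10466 code):

```java
// Toy model of the HRegion.memstoreSize accounting bug; field and
// method names are illustrative, not the real HBase code.
class MemstoreAccounting {
    long memstoreSize;  // region-wide counter of unflushed bytes
    long snapshotSize;  // bytes stuck in snapshots after a failed flush
    long currentSize;   // bytes in the current memstores

    // Buggy behavior: after the retried flush writes out the snapshots,
    // the counter is decremented by the *current* memstore size.
    void flushSucceededBuggy() {
        memstoreSize -= currentSize;
        snapshotSize = 0;
    }

    // Fixed behavior: decrement by exactly the bytes that were flushed.
    void flushSucceededFixed(long flushedBytes) {
        memstoreSize -= flushedBytes;
        snapshotSize = 0;
    }
}
```

With 20 bytes left in snapshots and 80 bytes of new writes (true total 100), 
the buggy path leaves the counter at 20 while 80 bytes are actually still 
unflushed; repeated failures can push the counter to 0 or below, after which 
flushes are skipped entirely.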

Bug 2: Conditions on the first flush of region close (the so-called pre-flush)
If memstoreSize is smaller than a certain threshold, or if a flush is already 
in progress when region close starts, the first flush is skipped and only the 
second flush takes place. However, two flushes are required in case a previous 
flush failed and left data in a snapshot: the single remaining flush writes 
out only that snapshot, so the data in the current memstore is lost.
The fix is to remove all conditions except the abort check, so that region 
close always performs both flushes.
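
A minimal sketch of why the pre-flush must be unconditional (the model and the 
numbers are assumptions for illustration, not the actual HRegion logic):

```java
// Sketch: a single flush can write out only one snapshot at a time, so
// a leftover snapshot from a failed flush consumes one of the two
// region-close flushes. Names and sizes are illustrative.
class RegionCloseSketch {
    long current = 80;   // bytes in the current memstore
    long snapshot = 20;  // bytes left behind by an earlier failed flush

    // One flush attempt: a leftover snapshot must be written out before
    // the current memstore can be snapshotted and written.
    void flushOnce() {
        if (snapshot == 0) {
            snapshot = current;  // no leftover: snapshot the current memstore
            current = 0;
        }
        snapshot = 0;            // persist the snapshot to disk
    }

    long unflushedBytes() { return current + snapshot; }
}
```

If the conditional pre-flush is skipped, the lone final flush writes only the 
20-byte leftover snapshot and the 80 bytes in the current memstore are lost; 
with both flushes, everything reaches disk.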


  was:
When flushes fail, the data to be flushed is kept in each MemStore's snapshot, 
and the next flush attempt continues with the snapshot first. However, after a 
flush succeeds, the counter of total memstore size in HRegion is always 
decremented by the sum of the current memstore sizes, which is wrong whenever 
the previous flush failed.
When the region is closing, there are two flushes. While some data sat in a 
snapshot and the memstore size counter was incorrect, the first flush 
successfully saved the snapshot data, but the counter was reduced to 0 or even 
below. This prevented the second flush, since HRegion.internalFlushcache() 
returns immediately when the total memstore size is not greater than 0. As a 
result, data in the memstores was lost.
This could cause mass data loss, up to the size limit of the memstores.

        Summary: Bugs that cause flushes to be skipped during HRegion close 
could cause data loss  (was: Wrong calculation of total memstore size in 
HRegion which could cause data loss)

> Bugs that cause flushes to be skipped during HRegion close could cause data 
> loss
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-10466
>                 URL: https://issues.apache.org/jira/browse/HBASE-10466
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.89-fb
>            Reporter: Yunfan Zhong
>            Priority: Critical
>             Fix For: 0.89-fb
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
