[jira] [Updated] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222

Gary Helmling (JIRA) Fri, 16 Sep 2011 18:01:50 -0700

     [ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gary Helmling updated HBASE-4282:
---------------------------------

    Attachment: HBASE-4282_trunk_prelim.patch

Here's a preliminary patch for feedback.  We store the sequence number of the 
last deferred-flush edit that was appended.  After a sync(), we reset it only 
if it still matches the value we read before the sync().  In addition, we only 
ride over a WAL close error if there is no outstanding deferred sequence 
number; otherwise we abort, as we did previously.
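
To make the bookkeeping concrete, here is a minimal sketch of the idea, not the 
code in the attached patch.  Only lastDeferredSeq is a name taken from this 
comment; the class name, method names, and the NONE sentinel are hypothetical 
and chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the HLog code from the patch.
class DeferredSeqTracker {
    // Assumed sentinel meaning "no deferred edit is outstanding".
    static final long NONE = -1L;

    private final AtomicLong lastDeferredSeq = new AtomicLong(NONE);

    // Called when an edit for a DEFERRED_LOG_FLUSH table is appended.
    void onDeferredAppend(long seqNum) {
        lastDeferredSeq.set(seqNum);
    }

    // Read the current value just before calling sync().
    long snapshotBeforeSync() {
        return lastDeferredSeq.get();
    }

    // After sync(): clear only if no newer deferred edit was recorded.
    void afterSync(long expected) {
        lastDeferredSeq.compareAndSet(expected, NONE);
    }

    // A WAL close error may be ridden over only if nothing is outstanding.
    boolean canRideOverCloseError() {
        return lastDeferredSeq.get() == NONE;
    }

    public static void main(String[] args) {
        DeferredSeqTracker tracker = new DeferredSeqTracker();
        tracker.onDeferredAppend(5L);
        long expected = tracker.snapshotBeforeSync();
        // ... sync() would run here ...
        tracker.afterSync(expected);
        System.out.println(tracker.canRideOverCloseError());
    }
}
```

If canRideOverCloseError() returns false at close time, the caller would fall 
back to the previous behavior and abort.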

There is a window here where we might fail to clear lastDeferredSeq: an edit 
can come in after we read the current value but before the post-sync() check.  
That edit may have been appended prior to the sync() and so no longer be 
outstanding, in which case we report a false positive and abort when we could 
have ridden over the error instead.  I don't know if we can do better here, 
given that the sync() is done outside of the updates lock for HDFS-895 
goodness.
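
The interleaving can be replayed single-threaded to show the false positive.  
This is an illustrative sketch under the same assumptions as the description 
above: the class name, the replay() helper, the NONE sentinel, and the 
sequence numbers 5 and 6 are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical replay of the race window; not code from the patch.
class DeferredSeqRaceDemo {
    static final long NONE = -1L;

    // Returns true if an edit is still reported as outstanding afterwards.
    static boolean replay(boolean lateAppend) {
        AtomicLong lastDeferredSeq = new AtomicLong(NONE);
        lastDeferredSeq.set(5L);                // deferred edit 5 appended
        long expected = lastDeferredSeq.get();  // syncer reads current value
        if (lateAppend) {
            lastDeferredSeq.set(6L);            // edit 6 lands before the sync
        }
        // ... sync() runs here; assume it also persists edit 6 ...
        lastDeferredSeq.compareAndSet(expected, NONE); // post-sync check
        return lastDeferredSeq.get() != NONE;
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // no late edit: cleared
        System.out.println(replay(true));  // late edit: still flagged
    }
}
```

In the second case the edit may already be durable, yet a subsequent close 
error would still trigger an abort.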

The main thing missing from this patch is an additional test case for this 
condition, which I'll work up.

> Potential data loss in retries of WAL close introduced in HBASE-4222
> --------------------------------------------------------------------
>
>                 Key: HBASE-4282
>                 URL: https://issues.apache.org/jira/browse/HBASE-4282
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Blocker
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4282_trunk_prelim.patch
>
>
> The ability to ride over WAL close errors on log rolling added in HBASE-4222 
> could lead to missing HLog entries if:
> * A table has DEFERRED_LOG_FLUSH=true
> * There are unflushed WALEdit entries for that table in the current 
> SequenceFile writer buffer
> Since the writes were already acknowledged to the client, just ignoring the 
> close error to allow for another log roll doesn't seem like the right thing 
> to do here.
> We could easily flag this state and only ride over the close error if there 
> aren't unflushed entries.  This would bring the above condition back to the 
> previous behavior of aborting the region server.  However, aborting the 
> region server in this state is still guaranteeing data loss.  Is there 
> anything we can do better in this case?  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
