stack created HBASE-16689:
-----------------------------

             Summary: Durability == ASYNC_WAL means no SYNC
                 Key: HBASE-16689
                 URL: https://issues.apache.org/jira/browse/HBASE-16689
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 1.2.3, 1.1.6, 1.0.3
            Reporter: stack
            Assignee: stack
            Priority: Critical


Setting DURABILITY=ASYNC_WAL on a Table suspends all syncs for all of that 
Table's appends. If all tables on a cluster have this setting, data is flushed 
from the RS to the DN at some arbitrary time, and a bunch may just hang out in 
DFSClient buffers on the RS side indefinitely if writes are sporadic, at least 
until there is a WAL roll -- a log roll sends a sync through the write pipeline 
to flush out any outstanding appends -- or a region close, which does 
similar... or until we crash and drop the data sitting in the RS-side buffers.

This is probably not what a user expects when they set ASYNC_WAL (I could not 
find anywhere that we clearly doc what ASYNC_WAL means). Worse, old-time users 
probably associate ASYNC_WAL with DEFERRED_FLUSH, an old HTableDescriptor config 
that was deprecated and replaced by ASYNC_WAL. DEFERRED_FLUSH ran a background 
thread -- LogSyncer -- that, on a configurable interval, sent a sync down the 
write pipeline so any outstanding appends since the last interval start got 
pushed out to the DN.  ASYNC_WAL doesn't do this (see below for history on how 
we let go of the LogSyncer feature).
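
For illustration only, here is a minimal sketch of the kind of interval-driven 
background sync described above. It is not the old LogSyncer code; the class 
name PeriodicSyncer and all of its methods are hypothetical, and the "sync" is 
just a counter standing in for a real sync down the write pipeline.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT HBase code: a background thread that, on a fixed
// interval, issues a sync if any appends are outstanding.
final class PeriodicSyncer implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final AtomicLong unsyncedAppends = new AtomicLong();
    private final AtomicLong syncCount = new AtomicLong();

    PeriodicSyncer(long intervalMillis) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-syncer");
            t.setDaemon(true); // do not keep the JVM alive for this thread
            return t;
        });
        scheduler.scheduleWithFixedDelay(
            this::syncIfNeeded, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called on the write path: record an append and return without syncing.
    void append() {
        unsyncedAppends.incrementAndGet();
    }

    // Called by the background thread: sync only if something is outstanding.
    void syncIfNeeded() {
        if (unsyncedAppends.getAndSet(0) > 0) {
            syncCount.incrementAndGet(); // stand-in for a real pipeline sync
        }
    }

    long syncCount() {
        return syncCount.get();
    }

    long unsynced() {
        return unsyncedAppends.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With something like this in place, sporadic writes would sit unsynced for at 
most roughly one interval rather than indefinitely.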

Of note, we always sync meta edits. You can't turn this off. Also, given WALs 
are per regionserver, if other regions on the RS are from tables that have sync 
set, these writes will push out to the DN any appends done on tables that have 
DEFERRED/ASYNC_WAL set.

To fix, we could do a few things:

 * Simple and comprehensive would be to always queue a sync, even if ASYNC_WAL 
is set, but let go of Handlers as soon as we write the memstore -- we don't 
wait on the sync to complete as we do with the default setting of 
Durability=SYNC_WAL.
 * Be like a 'real' database and add in a sync after N bytes of data have been 
appended (configurable) or after M milliseconds have passed, whichever 
threshold happens first. The size check would be easy. The sync-every-M-millis 
would mean another thread.
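
A rough sketch of the threshold check in the second option, to make the 
"whichever happens first" rule concrete. Everything here is hypothetical: 
SyncThreshold and its methods are not HBase code, and the clock is passed in 
by the caller as a plain millisecond value. The time check is only evaluated 
when something calls isSyncDue, which is why the M-millis side would still 
need a thread (or some other periodic caller) when writes are sporadic.

```java
// Hypothetical sketch, NOT HBase code: decide when a sync is due based on
// N unsynced bytes or M milliseconds since the oldest unsynced append.
final class SyncThreshold {
    private final long maxUnsyncedBytes;   // "N" in the description above
    private final long maxUnsyncedMillis;  // "M" in the description above
    private long unsyncedBytes = 0;
    private long oldestUnsyncedAtMillis = -1; // -1 means nothing outstanding

    SyncThreshold(long maxUnsyncedBytes, long maxUnsyncedMillis) {
        this.maxUnsyncedBytes = maxUnsyncedBytes;
        this.maxUnsyncedMillis = maxUnsyncedMillis;
    }

    // Record an append; returns true if the caller should queue a sync now.
    boolean onAppend(long bytes, long nowMillis) {
        if (oldestUnsyncedAtMillis < 0) {
            oldestUnsyncedAtMillis = nowMillis;
        }
        unsyncedBytes += bytes;
        return isSyncDue(nowMillis);
    }

    // True if either threshold has been crossed, whichever comes first.
    boolean isSyncDue(long nowMillis) {
        if (oldestUnsyncedAtMillis < 0) {
            return false; // nothing to sync
        }
        return unsyncedBytes >= maxUnsyncedBytes
            || nowMillis - oldestUnsyncedAtMillis >= maxUnsyncedMillis;
    }

    // Call after a sync completes to reset both counters.
    void onSync() {
        unsyncedBytes = 0;
        oldestUnsyncedAtMillis = -1;
    }
}
```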

Let me take a look and report back. Will file a bit of history on how we got 
here in next comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
