Re: commit semantics

Ryan Rawson Mon, 11 Jan 2010 22:53:40 -0800

Right now each regionserver has 1 log, so if 2 puts on different
tables hit the same RS, they hit the same HLog.


There are 2 performance enhancing things in trunk:
- bulk commit - we only call sync() once per RPC, no matter how many
rows are involved.  If you use the batch put API you can get really
high levels of performance.
- group commit - we can take multiple threads' worth of sync()s and do
them in one, not N.  This improves performance while maintaining high
data safety.

If you are expecting very high concurrency, group commit is your
friend. The more concurrent operations, the more rows each sync
captures, so overall rows/sec goes up while the number of sync()
calls/sec stays constant.
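
To make the group commit idea concrete, here is a minimal Java sketch.
It is not the HLog code in trunk: the class GroupCommitLog, its fields,
and the 5ms sleep standing in for the real sync() are all made up for
illustration. One writer becomes the leader and syncs on behalf of
every edit appended before it started, so N concurrent writers need at
most N syncs and usually far fewer.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of group commit; names are illustrative, not HBase's.
class GroupCommitLog {
    private final Object lock = new Object();
    private long appended = 0;        // sequence number of the last appended edit
    private long synced = 0;          // highest sequence number covered by a finished sync
    private boolean syncing = false;  // true while some thread is inside slowSync()
    final AtomicInteger syncCalls = new AtomicInteger();

    // Stand-in for the expensive sync() on the underlying log file (assumed cost: 5ms).
    private void slowSync() throws InterruptedException {
        syncCalls.incrementAndGet();
        Thread.sleep(5);
    }

    // Append one edit and return only once a sync has covered it.
    void appendAndSync() throws InterruptedException {
        long mySeq;
        synchronized (lock) {
            mySeq = ++appended;
        }
        while (true) {
            long target;
            synchronized (lock) {
                // Someone else is syncing: wait, their sync may cover us.
                while (syncing && synced < mySeq) {
                    lock.wait();
                }
                if (synced >= mySeq) {
                    return;
                }
                // Become the leader for everything appended so far.
                syncing = true;
                target = appended;
            }
            boolean ok = false;
            try {
                slowSync();
                ok = true;
            } finally {
                synchronized (lock) {
                    if (ok) {
                        synced = Math.max(synced, target);
                    }
                    syncing = false;
                    lock.notifyAll();
                }
            }
        }
    }

    // Runs `writers` concurrent appends; returns {edits synced, sync calls made}.
    static long[] run(int writers) {
        GroupCommitLog log = new GroupCommitLog();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    log.appendAndSync();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        synchronized (log.lock) {
            return new long[] { log.synced, log.syncCalls.get() };
        }
    }
}
```

Calling GroupCommitLog.run(8) returns the number of edits synced and
the number of sync calls it took. The second number depends on thread
timing, but it can never exceed the number of writers.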

The other option is to sync() on a fine-grained timer, e.g. every 10ms
(i.e. at 100Hz).  The window of data loss is small, and the performance
boost is substantial. I asked JD to implement a switchable config so
that you can choose, on a table-by-table basis, the right mix of
performance vs persistence, with finer control than merely
"sync every N rows".

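A timer-driven sync could look roughly like the Java sketch below. It
is an illustration only, not what is in trunk: TimedSyncLog, append(),
syncNow() and unsyncedEdits() are hypothetical names, and the real
sync on the log file is left as a comment. Writers return as soon as
the edit is appended, and the edits between two timer ticks are the
window of possible data loss.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a log that syncs on a fixed timer instead of per edit.
class TimedSyncLog implements AutoCloseable {
    private final AtomicLong appended = new AtomicLong(); // edits appended so far
    private final AtomicLong synced = new AtomicLong();   // edits covered by a sync
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "log-sync-timer");
            t.setDaemon(true); // do not keep the JVM alive just for the timer
            return t;
        });

    // periodMillis = 10 would give the "every 10ms" behaviour described above.
    TimedSyncLog(long periodMillis) {
        timer.scheduleAtFixedRate(this::syncNow, periodMillis, periodMillis,
                                  TimeUnit.MILLISECONDS);
    }

    // Returns immediately; the edit is NOT durable until the next sync.
    long append() {
        return appended.incrementAndGet();
    }

    // One sync covers every edit appended before it started.
    synchronized void syncNow() {
        long target = appended.get();
        // A real implementation would sync the underlying log file here.
        synced.accumulateAndGet(target, Math::max);
    }

    // Number of edits that would be lost if the process died right now.
    long unsyncedEdits() {
        long s = synced.get(); // read synced first so the result is never negative
        return appended.get() - s;
    }

    @Override
    public void close() {
        timer.shutdown();
        syncNow(); // final sync so nothing is left behind on a clean shutdown
    }
}
```

With a 10ms period the sync rate is capped at 100 per second no matter
how many edits arrive, which is where the performance boost comes from.
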
I've thought about this issue quite a bit, and I think sync every
1 row, combined with optional no-sync and a short-interval timed sync(),
is the way to go. If you want to discuss this more in person, maybe we
can meet up for brews or something.

-ryan

On Mon, Jan 11, 2010 at 10:25 PM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> any IO to an HDFS file (appends, writes, etc) is actually blocked on a
> pending sync. "sync" in HDFS is a pretty heavyweight operation as it stands.
>
> if we want the best of both worlds.. latency as well as data integrity, how
> about inserting the same record into two completely separate HBase tables in
> parallel... the operation can complete as soon as the record is inserted
> into the first HBase table (thus giving low latencies) but data integrity
> will not be compromised because it is unlikely that two region servers will
> fail exactly at the same time (assuming that there is a way to ensure that
> these two tables are not handled by the same region server).
>
> thanks,
> dhruba
>
>
> On Mon, Jan 11, 2010 at 8:12 PM, Joydeep Sarma <jssa...@apache.org> wrote:
>
>> ok - hadn't thought about it that way - but yeah with a default of 1 -
>> the semantics seem correct.
>>
>> under high load - some batching would automatically happen at this
>> setting (or so one would think - not sure if hdfs appends are blocked
>> on pending syncs (in which case the batching wouldn't quite happen i
>> think) - cc'ing Dhruba).
>>
>> if the performance with setting of 1 doesn't work out - we may need an
>> option to delay acks until actual syncs .. (most likely we would be
>> able to compromise on latency to get higher throughput - but wouldn't
>> be willing to compromise on data integrity)
>>
>> > Hey Joydeep,
>> >
>> > This is actually intended this way but the name of the variable is
>> > misleading. The sync is done only if forceSync or we have enough
>> > entries to sync (default is 1). If someone wants to sync only every
>> > 100 entries, for example, they would play with that configuration.
>> >
>> > Hope that helps,
>> >
>> > J-D
>> >
>> >
>> > On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma <jssa...@apache.org> wrote:
>> >>
>> >> Hey HBase-devs,
>> >>
>> >> we have been going through hbase code to come up to speed.
>> >>
>> >> One of the questions was regarding the commit semantics. Thumbing
>> >> through the RegionServer code that's appending to the wal:
>> >>
>> >> syncWal -> HLog.sync -> addToSyncQueue ->syncDone.await()
>> >>
>> >> and the log writer thread calls:
>> >>
>> >> hflush(), syncDone.signalAll()
>> >>
>> >> however hflush doesn't necessarily call a sync on the underlying log
>> >> file:
>> >>
>> >>       if (this.forceSync ||
>> >>           this.unflushedEntries.get() >= this.flushlogentries) { ...
>> >>           sync() ... }
>> >>
>> >> so it seems that if forceSync is not true, the syncWal can unblock
>> >> before a sync is called (and forcesync seems to be only true for
>> >> metaregion()).
>> >>
>> >> are we missing something - or is there a bug here (the signalAll should
>> >> be conditional on hflush having actually flushed something).
>> >>
>> >> thanks,
>> >>
>> >> Joydeep
>> >
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
