Regarding HBASE-5954 specifically, have you thought about using BOB (the Block Order Breaker, https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf) to verify whether a change is correct?
It allows you to explore many different crash scenarios.

On Sun, Apr 2, 2017 at 1:35 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> I understand why HBase by default does not use hsync -- it does come with
> a big performance cost (though for FSYNC_WAL, which is not the default
> option, you should probably do it, because the documentation explicitly
> promises it).
>
> I just want to make sure my description of HBase is accurate, including
> the durability aspect.

On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu <[email protected]> wrote:

> Suli:
> Have you looked at HBASE-5954? It gives some background on why the HBase
> code is formulated the way it currently is.
>
> Cheers

On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Doesn't your second paragraph just prove my point? If data is not
> persisted to disk, then it is not durable -- that is the definition of
> durability.
>
> If you want the data to be durable, then you need to call hsync() instead
> of hflush(), and that would be the correct behavior under the FSYNC_WAL
> flag (per the HBase documentation). However, HBase does not do that.
>
> Suli

On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser <[email protected]> wrote:

> No, that's not correct. HBase would, by definition, not be a consistent
> database if a write were not durable once a client sees a successful
> write.
>
> The point that I will concede to you is that the hflush call may, in
> extenuating circumstances, not be completely durable. For example, hflush
> does not actually force the data to disk. If an abrupt power failure
> happens before this data is pushed to disk, HBase may think that data was
> durable when it actually wasn't (at the HDFS level).
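The hflush/hsync distinction discussed above can be illustrated with a local-filesystem analogy (a sketch only -- this is plain java.io, not the HDFS FSDataOutputStream API): flush() hands data off so it survives a process crash but may still sit in the OS page cache, while getFD().sync() issues an fsync that forces it to the device, which is roughly what hsync adds over hflush.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Local-filesystem analogy for the hflush vs hsync durability gap.
// flush()          ~ hflush: data leaves the writer, but may only be in memory.
// getFD().sync()   ~ hsync: data is fsync'd and survives an abrupt power loss.
public class FlushVsSync {
    public static void writeDurably(File f, byte[] data, boolean sync) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.flush();            // handed to the OS; may still sit in the page cache
            if (sync) {
                out.getFD().sync(); // force persistence to the device (fsync)
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("wal", ".log");
        f.deleteOnExit();
        writeDurably(f, "put:row1".getBytes(), true);
        System.out.println(f.length()); // prints 8
    }
}
```

The class and file names here are invented for illustration; the point is only that "flushed" and "synced to disk" are different durability levels, which is exactly the gap Josh concedes.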
On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Also, please correct me if I am wrong, but I don't think a put is durable
> when an RPC returns to the client. Just its corresponding WAL entry has
> been pushed to the memory of all three datanodes, so it has a low
> probability of being lost -- but nothing is persisted at this point.
>
> And this is true no matter whether you use the SYNC_WAL or FSYNC_WAL flag.

On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <[email protected]> wrote:

> 1.1 -> 2: don't forget about the block cache, which can eliminate the
> need for any HDFS read.
>
> I think you're over-simplifying the write path quite a bit. I'm not sure
> what you mean by an 'asynchronous write', but that doesn't exist at the
> HBase RPC layer, as that would invalidate the consistency guarantees (if
> an RPC returns to the client that data was "put", then it is durable).
>
> Going off of memory (sorry in advance if I misstate something): the
> general way that data is written to the WAL is a "group commit". You have
> many threads all trying to append data to the WAL -- performance would be
> terrible if you applied all of these writes serially. Instead, many
> writes can be accepted, and each caller receives a Future which it must
> wait on. What's happening behind the scenes is that the writes are being
> bundled together to reduce the number of syncs to the WAL ("grouping" the
> writes). When one caller's future completes, what really happened is that
> the write/sync which included that caller's update was committed (along
> with others).
> All of this is happening inside the RS's implementation of accepting an
> update.
>
> https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106

杨苏立 Yang Su Li wrote:

> The attachment can be found at the following URL:
> http://pages.cs.wisc.edu/~suli/hbase.pdf
>
> Sorry for the inconvenience...

On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <[email protected]> wrote:

> Again, the attachment didn't come through.
>
> Is it possible to formulate it as a Google doc?
>
> Thanks

On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Hi,
>
> I am a graduate student working on scheduling in storage systems, and we
> are interested in how different threads in HBase interact with each other
> and how that might affect scheduling.
>
> I have written down my understanding of how HBase/HDFS works based on its
> current thread architecture (attached). I am wondering if the developers
> of HBase could take a look at it and let me know if anything is incorrect
> or inaccurate, or if I have missed anything.
>
> Thanks a lot for your help!
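The WAL group-commit pattern Josh describes earlier in the thread -- many appenders, each handed a Future, with one sync covering a whole batch of writes -- can be sketched as follows. This is a self-contained illustration, not HBase's actual FSHLog code; the class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Group-commit sketch: appenders enqueue edits and get a Future; a single
// sync thread drains everything queued so far, "syncs" the whole batch with
// one operation, then completes every Future in that batch.
public class GroupCommitLog implements AutoCloseable {
    private final BlockingQueue<CompletableFuture<Void>> pending = new LinkedBlockingQueue<>();
    private final List<String> buffer = Collections.synchronizedList(new ArrayList<>());
    private final List<String> durable = Collections.synchronizedList(new ArrayList<>());
    private final Thread syncer;
    private volatile boolean running = true;

    public GroupCommitLog() {
        syncer = new Thread(() -> {
            while (running || !pending.isEmpty()) {
                CompletableFuture<Void> first;
                try { first = pending.poll(10, TimeUnit.MILLISECONDS); }
                catch (InterruptedException e) { break; }
                if (first == null) continue;
                List<CompletableFuture<Void>> batch = new ArrayList<>();
                batch.add(first);
                pending.drainTo(batch);        // group everything queued so far
                synchronized (buffer) {        // one "sync" covers the whole batch
                    durable.addAll(buffer);
                    buffer.clear();
                }
                batch.forEach(f -> f.complete(null)); // callers learn their edit is durable
            }
        });
        syncer.start();
    }

    // A caller appends an edit and must wait on the returned Future,
    // mirroring the "caller receives a Future" step in the description above.
    public CompletableFuture<Void> append(String edit) {
        buffer.add(edit);
        CompletableFuture<Void> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    public int durableCount() { return durable.size(); }

    @Override public void close() throws InterruptedException {
        running = false;
        syncer.join();
    }

    public static void main(String[] args) throws Exception {
        try (GroupCommitLog log = new GroupCommitLog()) {
            CompletableFuture<Void> f1 = log.append("put row1");
            CompletableFuture<Void> f2 = log.append("put row2");
            CompletableFuture.allOf(f1, f2).join(); // both edits covered by a shared sync
            System.out.println(log.durableCount()); // prints 2
        }
    }
}
```

The design point the sketch shows: no caller's Future completes before a sync that covers its edit, yet many edits can share a single sync, which is why group commit preserves durability while avoiding one sync per write.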
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
