You might want to look at this follow-up work as well: https://www.usenix.org/conference/osdi16/technical-sessions/presentation/alagappan
It talks about how to use BOB on distributed systems.

On Sun, Apr 2, 2017 at 4:32 PM, Ted Yu <[email protected]> wrote:

> Need some time to digest BOB and see whether it can simplify the
> reasoning about how fsync is implemented in HBase.
>
> HDFS was evaluated by the paper, where I noticed the following:
>
> bq. both HDFS and ZooKeeper respondents lament that such an fsync() is
> not easily achievable with Java
>
> Cheers
>
> On Sun, Apr 2, 2017 at 1:53 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
>
> > Regarding HBASE-5954 specifically, have you thought about using BOB
> > (Block Order Breaker,
> > https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf)
> > to verify whether a change is correct?
> >
> > It allows you to explore many different crash scenarios.
> >
> > On Sun, Apr 2, 2017 at 1:35 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> >
> > > I understand why HBase by default does not use hsync -- it does
> > > come with a big performance cost (though for FSYNC_WAL, which is
> > > not the default option, you should probably do it, because the
> > > documentation explicitly promises it).
> > >
> > > I just want to make sure my description of HBase is accurate,
> > > including the durability aspect.
> > >
> > > On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu <[email protected]> wrote:
> > >
> > > > Suli:
> > > > Have you looked at HBASE-5954?
> > > >
> > > > It gives some background on why the HBase code is formulated the
> > > > way it currently is.
> > > >
> > > > Cheers
> > > >
> > > > On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > >
> > > > > Doesn't your second paragraph just prove my point? -- If data
> > > > > is not persisted to disk, then it is not durable. That is the
> > > > > definition of durability.
> > > > > If you want the data to be durable, then you need to call
> > > > > hsync() instead of hflush(), and that would be the correct
> > > > > behavior if you use the FSYNC_WAL flag (per the HBase
> > > > > documentation).
> > > > >
> > > > > However, HBase does not do that.
> > > > >
> > > > > Suli
> > > > >
> > > > > On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser <[email protected]> wrote:
> > > > >
> > > > > > No, that's not correct. HBase would, by definition, not be a
> > > > > > consistent database if a write were not durable when a client
> > > > > > sees a successful write.
> > > > > >
> > > > > > The point that I will concede to you is that the hflush call
> > > > > > may, in extenuating circumstances, not be completely durable.
> > > > > > For example, hflush does not actually force the data to disk.
> > > > > > If an abrupt power failure happens before this data is pushed
> > > > > > to disk, HBase may think that data was durable when it
> > > > > > actually wasn't (at the HDFS level).
> > > > > >
> > > > > > On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > > > >
> > > > > > > Also, please correct me if I am wrong, but I don't think a
> > > > > > > put is durable when an RPC returns to the client. Its
> > > > > > > corresponding WAL entry has just been pushed to the memory
> > > > > > > of all three data nodes, so it has a low probability of
> > > > > > > being lost. But nothing is persisted at that point.
> > > > > > >
> > > > > > > And this is true no matter whether you use the SYNC_WAL or
> > > > > > > the FSYNC_WAL flag.
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <[email protected]> wrote:
> > > > > > >
> > > > > > > > 1.1 -> 2: don't forget about the block cache, which can
> > > > > > > > invalidate the need for any HDFS read.
> > > > > > > >
> > > > > > > > I think you're over-simplifying the write path quite a bit.
> > > > > > > > I'm not sure what you mean by an 'asynchronous write',
> > > > > > > > but that doesn't exist at the HBase RPC layer, as it
> > > > > > > > would invalidate the consistency guarantees (if an RPC
> > > > > > > > returns to the client that data was "put", then it is
> > > > > > > > durable).
> > > > > > > >
> > > > > > > > Going off of memory (sorry in advance if I misstate
> > > > > > > > something): the general way that data is written to the
> > > > > > > > WAL is a "group commit". You have many threads all trying
> > > > > > > > to append data to the WAL -- performance would be terrible
> > > > > > > > if you serially applied all of these writes. Instead, many
> > > > > > > > writes can be accepted, and each caller receives a Future.
> > > > > > > > The caller must wait for the Future to complete. What's
> > > > > > > > happening behind the scenes is that the writes are being
> > > > > > > > bundled together to reduce the number of syncs to the WAL
> > > > > > > > ("grouping" the writes together). When one caller's Future
> > > > > > > > completes, what really happened is that the write/sync
> > > > > > > > which included the caller's update was committed (along
> > > > > > > > with others). All of this is happening inside the
> > > > > > > > RegionServer's implementation of accepting an update.
> > > > > > > >
> > > > > > > > https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106
> > > > > > > >
> > > > > > > > 杨苏立 Yang Su Li wrote:
> > > > > > > >
> > > > > > > > > The attachment can be found at the following URL:
> > > > > > > > > http://pages.cs.wisc.edu/~suli/hbase.pdf
> > > > > > > > >
> > > > > > > > > Sorry for the inconvenience...
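[Editor's aside: the group-commit scheme described a few messages up can be sketched in plain Java. This is a toy illustration, not HBase's actual FSHLog code; the names (GroupCommitWal, append, syncCount) are invented for the sketch. Writers enqueue an edit and block on a Future; a single sync thread drains everything queued so far, performs one (simulated) sync for the whole batch, and then completes all the covered Futures at once.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class GroupCommitWal {
    // Each pending Future stands in for one appended edit awaiting a sync.
    // (A real WAL would also buffer the edit bytes; we only track completion.)
    private final BlockingQueue<CompletableFuture<Void>> pending = new LinkedBlockingQueue<>();
    private final Thread syncer;
    private volatile boolean running = true;
    int syncCount = 0; // number of (batched) syncs actually performed

    GroupCommitWal() {
        syncer = new Thread(() -> {
            while (running || !pending.isEmpty()) {
                List<CompletableFuture<Void>> batch = new ArrayList<>();
                pending.drainTo(batch);      // grab everything queued so far
                if (batch.isEmpty()) { Thread.yield(); continue; }
                syncCount++;                 // one sync covers the whole batch
                batch.forEach(f -> f.complete(null)); // all covered callers wake up
            }
        });
        syncer.start();
    }

    // A caller appends an edit and must wait on the returned Future.
    CompletableFuture<Void> append(String edit) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    void shutdown() throws InterruptedException { running = false; syncer.join(); }

    public static void main(String[] args) throws Exception {
        GroupCommitWal wal = new GroupCommitWal();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> callers = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            final int n = i;
            // Each caller blocks until the sync covering its edit completes.
            callers.add(pool.submit(() -> wal.append("edit-" + n).join()));
        }
        for (Future<?> c : callers) c.get();
        pool.shutdown();
        wal.shutdown();
        System.out.println("16 appends, " + wal.syncCount + " syncs");
    }
}
```

Many appends usually complete under fewer syncs than appends; that batching is the whole point of group commit. The real FSHLog is considerably more involved (it runs this pattern over a ring buffer), but the idea is the same.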
> > > > > > > > > On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Again, the attachment didn't come through.
> > > > > > > > > >
> > > > > > > > > > Is it possible to formulate it as a Google Doc?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I am a graduate student working on scheduling in storage systems, and we are interested in how the different threads in HBase interact with each other and how that might affect scheduling.
> > > > > > > > > > >
> > > > > > > > > > > I have written down my understanding of how HBase/HDFS works, based on its current thread architecture (attached). I am wondering if the developers of HBase could take a look at it and let me know if anything is incorrect or inaccurate, or if I have missed anything.
> > > > > > > > > > >
> > > > > > > > > > > Thanks a lot for your help!
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 22, 2017 at 3:39 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I am a graduate student working on scheduling in storage systems, and we are interested in how the different threads in HBase interact with each other and how that might affect scheduling.
> > > > > > > > > > > >
> > > > > > > > > > > > I have written down my understanding of how HBase/HDFS works, based on its current thread architecture (attached). I am wondering if the developers of HBase could take a look at it and let me know if anything is incorrect or inaccurate, or if I have missed anything.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks a lot for your help!
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Suli Yang
> > > > > > > > > > > > Department of Physics
> > > > > > > > > > > > University of Wisconsin Madison
> > > > > > > > > > > > 4257 Chamberlin Hall
> > > > > > > > > > > > Madison WI 53703

--
Suli Yang
Department of Physics
University of Wisconsin Madison
4257 Chamberlin Hall
Madison WI 53703
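[Editor's aside on the quoted lament that "such an fsync() is not easily achievable with Java": forcing file contents to the storage device is expressible in plain Java via FileChannel.force(true), which on typical platforms maps to fsync(2)/fdatasync(2); the genuinely awkward parts are syncing the directory entry and HDFS's pipeline semantics, where hflush only pushes data to the datanodes' memory. The sketch below illustrates the flush-vs-force distinction with local files only; the class and method names are invented for the example, and the analogy to hflush/hsync is loose.]

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class FsyncDemo {
    public static void persist(Path p, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data)); // data handed to the OS page cache
                                             // (roughly the hflush level of durability)
            ch.force(true);                  // fsync-like: data and metadata pushed to
                                             // the device before this returns (hsync level)
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("wal", ".log");
        persist(p, "edit".getBytes());
        System.out.println(Files.size(p)); // prints 4
        Files.delete(p);
    }
}
```

A power failure after write() but before force(true) can still lose the data, which is exactly the window Josh concedes for hflush earlier in the thread.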
