I understand why HBase does not use hsync by default -- it comes with a big performance cost (though for FSYNC_WAL, which is not the default option, you should probably use it, because the documentation explicitly promises it).
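To make the distinction concrete, here is a minimal sketch in plain Java. It is not HBase or HDFS code; it uses java.nio's FileChannel on a local file as an analogy. A bare write() only hands the bytes to the OS, roughly like hflush() pushing data into the DataNodes' memory, while force(true) asks the OS to persist them, roughly like hsync(). The class name WalDurabilitySketch, the Mode enum, and the append() helper are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WalDurabilitySketch {
    /** Hypothetical stand-ins for the two behaviors discussed above. */
    enum Mode { FLUSH_ONLY, FSYNC }

    /**
     * Appends one entry to the log.
     * Returns true only if this call explicitly forced the data to the storage device.
     */
    static boolean append(FileChannel wal, String entry, Mode mode) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((entry + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            wal.write(buf); // bytes reach the OS, not necessarily the disk
        }
        if (mode == Mode.FSYNC) {
            wal.force(true); // ask the OS to persist file data and metadata
            return true;
        }
        return false; // a power failure at this point could still lose the entry
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("wal-sketch", ".log");
        try (FileChannel wal = FileChannel.open(
                path, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            boolean first = append(wal, "put row1", Mode.FLUSH_ONLY);
            boolean second = append(wal, "put row2", Mode.FSYNC);
            System.out.println("row1 forced by its own call: " + first);
            System.out.println("row2 forced by its own call: " + second);
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The performance cost mentioned above comes from the force() step: it has to wait for the device, whereas the plain write() returns as soon as the OS has accepted the bytes.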
I just want to make sure my description of HBase is accurate, including the durability aspect.

On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu <[email protected]> wrote:

> Suli:
> Have you looked at HBASE-5954?
>
> It gives some background on why the HBase code is formulated the way it
> currently is.
>
> Cheers
>
> On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li <[email protected]> wrote:
>
> > Doesn't your second paragraph just prove my point? If data is not
> > persisted to disk, then it is not durable. That is the definition of
> > durability.
> >
> > If you want the data to be durable, then you need to call hsync() instead
> > of hflush(), and that would be the correct behavior if you use the
> > FSYNC_WAL flag (per the HBase documentation).
> >
> > However, HBase does not do that.
> >
> > Suli
> >
> > On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser <[email protected]> wrote:
> >
> > > No, that's not correct. HBase would, by definition, not be a
> > > consistent database if a write was not durable when a client sees a
> > > successful write.
> > >
> > > The point that I will concede to you is that the hflush call may, in
> > > extenuating circumstances, not be completely durable. For example,
> > > hflush does not actually force the data to disk. If an abrupt power
> > > failure happens before this data is pushed to disk, HBase may think
> > > that data was durable when it actually wasn't (at the HDFS level).
> > >
> > > On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > >
> > > > Also, please correct me if I am wrong, but I don't think a put is
> > > > durable when an RPC returns to the client. Its corresponding WAL entry
> > > > has merely been pushed to the memory of all three data nodes, so it has
> > > > a low probability of being lost. But nothing is persisted at this point.
> > > >
> > > > And this is true no matter whether you use the SYNC_WAL or FSYNC_WAL flag.
> > > >
> > > > On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <[email protected]> wrote:
> > > >
> > > >> 1.1 -> 2: don't forget about the block cache, which can remove the
> > > >> need for any HDFS read.
> > > >>
> > > >> I think you're over-simplifying the write path quite a bit. I'm not sure
> > > >> what you mean by an 'asynchronous write', but that doesn't exist at the
> > > >> HBase RPC layer, as that would invalidate the consistency guarantees (if
> > > >> an RPC returns to the client that data was "put", then it is durable).
> > > >>
> > > >> Going off of memory (sorry in advance if I misstate something): the
> > > >> general way that data is written to the WAL is a "group commit". You have
> > > >> many threads all trying to append data to the WAL -- performance would be
> > > >> terrible if you serially applied all of these writes. Instead, many writes
> > > >> can be accepted, and the caller receives a Future. The caller must wait
> > > >> for the Future to complete. What's happening behind the scenes is that the
> > > >> writes are being bundled together to reduce the number of syncs to the WAL
> > > >> ("grouping" the writes together). When one caller's Future completes,
> > > >> what really happened is that the write/sync which included the caller's
> > > >> update was committed (along with others). All of this is happening inside
> > > >> the RS's implementation of accepting an update.
> > > >>
> > > >> https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106
> > > >>
> > > >> 杨苏立 Yang Su Li wrote:
> > > >>
> > > >>> The attachment can be found at the following URL:
> > > >>> http://pages.cs.wisc.edu/~suli/hbase.pdf
> > > >>>
> > > >>> Sorry for the inconvenience...
> > > >>>
> > > >>> On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <[email protected]> wrote:
> > > >>>
> > > >>>> Again, the attachment didn't come through.
> > > >>>>
> > > >>>> Is it possible to formulate it as a Google doc?
> > > >>>>
> > > >>>> Thanks
> > > >>>>
> > > >>>> On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> I am a graduate student working on scheduling on storage systems, and we
> > > >>>>> are interested in how different threads in HBase interact with each other
> > > >>>>> and how it might affect scheduling.
> > > >>>>>
> > > >>>>> I have written down my understanding of how HBase/HDFS works based on
> > > >>>>> its current thread architecture (attached). I am wondering if the
> > > >>>>> developers of HBase could take a look at it and let me know if anything
> > > >>>>> is incorrect or inaccurate, or if I have missed anything.
> > > >>>>>
> > > >>>>> Thanks a lot for your help!
> > > >>>>>
> > > >>>>> On Wed, Mar 22, 2017 at 3:39 PM, 杨苏立 Yang Su Li <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I am a graduate student working on scheduling on storage systems, and we
> > > >>>>>> are interested in how different threads in HBase interact with each other
> > > >>>>>> and how it might affect scheduling.
> > > >>>>>>
> > > >>>>>> I have written down my understanding of how HBase/HDFS works based on
> > > >>>>>> its current thread architecture (attached). I am wondering if the
> > > >>>>>> developers of HBase could take a look at it and let me know if anything
> > > >>>>>> is incorrect or inaccurate, or if I have missed anything.
> > > >>>>>>
> > > >>>>>> Thanks a lot for your help!
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Suli Yang
> > > >>>>>>
> > > >>>>>> Department of Physics
> > > >>>>>> University of Wisconsin Madison
> > > >>>>>>
> > > >>>>>> 4257 Chamberlin Hall
> > > >>>>>> Madison WI 53703

--
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703
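
P.S. For reference, the "group commit" that Josh describes in the quoted message can be sketched roughly as follows. This is only an illustration under my own assumptions, not the FSHLog implementation: the class GroupCommitSketch and its methods are hypothetical, the "WAL" is an in-memory list, and the sync is simulated by a counter. Many callers enqueue an append and get a future back; a single committer writes everything pending, performs one sync for the whole batch, and only then completes the futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class GroupCommitSketch {
    /** One pending append plus the future its caller waits on. */
    private static final class Pending {
        final String entry;
        final CompletableFuture<Integer> done = new CompletableFuture<>();
        Pending(String entry) { this.entry = entry; }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();
    private final List<String> log = new ArrayList<>();   // stand-in for the WAL file
    private final AtomicInteger syncCount = new AtomicInteger();

    /** Called by many handler threads: enqueue the entry and hand back a future. */
    public CompletableFuture<Integer> append(String entry) {
        Pending p = new Pending(entry);
        queue.add(p);
        return p.done;
    }

    /**
     * Called by a single committer (in a real system, a dedicated thread in a loop).
     * Writes all pending entries, syncs once, then completes every caller's future
     * with the id of the sync that covered it. Returns the batch size.
     */
    public synchronized int commitBatch() {
        List<Pending> batch = new ArrayList<>();
        Pending p;
        while ((p = queue.poll()) != null) {
            batch.add(p);
        }
        if (batch.isEmpty()) {
            return 0;
        }
        for (Pending b : batch) {
            log.add(b.entry);
        }
        int syncId = syncCount.incrementAndGet(); // one simulated sync for the whole batch
        for (Pending b : batch) {
            b.done.complete(syncId);
        }
        return batch.size();
    }

    public int syncCount() {
        return syncCount.get();
    }

    public static void main(String[] args) {
        GroupCommitSketch wal = new GroupCommitSketch();
        CompletableFuture<Integer> a = wal.append("put row1");
        CompletableFuture<Integer> b = wal.append("put row2");
        CompletableFuture<Integer> c = wal.append("put row3");
        System.out.println("batch size: " + wal.commitBatch());
        System.out.println("syncs performed: " + wal.syncCount());
        System.out.println("all covered by the same sync: "
                + (a.join().equals(b.join()) && b.join().equals(c.join())));
    }
}
```

Whether a completed future means the data is durable then depends entirely on what that one sync does, which is the hflush versus hsync question above.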
