Regarding HBASE-5954 specifically, have you thought about using BOB (the Block Order Breaker, https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf) to verify whether a change is correct?
It allows you to explore many different crash scenarios.

On Sun, Apr 2, 2017 at 1:35 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> I understand why HBase by default does not use hsync -- it does come with
> a big performance cost (though for FSYNC_WAL, which is not the default
> option, you should probably do it, because the documentation explicitly
> promises it).
>
> I just want to make sure my description of HBase is accurate, including
> the durability aspect.

On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu <[email protected]> wrote:

> Suli:
> Have you looked at HBASE-5954? It gives some background on why the HBase
> code is formulated the way it currently is.
>
> Cheers

On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Doesn't your second paragraph just prove my point? If data is not
> persisted to disk, then it is not durable -- that is the definition of
> durability.
>
> If you want the data to be durable, then you need to call hsync() instead
> of hflush(), and that would be the correct behavior under the FSYNC_WAL
> flag (per the HBase documentation). However, HBase does not do that.
>
> Suli

On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser <[email protected]> wrote:

> No, that's not correct. HBase would, by definition, not be a consistent
> database if a write were not durable once a client sees a successful
> write.
>
> The point that I will concede to you is that the hflush call may, in
> extenuating circumstances, not be completely durable. For example, hflush
> does not actually force the data to disk. If an abrupt power failure
> happens before this data is pushed to disk, HBase may think that data was
> durable when it actually wasn't (at the HDFS level).
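The hflush/hsync distinction discussed above can be illustrated with a local-filesystem analogy (a sketch only -- this is plain java.io, not the HDFS FSDataOutputStream API): flush() hands data off so it survives a process crash but may still sit in the OS page cache, while getFD().sync() issues an fsync that forces it to the device, which is roughly what hsync adds over hflush.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Local-filesystem analogy for the hflush vs hsync durability gap.
// flush()          ~ hflush: data leaves the writer, but may only be in memory.
// getFD().sync()   ~ hsync: data is fsync'd and survives an abrupt power loss.
public class FlushVsSync {
    public static void writeDurably(File f, byte[] data, boolean sync) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.flush();            // handed to the OS; may still sit in the page cache
            if (sync) {
                out.getFD().sync(); // force persistence to the device (fsync)
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("wal", ".log");
        f.deleteOnExit();
        writeDurably(f, "put:row1".getBytes(), true);
        System.out.println(f.length()); // prints 8
    }
}
```

The class and file names here are invented for illustration; the point is only that "flushed" and "synced to disk" are different durability levels, which is exactly the gap Josh concedes.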
On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Also, please correct me if I am wrong, but I don't think a put is durable
> when an RPC returns to the client. Just its corresponding WAL entry has
> been pushed to the memory of all three datanodes, so it has a low
> probability of being lost -- but nothing is persisted at this point.
>
> And this is true no matter whether you use the SYNC_WAL or FSYNC_WAL flag.

On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <[email protected]> wrote:

> 1.1 -> 2: don't forget about the block cache, which can eliminate the
> need for any HDFS read.
>
> I think you're over-simplifying the write path quite a bit. I'm not sure
> what you mean by an 'asynchronous write', but that doesn't exist at the
> HBase RPC layer, as that would invalidate the consistency guarantees (if
> an RPC returns to the client that data was "put", then it is durable).
>
> Going off of memory (sorry in advance if I misstate something): the
> general way that data is written to the WAL is a "group commit". You have
> many threads all trying to append data to the WAL -- performance would be
> terrible if you applied all of these writes serially. Instead, many
> writes can be accepted, and each caller receives a Future which it must
> wait on. What's happening behind the scenes is that the writes are being
> bundled together to reduce the number of syncs to the WAL ("grouping" the
> writes). When one caller's future completes, what really happened is that
> the write/sync which included that caller's update was committed (along
> with others).
> All of this is happening inside the RS's implementation of accepting an
> update.
>
> https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106

杨苏立 Yang Su Li wrote:

> The attachment can be found at the following URL:
> http://pages.cs.wisc.edu/~suli/hbase.pdf
>
> Sorry for the inconvenience...

On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <[email protected]> wrote:

> Again, the attachment didn't come through.
>
> Is it possible to formulate it as a Google doc?
>
> Thanks

On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <[email protected]> wrote:

> Hi,
>
> I am a graduate student working on scheduling in storage systems, and we
> are interested in how different threads in HBase interact with each other
> and how that might affect scheduling.
>
> I have written down my understanding of how HBase/HDFS works based on its
> current thread architecture (attached). I am wondering if the developers
> of HBase could take a look at it and let me know if anything is incorrect
> or inaccurate, or if I have missed anything.
>
> Thanks a lot for your help!
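The WAL group-commit pattern Josh describes earlier in the thread -- many appenders, each handed a Future, with one sync covering a whole batch of writes -- can be sketched as follows. This is a self-contained illustration, not HBase's actual FSHLog code; the class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Group-commit sketch: appenders enqueue edits and get a Future; a single
// sync thread drains everything queued so far, "syncs" the whole batch with
// one operation, then completes every Future in that batch.
public class GroupCommitLog implements AutoCloseable {
    private final BlockingQueue<CompletableFuture<Void>> pending = new LinkedBlockingQueue<>();
    private final List<String> buffer = Collections.synchronizedList(new ArrayList<>());
    private final List<String> durable = Collections.synchronizedList(new ArrayList<>());
    private final Thread syncer;
    private volatile boolean running = true;

    public GroupCommitLog() {
        syncer = new Thread(() -> {
            while (running || !pending.isEmpty()) {
                CompletableFuture<Void> first;
                try { first = pending.poll(10, TimeUnit.MILLISECONDS); }
                catch (InterruptedException e) { break; }
                if (first == null) continue;
                List<CompletableFuture<Void>> batch = new ArrayList<>();
                batch.add(first);
                pending.drainTo(batch);        // group everything queued so far
                synchronized (buffer) {        // one "sync" covers the whole batch
                    durable.addAll(buffer);
                    buffer.clear();
                }
                batch.forEach(f -> f.complete(null)); // callers learn their edit is durable
            }
        });
        syncer.start();
    }

    // A caller appends an edit and must wait on the returned Future,
    // mirroring the "caller receives a Future" step in the description above.
    public CompletableFuture<Void> append(String edit) {
        buffer.add(edit);
        CompletableFuture<Void> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    public int durableCount() { return durable.size(); }

    @Override public void close() throws InterruptedException {
        running = false;
        syncer.join();
    }

    public static void main(String[] args) throws Exception {
        try (GroupCommitLog log = new GroupCommitLog()) {
            CompletableFuture<Void> f1 = log.append("put row1");
            CompletableFuture<Void> f2 = log.append("put row2");
            CompletableFuture.allOf(f1, f2).join(); // both edits covered by a shared sync
            System.out.println(log.durableCount()); // prints 2
        }
    }
}
```

The design point the sketch shows: no caller's Future completes before a sync that covers its edit, yet many edits can share a single sync, which is why group commit preserves durability while avoiding one sync per write.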
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
