Yes, you are correct that there is an edge condition here when there is an
abrupt power failure on a node. HDFS guards against most of this, as
there are multiple copies of your data spread across racks. However, if
you have an abrupt power failure across multiple racks (or your entire
cluster), then yes, you would likely lose some data. Having some form of
redundant power supply is a common deployment choice that further
mitigates this risk. If this is not documented clearly enough, patches
are welcome to improve this :)
IMO, all of this is an implementation detail, though, as I believe you
already understand. It does not change the fact that
architecturally/academically, HBase is a consistent system.
杨苏立 Yang Su Li wrote:
I understand why HBase by default does not use hsync -- it does come with
a big performance cost (though for FSYNC_WAL, which is not the default option,
you probably should do it, because the documentation explicitly promises it).
I just want to make sure my description about HBase is accurate, including
the durability aspect.
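For concreteness, the client-side setting I am talking about is the per-operation
durability hint. A minimal sketch (the table name "t1", the column family "cf",
and the row/values are placeholders made up for illustration):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t1"))) {   // "t1" is a placeholder table
        Put p = new Put(Bytes.toBytes("row1"));
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // FSYNC_WAL: ask for the WAL entry to be forced to disk before the write is acknowledged
        p.setDurability(Durability.FSYNC_WAL);
        table.put(p);
    }

My point is that, as far as I can tell, the fsync promised by that flag does not
actually happen today.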
On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu<[email protected]> wrote:
Suli:
Have you looked at HBASE-5954 ?
It gives some background on why hbase code is formulated the way it
currently is.
Cheers
On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li<[email protected]> wrote:
Doesn't your second paragraph just prove my point? -- If data is not
persisted to disk, then it is not durable. That is the definition of
durability.
If you want the data to be durable, then you need to call hsync() instead
of hflush(), and that would be the correct behavior if you use the FSYNC_WAL
flag (per the HBase documentation).
However, HBase does not do that.
Suli
On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser<[email protected]> wrote:
No, that's not correct. HBase would, by definition, not be a
consistent database if a write were not durable when a client sees a
successful write.
The point that I will concede to you is that the hflush call may, in
extenuating circumstances, not be completely durable. For example,
hflush does not actually force the data to disk. If an abrupt power
failure happens before this data is pushed to disk, HBase may think
that data was durable when it actually wasn't (at the HDFS level).
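To make that concrete, the distinction at the HDFS client level is roughly the
following (just a sketch; the path and payload are made up):

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-example"))) {  // placeholder path
        out.write("some wal entry".getBytes(StandardCharsets.UTF_8));
        out.hflush();  // pushes the data to the datanodes' memory and makes it visible to readers
        out.hsync();   // additionally asks each datanode to fsync the data to its local disks
    }

As discussed above, the default path uses the hflush-style call, which is why an
ill-timed, cluster-wide power failure can still lose an acknowledged edit.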
On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li<[email protected]> wrote:
Also, please correct me if I am wrong, but I don't think a put is durable
when the RPC returns to the client. Its corresponding WAL entry has only been
pushed to the memory of all three data nodes, so it has a low probability
of being lost -- but nothing is persisted at this point.
And this is true no matter whether you use the SYNC_WAL or FSYNC_WAL flag.
On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser<[email protected]> wrote:
1.1 -> 2: don't forget about the block cache, which can remove the need
for any HDFS read.
I think you're over-simplifying the write path quite a bit. I'm not sure
what you mean by an 'asynchronous write', but that doesn't exist at the
HBase RPC layer, as that would invalidate the consistency guarantees (if
an RPC returns to the client that data was "put", then it is durable).
Going off of memory (sorry in advance if I misstate something): the
general way that data is written to the WAL is a "group commit". You have
many threads all trying to append data to the WAL -- performance would be
terrible if you serially applied all of these writes. Instead, many writes
can be accepted and the caller receives a Future. The caller must wait
for the Future to complete. What's happening behind the scenes is that
the writes are being bundled together to reduce the number of syncs to the
WAL ("grouping" the writes together). When one caller's Future completes,
what really happened is that the write/sync which included the caller's
update was committed (along with others). All of this is happening inside
the RS's implementation of accepting an update.
https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106
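If it helps, here is the group-commit idea in miniature. This is a toy
illustration, not the actual FSHLog code; syncToDisk() is a stand-in for the
real hflush/hsync on the WAL file:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.LinkedBlockingQueue;

    class ToyGroupCommitWal {
        private final BlockingQueue<CompletableFuture<Void>> pending = new LinkedBlockingQueue<>();

        // Handler threads append their edit and get a Future to wait on before acking the client.
        CompletableFuture<Void> append(byte[] edit) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            // (a real WAL would also buffer 'edit' into the log file here)
            pending.add(f);
            return f;
        }

        // One background thread: grab whatever has piled up, sync once, complete every waiter.
        void syncLoop() throws InterruptedException {
            while (true) {
                List<CompletableFuture<Void>> batch = new ArrayList<>();
                batch.add(pending.take());   // block until at least one writer is waiting
                pending.drainTo(batch);      // pick up everyone else who arrived in the meantime
                syncToDisk();                // a single sync covers the whole batch
                batch.forEach(f -> f.complete(null));
            }
        }

        private void syncToDisk() { /* stand-in for the WAL file sync */ }
    }

A handler thread does wal.append(edit).get() and only then reports success to
the client, so each RPC pays for one shared sync rather than one sync per write.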
杨苏立 Yang Su Li wrote:
The attachment can be found in the following URL:
http://pages.cs.wisc.edu/~suli/hbase.pdf
Sorry for the inconvenience...
On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu<[email protected]>
wrote:
Again, the attachment didn't come through.
Is it possible to share it as a Google doc?
Thanks
On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li<[email protected]> wrote:
Hi,
I am a graduate student working on scheduling on storage systems, and we
are interested in how different threads in HBase interact with each other
and how it might affect scheduling.
I have written down my understanding of how HBase/HDFS works based on its
current thread architecture (attached). I am wondering if the developers of
HBase could take a look at it and let me know if anything is incorrect or
inaccurate, or if I have missed anything.
Thanks a lot for your help!
--
Suli Yang
Department of Physics
University of Wisconsin Madison
4257 Chamberlin Hall
Madison WI 53703