Re: HBase region server failure issues

Vladimir Rodionov Mon, 14 Apr 2014 18:33:07 -0700

*On the other hand, 95% of HBase users don't actually configure HDFS to
fsync() every edit. Given that, the random writes aren't actually going to
cause one seek per write -- they'll get buffered up and written back
periodically in a much more efficient fashion.*


Todd, that holds in theory, but reality is different. One writer is
definitely more efficient than 100. This won't scale well.
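
A toy model may make the two positions easier to compare. This is only a rough sketch: the seek-counting rule, the region count (100), and the batch size are assumptions chosen for illustration, not measurements of HDFS or HBase, and `count_seeks` is a hypothetical helper, not an HBase API.

```python
# Toy model: count disk "seeks" when edits go to one shared WAL versus
# one WAL per region. All numbers are illustrative assumptions.

def count_seeks(edits, wal_of, flush_every):
    """Count a seek each time a write-back targets a different file
    than the previous write did.

    edits       -- sequence of region ids, one per edit
    wal_of      -- maps a region id to the WAL file that receives the edit
    flush_every -- edits buffered before write-back (1 models fsync per edit)
    """
    seeks = 0
    last_file = None
    for start in range(0, len(edits), flush_every):
        batch = edits[start:start + flush_every]
        # Within one write-back, each dirty file is visited once.
        for wal_file in sorted({wal_of(region) for region in batch}):
            if wal_file != last_file:
                seeks += 1
                last_file = wal_file
    return seeks


# 10,000 edits spread round-robin over 100 regions (assumed workload).
edits = [i % 100 for i in range(10_000)]

shared = count_seeks(edits, lambda region: 0, 1)                     # one WAL per RS
per_region_fsync = count_seeks(edits, lambda region: region, 1)      # fsync every edit
per_region_buffered = count_seeks(edits, lambda region: region, 1000)  # buffered

print(shared, per_region_fsync, per_region_buffered)  # prints: 1 10000 1000
```

Under these assumptions buffering cuts the seeks by the batch factor, but every write-back still touches each active region's file, so the count grows with the number of regions while the single shared WAL stays sequential.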


On Mon, Apr 14, 2014 at 6:20 PM, Todd Lipcon <[email protected]> wrote:

> On the other hand, 95% of HBase users don't actually configure HDFS to
> fsync() every edit. Given that, the random writes aren't actually going to
> cause one seek per write -- they'll get buffered up and written back
> periodically in a much more efficient fashion.
>
> Plus, in some small number of years, I believe SSDs will be available on
> most server machines (in a hybrid configuration) so the seeks will cost
> less even with fsync on.
>
> -Todd
>
>
> On Mon, Apr 14, 2014 at 4:54 PM, Vladimir Rodionov
> <[email protected]> wrote:
>
> > I do not think it's a good idea to have one WAL file per region. The whole
> > WAL file idea is based on the assumption that writing data sequentially
> > reduces average latency and increases total throughput. This is no longer
> > the case in a one-WAL-file-per-region approach: you may have hundreds of
> > active regions per RS, so all sequential writes become random ones, and
> > random IO on rotational media is very bad, very bad.
> >
> > -Vladimir Rodionov
> >
> >
> >
> > On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu <[email protected]> wrote:
> >
> > > There is an ongoing effort to address this issue.
> > >
> > > See the following:
> > > HBASE-8610 Introduce interfaces to support MultiWAL
> > > HBASE-10378 Divide HLog interface into User and Implementor specific
> > > interfaces
> > >
> > > Cheers
> > >
> > >
> > > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > My name is Claudiu Soroiu and I am new to hbase/hadoop but not new to
> > > > distributed computing in FT/HA environments, and I see there are a lot
> > > > of issues reported related to region server failure.
> > > >
> > > > The main problem I see is the recovery time in case of a node
> > > > failure and distributed log splitting. After some tuning I managed to
> > > > reduce it to 8 seconds in total, and for the moment it fits the needs.
> > > >
> > > > I have one question: *Why is there only one WAL file per region server
> > > > and not one WAL per region?*
> > > > I haven't found the exact answer anywhere, which is why I'm asking on
> > > > this list; please point me in the right direction if I picked the
> > > > wrong list.
> > > >
> > > > My point is that eliminating the need to split a log in case of
> > > > failure reduces the downtime for the regions, and the only delay that
> > > > we will see will be related to transferring data over the network to
> > > > the region servers that take over the failed regions.
> > > > This is feasible only if having multiple WALs per Region Server does
> > > > not affect the overall write performance.
> > > >
> > > > Thanks,
> > > > Claudiu
> > > >
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
