Re: HBase region server failure issues

Todd Lipcon Mon, 14 Apr 2014 18:22:27 -0700

On the other hand, 95% of HBase users don't actually configure HDFS to
fsync() every edit. Given that, the random writes aren't actually going to
cause one seek per write -- they'll get buffered up and written back
periodically in a much more efficient fashion.
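
A minimal sketch of the difference, not HBase or HDFS code: the Python below appends edits round-robin to several per-region log files on a local filesystem and counts how many fsync() calls are issued. The function names, file names and sizes are hypothetical. With sync_every_edit=False the OS is free to batch the writes and flush them later.

```python
import os
import tempfile


def append_edits(paths, edits, sync_every_edit):
    """Append each (region, payload) edit to that region's log file.

    Returns the number of fsync() calls issued.
    """
    fsyncs = 0
    files = {region: open(path, "ab") for region, path in paths.items()}
    try:
        for region, payload in edits:
            f = files[region]
            f.write(payload)
            if sync_every_edit:
                # Force this edit to stable storage before the next one.
                f.flush()
                os.fsync(f.fileno())
                fsyncs += 1
            # Otherwise the bytes stay in user-space and OS buffers and
            # are written back later in larger batches.
    finally:
        for f in files.values():
            f.close()
    return fsyncs


def demo(num_regions=4, num_edits=100):
    """Write the same interleaved edits twice: once synced, once buffered."""
    with tempfile.TemporaryDirectory() as d:
        paths = {r: os.path.join(d, f"region-{r}.wal") for r in range(num_regions)}
        edits = [(i % num_regions, b"edit\n") for i in range(num_edits)]
        synced = append_edits(paths, edits, sync_every_edit=True)
        buffered = append_edits(paths, edits, sync_every_edit=False)
        sizes = {r: os.path.getsize(p) for r, p in paths.items()}
    return synced, buffered, sizes
```

Both passes leave the same bytes in each file. Only the synced pass pays one forced flush per edit, which is where the per-write seek cost on rotational media would come from.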


Plus, in some small number of years, I believe SSDs will be available on
most server machines (in a hybrid configuration) so the seeks will cost
less even with fsync on.

-Todd


On Mon, Apr 14, 2014 at 4:54 PM, Vladimir Rodionov
<[email protected]> wrote:

> I do not think it's a good idea to have one WAL file per region. The whole
> WAL idea is based on the assumption that writing data sequentially reduces
> average latency and increases total throughput. That is no longer the case
> with one WAL file per region: you may have hundreds of active regions per
> RS, so all sequential writes become random ones, and random IO on
> rotational media is very bad.
>
> -Vladimir Rodionov
>
>
>
> On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu <[email protected]> wrote:
>
> > There is an ongoing effort to address this issue.
> >
> > See the following:
> > HBASE-8610 Introduce interfaces to support MultiWAL
> > HBASE-10378 Divide HLog interface into User and Implementor specific
> > interfaces
> >
> > Cheers
> >
> >
> > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > My name is Claudiu Soroiu and I am new to HBase/Hadoop but not new to
> > > distributed computing in FT/HA environments, and I see there are a lot
> > > of issues reported related to region server failure.
> > >
> > > The main problem I see is related to recovery time in case of a node
> > > failure and distributed log splitting. After some tuning I managed to
> > > reduce it to 8 seconds in total, and for the moment that fits the needs.
> > >
> > > I have one question: *Why is there only one WAL file per region server
> > > and not one WAL per region?*
> > > I haven't found the exact answer anywhere, which is why I'm asking on
> > > this list; please point me in the right direction if I picked the wrong
> > > list.
> > >
> > > My point is that eliminating the need to split a log in case of failure
> > > reduces the downtime for the regions, and the only delay we would see
> > > would be related to transferring data over the network to the region
> > > servers that take over the failed regions.
> > > This is feasible only if having multiple WALs per region server does
> > > not affect the overall write performance.
> > >
> > > Thanks,
> > > Claudiu
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
