Re: [DISCUSS] Distributed Log Replay in branch-1

Ted Yu Wed, 08 Jul 2015 18:06:57 -0700

bq. set up at least Jenkins based integration tests that exercise the code
paths


+1

On Wed, Jul 8, 2015 at 5:45 PM, Andrew Purtell <[email protected]> wrote:

> I think we should leave it enabled in master and set up at least Jenkins
> based integration tests that exercise the code paths. Otherwise, we might
> be better off removing the DLR code rather than have it rot in place.
>
> On Wed, Jul 8, 2015 at 11:48 AM, Enis Söztutar <[email protected]> wrote:
>
> > On Wed, Jul 8, 2015 at 10:23 AM, Stack <[email protected]> wrote:
> >
> > > On Wed, Jul 8, 2015 at 7:53 AM, Sean Busbey <[email protected]> wrote:
> > >
> > > > Hi Folks!
> > > >
> > > > For the 1.2 release, I think the consensus is to disable Distributed Log
> > > > Replay by default due to lack of sufficient testing. At least, that's the
> > > > only feedback I've heard so far. :)
> > > >
> > >
> > >
> > > > Anyone object to that?
> > > >
> > > >
> > > I've been trying it over the last few days. It is easy enough to lose
> > > data: HBASE-14028. It is a bit tough tracing how the loss is happening
> > > given more moving parts and that it seems few have trodden this route
> > > previously; breadcrumbs are sparse (fixing).
> > >
> > > I'll keep at this until DLR in 1.2 is for sure a lost cause.
> > >
> > > On DLR:
> > >
> > > + DLR is a little more involved than DLS -- which is already tough enough
> > > to follow. It might be best to just punt and come back here after assign
> > > has been redone (and simplified) on top of pv2; hbase-2.0.0?
> > >
> >
> > Agreed. DLR is a very good idea, but unfortunately has not stabilized
> > enough. The recovery semantics, zk interactions, assignment, etc. make it
> > very complex to understand and operate. I would vote for not doing any more
> > work on this side unless we have solved the assignment process. The other
> > problem is that we cannot have only DLR since if the table is offline DLS
> > is needed, which forces us to maintain and test two different subsystems.
> > In the long term, we should be shooting for a simplified solution.
> >
> > Let's disable in master as well. Once / if we have better testing we can
> > always re-enable it.
> >
> > Enis
> >
> >
> > > + It can actually make for a worse MTTR as it does not do re-lookups during
> > > replay of a WAL if the target server crashes during DLR; the whole WAL
> > > replay must time out before we'll go re-find the new location (30 seconds at
> > > least).
> > >
> > > St.Ack
> > >
> > >
> > > > Presuming no one does, what do folks think about just disabling it by
> > > > default in the current branch-1?
> > > >
> > >
> > >
> > > > That isn't to say it couldn't switch to on-by-default at a later 1.y
> > > > release. It's just that we had to turn it off right before the 1.1 releases
> > > > as well, and I'd prefer we avoid these last-minute changes in favor of
> > > > waiting until someone has the time to prioritize thorough testing.
> > > >
> > > >
> > >
> > >
> > >
> > > > --
> > > > Sean
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
