I think we should leave it enabled in master and set up at least Jenkins-based
integration tests that exercise the code paths. Otherwise, we might
be better off removing the DLR code rather than have it rot in place.

On Wed, Jul 8, 2015 at 11:48 AM, Enis Söztutar <[email protected]> wrote:

> On Wed, Jul 8, 2015 at 10:23 AM, Stack <[email protected]> wrote:
>
> > On Wed, Jul 8, 2015 at 7:53 AM, Sean Busbey <[email protected]> wrote:
> >
> > > Hi Folks!
> > >
> > > For the 1.2 release, I think the consensus is to disable Distributed Log
> > > Replay by default due to lack of sufficient testing. At least, that's the
> > > only feedback I've heard so far. :)
> > >
> >
> >
> > > Anyone object to that?
> > >
> > >
> > I've been trying it over the last few days. It is easy enough to lose
> > data: HBASE-14028.  It is a bit tough tracing how the loss is happening
> > given more moving parts and that it seems few have trodden this route
> > previously; breadcrumbs are sparse (fixing).
> >
> > I'll keep at this until DLR in 1.2 is for sure a lost cause.
> >
> > On DLR:
> >
> > + DLR is a little more involved than DLS -- which is already tough enough
> > to follow. It might be best to just punt and come back here after assign
> > has been redone (and simplified) on top of pv2; hbase-2.0.0?
> >
>
> Agreed. DLR is a very good idea, but unfortunately has not stabilized
> enough. The recovery semantics, zk interactions, assignment, etc. make it
> very complex to understand and operate. I would vote for not doing any more
> work on this side unless we have solved the assignment process. The other
> problem is that we cannot have only DLR since if the table is offline DLS
> is needed, which forces us to maintain and test two different subsystems.
> In the long term, we should be shooting for a simplified solution.
>
> Let's disable in master as well. Once / if we have better testing we can
> always re-enable it.
>
> Enis
>
>
> > + It can actually make for a worse MTTR as it does not do re-lookups during
> > replay of a WAL if the target server crashes during DLR; the whole WAL
> > replay must time out before we'll go re-find the new location (30 seconds at
> > least).
> >
> > St.Ack
> >
> >
> > > Presuming no one does, what do folks think about just disabling it by
> > > default in the current branch-1?
> > >
> >
> >
> > > That isn't to say it couldn't switch to on-by-default at a later 1.y
> > > release. It's just that we had to turn it off right before the 1.1 releases
> > > as well, and I'd prefer we avoid these last minute changes in favor of
> > > waiting until someone has the time to prioritize thorough testing.
> > >
> > >
> >
> >
> >
> > > --
> > > Sean
> > >
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
