bq. set up at least Jenkins based integration tests that exercise the code paths
+1

On Wed, Jul 8, 2015 at 5:45 PM, Andrew Purtell <[email protected]> wrote:

> I think we should leave it enabled in master and set up at least Jenkins
> based integration tests that exercise the code paths. Otherwise, we might
> be better off removing the DLR code rather than have it rot in place.
>
> On Wed, Jul 8, 2015 at 11:48 AM, Enis Söztutar <[email protected]> wrote:
>
> > On Wed, Jul 8, 2015 at 10:23 AM, Stack <[email protected]> wrote:
> >
> > > On Wed, Jul 8, 2015 at 7:53 AM, Sean Busbey <[email protected]> wrote:
> > >
> > > > Hi Folks!
> > > >
> > > > For the 1.2 release, I think the consensus is to disable Distributed Log
> > > > Replay by default due to lack of sufficient testing. At least, that's the
> > > > only feedback I've heard so far. :)
> > > >
> > > > Anyone object to that?
> > >
> > > I've been trying it over the last few days. It is easy enough to lose
> > > data: HBASE-14028. It is a bit tough tracing how the loss is happening
> > > given more moving parts and that it seems few have treaded this route
> > > previously; breadcrumbs are sparse (fixing).
> > >
> > > I'll keep at this until DLR in 1.2 is for sure a lost cause.
> > >
> > > On DLR:
> > >
> > > + DLR is a little more involved than DLS -- which is already tough enough
> > > to follow. It might be best to just punt and come back here after assign
> > > has been redone (and simplified) on top of pv2; hbase-2.0.0?
> >
> > Agreed. DLR is a very good idea, but unfortunately has not stabilized
> > enough. The recovery semantics, zk interactions, assignment, etc make it
> > very complex to understand and operate. I would vote for not doing any more
> > work on this side unless we have solved the assignment process. The other
> > problem is that we cannot have only DLR since if the table is offline DLS
> > is needed, which forces us to maintain and test two different subsystems.
> > In the long term, we should be shooting for a simplified solution.
> >
> > Let's disable in master as well. Once / if we have better testing we can
> > always re-enable it.
> >
> > Enis
> >
> > > + It can actually make for a worse MTTR as it does not do re-lookups during
> > > replay of a WAL if the target server crashes during DLR; the whole WAL
> > > replay must timeout before we'll go re-find the new location (30 seconds at
> > > least).
> > >
> > > St.Ack
> > >
> > > > Presuming no one does, what do folks think about just disabling it by
> > > > default in the current branch-1?
> > > >
> > > > That isn't to say it couldn't switch to on-by-default at a later 1.y
> > > > release. It's just that we had to turn it off right before the 1.1 releases
> > > > as well, and I'd prefer we avoid these last minute changes in favor of
> > > > waiting until someone has the time to prioritize thorough testing.
> > > >
> > > > --
> > > > Sean
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
