Thanks Hao, and Adar and Mike for reviewing.  Glad we caught this issue
before the release!  I've just created an RC 3 with Hao's patch which will
run until Friday.  This vote is abandoned.

- Dan

On Mon, Sep 4, 2017 at 10:54 PM, Todd Lipcon <[email protected]> wrote:

> On Mon, Sep 4, 2017 at 10:50 PM, Hao Hao <[email protected]> wrote:
>
> > Thanks a lot, Adar and Mike, for the quick review! Apologies again for not
> > catching it at an earlier stage.
> >
> > @Todd, I believe this LIFO change only controls which container within a
> > single disk the copied tablet should reside in. Whether the tablets will
> > spread across multiple disks is still decided by the flag
> > '--fs_target_data_dirs_per_tablet' introduced in the 'TServer disk failure
> > handling' feature. But maybe you can elaborate a bit more on what kind of
> > regression you are talking about?
> >
>
> Ah, I was initially worried that it would choose to re-use an open
> container on disk A before it would use any container from disk B. I forgot
> that the disk selection behavior happens before the container selection
> behavior, so it will still randomize (or round robin) across the assigned
> data disks. Thanks for clarifying.
>
> -Todd
>
>
> > Thanks!
> >
> > Best,
> > Hao
> >
> > On Mon, Sep 4, 2017 at 10:10 PM, Todd Lipcon <[email protected]> wrote:
> >
> > > Quick question on the fix: does the LIFO behavior now cause the entirety
> > > of the copied tablet to end up in one container (hence one disk) if the
> > > tablet is less than a few GB? If so, that also seems like a regression to
> > > consider.
> > >
> > > (Sorry for hijacking this thread rather than commenting on Gerrit. I'm on
> > > my phone.)
> > >
> > > Todd
> > >
> > > On Sep 4, 2017 9:21 PM, "Adar Lieber-Dembo" <[email protected]> wrote:
> > >
> > > > -1
> > > >
> > > > I agree with Hao's assessment that KUDU-2131 should be considered a
> > > > blocker for 1.5. Below is some more color:
> > > >
> > > > Before the fix for KUDU-1726, tablet copies fsynced more than
> > > > necessary but reliably kept their sessions alive by amortizing the
> > > > cost of the fsync in each downloaded block.
> > > >
> > > > With the fix, the number of fsyncs in a tablet copy is no longer
> > > > amortized to each downloaded block. In theory, this is good: by
> > > > batching up all of the fsyncs, we can find opportunities to coalesce
> > > > them, or omit all of them if the copy is aborted. But, if the number
> > > > of fsyncs turns out to be very high (because it's a large tablet and
> > > > the tserver already has a high number of containers), the
> > > > non-amortized cost can reliably cause the tablet copy session to time
> > > > out. I view this as a regression to the overall reliability of tablet
> > > > copies.
> > > >
> > > > > Hao authored a fix for KUDU-2131. Mike and I reviewed it, and I just
> > > > > merged it to master. I'm cherry-picking it to branch-1.5.x and I think
> > > > > we should spin up an RC3 that includes it.
> > > >
> > > >
> > > > > On Fri, Sep 1, 2017 at 11:00 PM, Hao Hao <[email protected]> wrote:
> > > > > Unfortunately, I think I just found a blocker for 1.5 (KUDU-2131
> > > > > <https://issues.apache.org/jira/browse/KUDU-2131>).
> > > > >
> > > > > > I am not sure if we should revert KUDU-1726 or do a quick fix which
> > > > > > adds a user-facing flag to configure the tablet copy session expiration
> > > > > > time (such as --tablet_copy_idle_timeout_ms).
> > > > > > Apologies for any inconvenience!
> > > > >
> > > > > Best,
> > > > > Hao
> > > > >
> > > > > > On Fri, Sep 1, 2017 at 6:44 PM, Mike Percy <[email protected]> wrote:
> > > > >
> > > > >> +1 on 1.5.0 RC2.
> > > > >>
> > > > >> Thanks for the quick turnaround on RC1, Dan.
> > > > >>
> > > > >> Sigs and checksums match. LICENSE and NOTICE files look good.
> > > > >>
> > > > > >> I built the C++ code in RELEASE mode on Ubuntu xenial and all the C++
> > > > > >> tests passed, except a couple of flaky tests that passed when I re-ran
> > > > > >> them.
> > > > >>
> > > > >> I didn't run the Java or Python tests and I didn't check the Maven
> > > > >> artifacts.
> > > > >>
> > > > >> Mike
> > > > >>
> > > > >>
> > > > > >> > On Fri, Sep 1, 2017 at 4:43 PM, Dan Burkert <[email protected]> wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > > >> > > The Apache Kudu team is happy to announce the second release
> > > > > >> > > candidate for Apache Kudu 1.5.0.
> > > > >> > >
> > > > > >> > > Apache Kudu 1.5.0 is a minor release which offers many improvements
> > > > > >> > > and fixes since the prior release.
> > > > >> > >
> > > > > >> > > This is a source-only release. The artifacts are staged here:
> > > > >> > > https://dist.apache.org/repos/dist/dev/kudu/1.5.0-RC2/
> > > > >> > >
> > > > > >> > > Java convenience binaries in the form of a Maven repository are
> > > > > >> > > staged here:
> > > > > >> > > https://repository.apache.org/content/repositories/orgapachekudu-1014
> > > > >> > >
> > > > > >> > > It is built from this tag:
> > > > > >> > > https://git-wip-us.apache.org/repos/asf?p=kudu.git;a=commit;h=9f2e3253319d7d9733cf764a63e308b91787538a
> > > > >> > >
> > > > >> > > KEYS file: http://www.apache.org/dist/kudu/KEYS
> > > > >> > >
> > > > > >> > > I suggest going through the README, building Kudu, running the unit
> > > > > >> > > tests, and testing the Maven artifacts.
> > > > >> > >
> > > > >> > > The vote will run until Wednesday, September 6th at 5pm PDT.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Dan
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
