Our (unwritten) rule has been that a commit cannot even go in unless unit _and_ regression tests pass. Releases are stricter: all tests, including longevity and UI tests, are required to pass. In addition, any performance regression needs to be discussed.
So far we have not made any exceptions, but that is not to say we cannot.

On Fri, Jul 13, 2018 at 1:03 PM, Vlad Rozov <[email protected]> wrote:

> My 2 cents:
>
> From the Apache point of view it is OK to do a release even if unit tests do
> not pass at all or there is a large number of regressions introduced. An
> Apache release is a source release and as long as it compiles and does not
> have license issues, it is up to the community (PMC) to decide on any other
> criteria for a release.
>
> The issue in DRILL-6453 is not limited to a large number of hash joins. It
> should be possible to reproduce it even with a single hash join as long as
> left and right sides are getting batches from one(many) to many exchanges
> (broadcast or hash partitioner senders).
>
> Thank you,
>
> Vlad
>
> On 7/13/18 08:41, Aman Sinha wrote:
>
>> I would say we have to take a measured approach to this and decide on a
>> case-by-case basis which issue is a show stopper.
>> While of course we have to make every effort to avoid regressions, we
>> cannot claim that a particular release will not cause any regression.
>> I believe there are 10000+ passing tests, so that should provide a level
>> of confidence. The TPC-DS 72 is a 10 table join, which in the Hadoop
>> world of denormalized schemas is not very common. The main question is:
>> does the issue reproduce with fewer joins having the same type of
>> distribution plan?
>>
>> Aman
>>
>> On Fri, Jul 13, 2018 at 7:36 AM Arina Yelchiyeva <[email protected]> wrote:
>>
>>> We cannot release with existing regressions, especially taking into
>>> account that these are not minor issues.
>>> As far as I understand, reverting is not an option since the hash join
>>> spill feature is spread across several commits + subsequent fixes.
>>> I guess we need to consider postponing the release until the issues are
>>> resolved.
>>>
>>> Kind regards,
>>> Arina
>>>
>>> On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi <[email protected]> wrote:
>>>
>>>> (Guessing ...) It is possible that the root cause for DRILL-6606 is
>>>> similar to that in DRILL-6453 -- that is the new "early sniffing" in
>>>> the Hash-Join, which repeatedly invokes next() on the two "children"
>>>> of the join *during schema discovery* until non-empty data is returned
>>>> (or NONE, STOP, etc). Last night Salim, Vlad and I briefly discussed
>>>> alternatives, like postponing the "sniffing" to a later time (beginning
>>>> of the build for the right child, and beginning of the probe for the
>>>> left child).
>>>>
>>>> However this would require some work time. So what should we do about
>>>> 1.14?
>>>>
>>>> Thanks,
>>>>
>>>> Boaz
>>>>
>>>> On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <[email protected]> wrote:
>>>>
>>>>> While implementing the late limit 0 optimization, Bohdan has found one
>>>>> more regression after Hash Join spill to disk.
>>>>> https://issues.apache.org/jira/browse/DRILL-6606
>>>>>
>>>>> Boaz please take a look.
>>>>>
>>>>> Kind regards,
>>>>> Arina
