correction in my email below. I meant "in my opinion it *has now* become
number one bottleneck for the project" (worst place for a typo I guess)

On Fri, May 11, 2018 at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com>
wrote:

> +1 There are many problems with the test infrastructure, and in my opinion
> it has now become the number one bottleneck for the project. I was looking at
> the infrastructure yesterday, and I think the current infrastructure (even
> with its own set of problems) is still under-utilized. To start with, I am
> planning to increase the number of threads that process the parallel test
> batches. That needs a restart on the server side. I can do it now, if folks
> are okay with it. Otherwise I can do it over the weekend when the queue is
> small.
>
> I listed the improvements which I thought would be useful under
> https://issues.apache.org/jira/browse/HIVE-19425 but, frankly speaking, I
> am not able to devote as much time to it as I would like. I would
> appreciate it if folks who have more time could help out.
>
> I think to start with https://issues.apache.org/jira/browse/HIVE-19429
> will help a lot. We need to pack more test runs in parallel and containers
> provide good isolation.
>
> For the short-term green runs, I think we should @Ignore the tests which
> have been failing for many runs. They are not being addressed anyway. If
> people think they are important to run, we should fix them and only then
> re-enable them.
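For reference, a minimal sketch of what @Ignore-ing a known-failing test could look like in JUnit 4 (the class and method names here are hypothetical, not actual Hive tests):

```java
// Hypothetical example: disabling a known-failing test in JUnit 4,
// with a reason pointing at a tracking JIRA so it is not forgotten.
import org.junit.Ignore;
import org.junit.Test;

public class TestSomeFeature {

    @Ignore("Known flaky; tracked in a follow-up JIRA, re-enable once fixed")
    @Test
    public void testKnownFailingCase() {
        // body unchanged; @Ignore makes the runner skip the test and
        // report it as ignored rather than failed
    }
}
```

Recording the reason string (ideally with the JIRA id) keeps the disabled test discoverable when someone later searches for @Ignore usages.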
>
> Also, I feel we need a lightweight test run which we can run locally before
> submitting for the full suite. That way, minor issues with the patch can
> be handled locally. Maybe we could create a profile which runs a subset of
> important tests that are consistent. We could then apply a label indicating
> that the local pre-checkin tests ran successfully, and only then submit for
> the full suite.
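As a sketch of what such a local pre-checkin run could look like with Maven (the profile name `quick-tests` and the test class names are assumptions for illustration, not existing Hive build targets):

```shell
# Hypothetical: run only a fast, stable subset of tests locally before
# submitting the patch for the full ptest suite. The profile name below
# is an assumption; the actual build would need to define it.
mvn test -Pquick-tests

# Alternatively, run an explicit list of known-stable test classes
# via the Surefire -Dtest parameter:
mvn test -Dtest=TestClassOne,TestClassTwo
```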
>
> More thoughts are welcome. Thanks for starting this conversation.
>
> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <
> jcama...@apache.org> wrote:
>
>> I believe we have reached a state (maybe we did reach it a while ago)
>> that is not sustainable anymore, as there are so many tests failing /
>> timing out that it is not possible to verify whether a patch is breaking
>> some critical parts of the system or not. It also seems to me that, due to
>> the timeouts (maybe due to infra, maybe not), ptest runs are taking even
>> longer than usual, which in turn creates an even longer queue of patches.
>>
>> There is an ongoing effort to improve ptests usability (
>> https://issues.apache.org/jira/browse/HIVE-19425), but apart from that,
>> we need to make an effort to stabilize existing tests and bring that
>> failure count to zero.
>>
>> Hence, I am suggesting *we stop committing any patch before we get a
>> green run*. If someone thinks this proposal is too radical, please come up
>> with an alternative, because I do not think it is OK to have the ptest runs
>> in their current state. Other projects of a certain size (e.g., Hadoop,
>> Spark) are always green; we should be able to do the same.
>>
>> Finally, once we get to zero failures, I suggest we be less tolerant
>> about committing without a clean ptest run. If there is a failure,
>> we need to fix it or revert the patch that caused it, then we continue
>> developing.
>>
>> Please, let’s all work together as a community to fix this issue; that is
>> the only way to get to zero quickly.
>>
>> Thanks,
>> Jesús
>>
>> PS. I assume the flaky tests will come into the discussion. Let's see
>> first how many of those we have, then we can work to find a fix.
>>
>>
>>
>
