[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055864#comment-17055864
 ] 

Zoltan Haindrich commented on HIVE-22942:
-----------------------------------------

Hey All,

I think the best would be to replace ptest with something else which is not 
maintained by the Hive community. Moving to junit5 would be cool, but it might 
be challenging to do: parallel execution of tests within the same machine 
tends to uncover further issues, because we don't expect 2 tests of the same 
kind to be executed at the same time. I also don't think we can have a single 
machine execute all of them in one place. I think running batches in isolated 
environments on 1 thread might be more robust and reliable, so that we will 
actually be able to repro the issues.

I've opened a PR with a working prototype; it isn't complete - but it's able to 
do the following:
* builds upon some jenkins plugins; and the job itself is defined as a 
Jenkinsfile
* uses docker images executed on a kubernetes cluster to provide 
reproducibility - so anyone will be more likely to be able to repro runs of the 
tests by using docker
* to make the parallel test executor plugin "happy" - I needed to find a way 
to reduce the max testclass execution time below ~30 minutes
** as a first approach I went on and analyzed test execution times based on the 
actual testcase times....it's possible; but defining the ranges and maintaining 
them long term might be interesting, to say the least
** then I checked how "well" a naive approach would do...and I concluded 
that with about twice as many splits the result is acceptable....so I went this 
way; it's a cleaner way to do it
** I didn't want to disrupt existing usages of testing, so I came up with the 
following way to declare further classes for qtests which run over 30 minutes; 
let's go with TestCliDriver for now:
*** in case a special flag is enabled (qsplits) the TestCliDriver is split into 
a number of parts; the "split" classes differ only in the package name; so 
a "-Dtest=TestCliDriver" will still work to run the testcase
*** there is some shell script / java reflection stuff which actually does the 
splitting of the test parameter list into smaller pieces
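
To illustrate the comparison described above, here is a minimal sketch in Python. It is not the prototype's actual shell/Java code. It compares a timing-aware greedy split with a naive round-robin split that ignores timings but uses twice as many parts. The test names and durations are made up, and all function names are hypothetical.

```python
import heapq


def naive_split(tests, parts):
    """Round-robin split over the sorted test names; ignores execution times."""
    buckets = [[] for _ in range(parts)]
    for i, name in enumerate(sorted(tests)):
        buckets[i % parts].append(name)
    return buckets


def timed_split(tests, parts):
    """Greedy split: longest test first, always into the currently lightest part."""
    heap = [(0.0, i) for i in range(parts)]  # (accumulated minutes, part index)
    buckets = [[] for _ in range(parts)]
    for name in sorted(tests, key=tests.get, reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + tests[name], i))
    return buckets


def max_part_time(tests, buckets):
    """Duration of the slowest part; this is what has to stay under the limit."""
    return max(sum(tests[name] for name in bucket) for bucket in buckets)


# Made-up durations in minutes, for illustration only (not real Hive numbers).
durations = {f"q{i:02d}.q": float(1 + (i * 7) % 10) for i in range(40)}

if __name__ == "__main__":
    print("timed, 8 parts: ", max_part_time(durations, timed_split(durations, 8)))
    print("naive, 8 parts: ", max_part_time(durations, naive_split(durations, 8)))
    print("naive, 16 parts:", max_part_time(durations, naive_split(durations, 16)))
```

The timed variant needs up-to-date duration data to stay accurate, which is the maintenance burden mentioned above; the naive variant only needs the list of test names.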

currently I think the replacement layout will be:
* a kubernetes cluster somewhere (gce/gke) 
* a jenkins running inside the kubernetes cluster
* a local artifact caching instance is added to reduce outside communication
* it would be easier to tie the job into github PRs and live with that instead 
of retaining the run-a-patch approach
* as for running multiple ptests; it will be easily possible as the limit will 
be the number of pods jenkins may launch

things that still need investigation/etc:
* there are a bunch of failing tests ... I guess most of them have some env 
issue in the background
* there should be a timeout on executing a set of tests; the ptest env uses a 
"timeout" on the maven command - I can just throw in the timeout plugin; but 
timeouts should be fixed....they are a sign of big problems like deadlocks/etc
* no support for "isolated" tests - this should be rethought
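
The timeout item above can be sketched as follows in Python. This is only an illustration of putting a time limit on a test command; it is not what ptest or the prototype does, and `run_with_timeout` is a hypothetical helper.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or None if it exceeded the time limit."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; processes
        # forked by that child (e.g. test JVMs) may need separate cleanup.
        return None
```

A call such as `run_with_timeout(["mvn", "test", "-Dtest=TestCliDriver"], 2 * 60 * 60)` would then return `None` for a hung run; the two-hour limit is an arbitrary example value.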


> Replace PTest with an alternative
> ---------------------------------
>
>                 Key: HIVE-22942
>                 URL: https://issues.apache.org/jira/browse/HIVE-22942
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I never opened a jira about this...but it might actually help collect ideas 
> and actually start going somewhere sooner than later :D
> Right now we maintain the ptest2 project inside Hive to be able to run Hive 
> tests in a distributed fashion...the drawback of this solution is that we are 
> putting much effort into maintaining a distributed test execution framework...
> I think it would be better if we could find an off the shelf solution for the 
> task and migrate to that instead of putting more effort into the ptest 
> framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
