[ 
https://issues.apache.org/jira/browse/HIVE-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464846#comment-16464846
 ] 

Alan Gates commented on HIVE-19429:
-----------------------------------

As a side project I've been working on a tool that runs all the Hive tests 
using docker.  You can take a look at it at 
[https://github.com/alanfgates/dtest].  It might be useful for reworking ptest 
or as a base for a new tool.

It works by first building a docker image from a git repo and branch.  If this 
succeeds it then runs the tests in containers.  The output is analyzed for 
failures, errors, or timeouts.  At the end the user is presented with a list 
of tests that failed or resulted in an error.
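
The analysis step described above can be sketched roughly as follows.  This is 
an illustration only, not dtest's actual code: it assumes the container logs 
contain Maven Surefire's per-class summary lines, and the name analyze_log is 
hypothetical.

```python
import re

# Assumed log format: Surefire's per-class summary line, for example
#   Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.1 sec <<< FAILURE! - in org.example.TestA
# The final aggregate line has no " - in <class>" suffix, so it is ignored.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
    r".*? - in (\S+)"
)


def analyze_log(log_text, timed_out=False):
    """Collect test classes that reported failures or errors.

    timed_out is passed in by the caller because a timeout is detected
    when the container is run, not by reading the log.
    """
    failed, errors = [], []
    for line in log_text.splitlines():
        match = SUMMARY.search(line)
        if not match:
            continue
        _run, n_fail, n_err, _skipped, test_class = match.groups()
        if int(n_fail) > 0:
            failed.append(test_class)
        if int(n_err) > 0:
            errors.append(test_class)
    return {"failed": failed, "errors": errors, "timed_out": timed_out}
```

Merging the returned lists across all containers would give the final report 
of failed and errored tests.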

Currently it uses docker directly, so it is confined to a single host.  It 
should be straightforward to rework it to use YARN, Kubernetes, or other 
container managers so it can run in a cluster.  I've been running it on a 32 
core box with 10 simultaneous containers and it finishes in about 2 hours 20 
minutes (of which the first 20 minutes is the build).
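
Capping the number of simultaneous containers on a single host could look 
something like the sketch below.  It is not how dtest is implemented; 
run_in_container, run_all, and the timeout value are assumptions made for 
illustration, and only the standard "docker run --rm" invocation is used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_container(image, command, timeout_s=3600):
    """Run one batch of tests in a fresh container.

    Returns (returncode, output, timed_out).  Note that the timeout only
    stops the docker client process; a real tool would also have to stop
    the container itself.
    """
    try:
        proc = subprocess.run(
            ["docker", "run", "--rm", image, "sh", "-c", command],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
        return proc.returncode, proc.stdout, False
    except subprocess.TimeoutExpired:
        return None, "", True


def run_all(image, commands, max_containers=10, runner=run_in_container):
    """Run every command, with at most max_containers running at once.

    Results come back in the same order as commands.
    """
    with ThreadPoolExecutor(max_workers=max_containers) as pool:
        return list(pool.map(lambda cmd: runner(image, cmd), commands))
```

The runner parameter exists so that a cluster-backed implementation (or a fake 
for testing) could be swapped in without changing the scheduling code.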

Limitations:
 * Some tests fail in it that don't fail in ptest.  So far the ones I have 
looked at fail on the box I'm using whether run from the command line or in 
the container, so I do not think the failures are related to the tool.  At 
least some of these are ordering issues with queries that don't use order by.  
I haven't examined all of them.
 * I have not analyzed whether every test run by ptest is also run by this 
tool.  The test counts are in the same ballpark.  Following the logic of ptest 
has been challenging.  It would be very nice if 'mvn install' did the right 
thing for all these tests, rather than requiring reading multiple other config 
files to figure out which qfiles to use.
 * I don't have the Spark itests running in it yet.  When I tried running them 
earlier, they failed.  I haven't gotten around to diagnosing the issue.
 * It doesn't clean up after itself.  It creates about 150 docker containers 
and an image for every build.  I've been leaving these around after the builds 
for debugging.  There is a separate tool (dtest-cleanup) that will clean up old 
images and containers.  Eventually this should be integrated into the tool.
 * There's also a Jenkins launch script.  I have it running on an internal 
machine at Hortonworks.
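
On the ordering issues mentioned in the first limitation: a query without 
order by may return its rows in any order, so a line-by-line comparison 
against expected output can pass on one machine and fail on another.  A small 
triage helper along these lines could separate order-only differences from 
real ones.  The function names are hypothetical and this is not part of dtest 
or of Hive's test framework.

```python
from collections import Counter


def same_rows_any_order(expected_lines, actual_lines):
    """True if both outputs hold the same rows with the same multiplicity."""
    return Counter(expected_lines) == Counter(actual_lines)


def diagnose(expected_lines, actual_lines):
    """Classify a comparison between expected and actual query output."""
    if expected_lines == actual_lines:
        return "match"
    if same_rows_any_order(expected_lines, actual_lines):
        return "order-only difference"
    return "content difference"
```

A result of "order-only difference" suggests the query needs an order by (or 
the comparison needs to sort), rather than pointing at a problem in the tool 
running the test.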

Let me know if you want to use parts of this, or have me contribute it back to 
Hive in a patch.  Originally I was working on it inside Hive (as evidenced by 
the package names) but then I pulled it into a separate repo because it was 
easier than keeping it on a separate Hive branch.

> Investigate alternative technologies like docker containers to increase 
> parallelism
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-19429
>                 URL: https://issues.apache.org/jira/browse/HIVE-19429
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
