[
https://issues.apache.org/jira/browse/HIVE-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464846#comment-16464846
]
Alan Gates commented on HIVE-19429:
-----------------------------------
I've been working on the side on a tool to run all the Hive tests using docker.
You can take a look at it at [https://github.com/alanfgates/dtest]. It might
be useful for reworking ptest or as a base for a new tool.
It works by first building a docker image from a git repo and branch. If this
succeeds it then runs the tests in containers. The output is analyzed for
failures, errors, or timeouts. At the end the user is presented with a list of
tests that failed or resulted in an error.
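As a rough illustration of the "output is analyzed" step, here is a minimal
Python sketch. It is not how dtest does it (the comment above does not say);
it assumes each container's log contains the usual Maven Surefire summary
lines ("Tests run: N, Failures: N, Errors: N, ..."), and the function name
classify and its timed_out flag are hypothetical.

```python
import re

# Maven Surefire prints summary lines of this shape for each test class.
SUMMARY = re.compile(r"Tests run: (\d+), Failures: (\d+), Errors: (\d+)")


def classify(output, timed_out=False):
    """Classify one container's log as timeout, error, failure, ok, or unknown.

    `timed_out` is assumed to be set by whatever launched the container;
    this sketch does not try to detect timeouts from the log text itself.
    """
    if timed_out:
        return "timeout"
    matches = SUMMARY.findall(output)
    if not matches:
        # No summary line at all: the build probably died before tests ran.
        return "unknown"
    failures = sum(int(f) for _run, f, _err in matches)
    errors = sum(int(e) for _run, _fail, e in matches)
    if errors:
        return "error"
    if failures:
        return "failure"
    return "ok"
```

Anything that does not come back as "ok" would go into the list shown to the
user at the end of the run.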
Currently it uses docker directly, so it is confined to a single host. It
should be straightforward to rework it to use YARN, Kubernetes, or other
container managers so it can run in a cluster. I've been running it on a
32-core box with 10 simultaneous containers and it finishes in about 2 hours 20
minutes (of which the first 20 minutes is build).
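For anyone who wants a feel for the single-host approach, a minimal Python
sketch of running test commands in a bounded number of simultaneous
containers follows. This is an assumption-laden illustration, not the dtest
code: the names run_in_container and run_all, the one-hour timeout, and the
idea of one shell command per container are all hypothetical. The only
external call is the standard "docker run" CLI.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_container(image, command, timeout_s):
    """Run one test command in a container; return (status, output).

    Unlike the tool described above, this sketch passes --rm so the
    container is deleted on exit. Note that the timeout only stops the
    local docker client; a real tool would also have to stop the container.
    """
    try:
        proc = subprocess.run(
            ["docker", "run", "--rm", image, "sh", "-c", command],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


def run_all(image, commands, max_containers=10, timeout_s=3600,
            runner=run_in_container):
    """Run every command with at most max_containers running at once.

    Returns {command: status} for each command that did not finish "ok".
    `runner` is injectable so the scheduling can be tried without docker.
    """
    with ThreadPoolExecutor(max_workers=max_containers) as pool:
        results = list(pool.map(lambda c: runner(image, c, timeout_s), commands))
    return {cmd: status
            for cmd, (status, _out) in zip(commands, results)
            if status != "ok"}
```

Swapping the runner for one that submits to Kubernetes or YARN is where the
cluster rework mentioned above would plug in; the pool size plays the role of
the 10 simultaneous containers.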
Limitations:
* Some tests fail in it that don't fail in ptest. So far the ones I have
looked at fail on the box I'm using whether from the command line or in the
container, so I do not think the failures are related to the tool. At least
some of these are ordering issues with queries that don't use order by. I
haven't examined all of them.
* I have not analyzed whether every test run by ptest is also run by this.
The numbers are in the ballpark. Following the logic of ptest has been
challenging. It would be very nice if 'mvn install' did the right thing for
all these tests, rather than requiring reading multiple other config files to
figure out which qfiles to use.
* I don't have the Spark itests running in it yet. When I tried to run them
before, they failed. I haven't gotten around to diagnosing the issue.
* It doesn't clean up after itself. It creates about 150 docker containers
and an image for every build. I've been leaving these around after the builds
for debugging. There is a separate tool (dtest-cleanup) that will clean up old
images and containers. Eventually this should be integrated into the tool.
* There's also a Jenkins launch script. I have it running on an internal
machine at Hortonworks.
Let me know if you want to use parts of this, or have me contribute it back to
Hive in a patch. Originally I was working on it inside Hive (as evidenced by
the package names) but then I pulled it into a separate repo because it was
easier than keeping it on a separate Hive branch.
> Investigate alternative technologies like docker containers to increase
> parallelism
> -----------------------------------------------------------------------------------
>
> Key: HIVE-19429
> URL: https://issues.apache.org/jira/browse/HIVE-19429
> Project: Hive
> Issue Type: Sub-task
> Reporter: Vihang Karajgaonkar
> Priority: Major
>