[
https://issues.apache.org/jira/browse/HIVE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066649#comment-16066649
]
Allen Wittenauer commented on HIVE-16749:
-----------------------------------------
FYI, typically people let Yetus execute the docker commands itself. Two big
reasons for this:
* Yetus will run docker build, which means that the Dockerfile can be modified
on the fly as necessary. Yetus will detect if the Dockerfile has changed and
rebuild as necessary--including as part of the patch being tested!
* the patchdir and basedir will be available after the container exits, which
means logs and such are available post-build. This is very useful to have
access to in case of failures.
Anyway, cutting back the extra bits, given a directory structure of:
artifacts dir for Jenkins here: ${WORKSPACE}/artifacts
git checkout to here: ${WORKSPACE}/source
You just need the following extra lines on the command line:
{code}
--patch-dir=${WORKSPACE}/artifacts \
--basedir=${WORKSPACE}/source \
--docker \
--dockerfile=${WORKSPACE}/dev-support/docker/Dockerfile \
{code}
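As a rough sketch, here is how those lines might slot into a Jenkins shell step. The TEST_PATCH variable (wherever the Yetus test-patch launcher lives on the node) and the fallback workspace path are assumptions for illustration, and the command is only printed, not run:

```shell
#!/bin/sh
# Sketch only. TEST_PATCH (location of the Yetus test-patch launcher) is an assumption.
TEST_PATCH="${TEST_PATCH:-test-patch}"

# Print the four extra arguments, one per line, for a given workspace.
yetus_docker_args() {
  ws="$1"
  printf '%s\n' \
    "--patch-dir=${ws}/artifacts" \
    "--basedir=${ws}/source" \
    "--docker" \
    "--dockerfile=${ws}/dev-support/docker/Dockerfile"
}

# Dry run: show the command a Jenkins shell step would execute.
# The unquoted expansion is fine for display; real use needs paths without spaces.
echo "${TEST_PATCH}" $(yetus_docker_args "${WORKSPACE:-/tmp/workspace}")
```

Any other options the job already passes would simply be appended to that command.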
Be aware that the Dockerfile needs to have *everything* that Yetus will need to
do its work. For example, if the pylint test is enabled, then python with all
the pre-req pylint eggs needs to be installed too. You can see the
default/example one
that Yetus uses here:
https://github.com/apache/yetus/blob/405cd9fa6e4f6240690bbba1bad6d054a4241214/precommit/test-patch-docker/Dockerfile
If you have an existing Dockerfile with some extra stuff you don't want
executed as part of the Yetus run, you can still use it, provided you can
separate that stuff out to the bottom of the file. See Hadoop's as an example:
https://github.com/apache/hadoop/blob/ee243e5289212aa2912d191035802ea023367e19/dev-support/docker/Dockerfile
The {{{# YETUS CUT HERE}}} line acts as a guard: Yetus does not read anything
in the file past it.
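To preview what is left of a Dockerfile once everything from the guard line onward is dropped, something like the following works. This only mimics the effect with sed; it is not Yetus's own code, and the function name is made up for the example:

```shell
#!/bin/sh
# Illustration only: mimics the effect of the guard, not Yetus's implementation.
# Print a Dockerfile up to, but not including, the cut marker line.
dockerfile_before_cut() {
  sed '/^# YETUS CUT HERE/,$d' "$1"
}
```

Running `dockerfile_before_cut dev-support/docker/Dockerfile` prints just the portion above the marker.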
I also HIGHLY recommend using the {{{--mvn-custom-repos}}} and {{{--jenkins}}}
flags where more than one maven run is happening on a Jenkins instance. Maven
does *zero* locking of its cache, which means that simultaneous runs will stomp
all over each other and produce wildly inaccurate results. On Jenkins, those
flags guarantee that different executors, as well as different branches, each
use their own .m2 cache. The very first run on a node will take a while as it
does the mass download, but after that it's pretty quick. We saw unit test
failure counts drop significantly after doing that in Hadoop.
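The idea behind the isolation can be sketched by hand with Maven's maven.repo.local property and the EXECUTOR_NUMBER variable that Jenkins sets for each executor. The directory layout and the function name below are assumptions for illustration, not necessarily the scheme Yetus uses:

```shell
#!/bin/sh
# Sketch of the idea only; the directory layout is an assumption.
# Build a local-repository path unique per project, branch and Jenkins executor.
isolated_m2_repo() {
  project="$1"
  branch="$2"
  executor="${EXECUTOR_NUMBER:-0}"   # set by Jenkins for each executor slot
  echo "${HOME}/yetus-m2/${project}-${branch}-${executor}"
}

# Dry run: show how Maven would be pointed at the isolated cache.
echo mvn "-Dmaven.repo.local=$(isolated_m2_repo hive master)" clean install
```

Because no two executors share a path, concurrent builds never write into the same cache.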
One other thing: you don't need to run patch. You can monkey patch individual
functions inside the hive personality file. It's loaded last which means it
can overwrite other functions... :)
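The override trick is plain shell behaviour: whichever definition of a function is read last wins. A toy demonstration, with a hypothetical function name that is not a real Yetus function:

```shell
#!/bin/sh
# Generic shell behaviour; example_check is a hypothetical name, not a Yetus function.
# Stand-in for a function defined by the framework.
example_check() {
  echo "default behaviour"
}

# Stand-in for the personality file, read afterwards: same name, new body.
example_check() {
  echo "hive-specific behaviour"
}

# The later definition is the one that runs.
example_check
```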
> Run YETUS in Docker container
> -----------------------------
>
> Key: HIVE-16749
> URL: https://issues.apache.org/jira/browse/HIVE-16749
> Project: Hive
> Issue Type: Sub-task
> Reporter: Peter Vary
> Assignee: Zoltan Haindrich
> Attachments: HIVE-16749.1.patch
>
>
> Think about the pros and cons of running YETUS in a docker container:
> - Resources
> - Usage complexity
> - Yetus version changes
> - Findbugs
> - etc.
> If worthwhile run YETUS in a docker container