[
https://issues.apache.org/jira/browse/HADOOP-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548455#comment-14548455
]
Allen Wittenauer commented on HADOOP-11984:
-------------------------------------------
bq. It seems to me that it is more of an apples vs. oranges comparison – more
importantly, does the time spent parsing TEST-*.xml (which takes seconds at most)
actually matter, given the fact that in general Jenkins spends 15 mins to build
the trunk, and ~2 hours to run the tests?
~2 hours only for HDFS. The next closest (IIRC) is mapreduce-jobclient which
comes in at 20 minutes. Perhaps the HDFS folks should take a serious look at
re-arranging the universe: not running integration tests as unit tests, paying
attention to the nightly build, etc.
bq. Popping up one level – it looks like you have some concerns on moving
test-patch to other scripting languages that have more choices of libraries.
deadhorse.gif
Python, ruby, etc, all suffer from the same problem: which version do you
target to get the maximum amount of coverage? test-patch, like the user-client
code, MUST be able to run in a variety of hostile environments. (No, Mac OS X
and Linux are NOT good enough.) python, frankly, sucks at that because the API
is continually evolving in incompatible ways. (*) ... and that's before we even
get into the morass of add-ons. And python 3.x.
FWIW, the *only* big portability problem with the current version of
test-patch.sh that I'm aware of is one usage of GNU diff because I was too lazy
to write more complex awk to work around it. Otherwise, it's all POSIX+bash
3.x and should run even on fairly ancient systems unchanged! The outlook for
*forward* compatibility, as a result, is extremely good. It's pretty much
impossible to do that with most other language choices (including, ironically,
Java).... except maybe one:
If I had my way, I'd have written this in perl 5. It's a significantly better
choice for the things we need to do here (text processing! OS manipulation!)
and its compatibility across the versions deployed with every relatively modern
OS that I'm aware of is extremely high. But we don't do perl, have a small
tolerance for python, and the rest is in bash. So given those choices, it was
an easy one to make.
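The actual GNU diff usage in test-patch.sh isn't spelled out here, but as a purely hypothetical sketch of the kind of workaround involved: one GNU-only idiom (printing just the lines added in a newer file via the `--line-format` family of options) can be approximated with plain POSIX awk. Note this approximation computes a set difference rather than a true positional diff, so reordered or duplicate lines behave differently:

```shell
# Purely hypothetical sketch: approximating a GNU-only diff idiom with
# POSIX awk.  GNU diff can print just the lines added in the new file:
#   diff --unchanged-line-format='' --new-line-format='%L' old new
# A portable approximation (a set difference, not a positional diff):
old=$(mktemp); new=$(mktemp)
printf 'a\nb\nc\n'       > "$old"
printf 'a\nb\nc\nd\ne\n' > "$new"
# First pass (NR==FNR) records every line of the old file; second pass
# prints lines of the new file that the old file never contained.
awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' "$old" "$new"
rm -f "$old" "$new"
```

The awk one-liner runs unchanged under any POSIX awk (mawk, nawk, busybox awk), which is exactly the kind of forward compatibility being argued for here.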
bq. I'm wondering whether there is anything that can be done to improve the
maintainability and lower the bar for getting involved (e.g., reusing
libraries from other scripting languages) in the longer term.
There are plenty of people who are fully competent to write decent bash. We
just don't invite them into the Hadoop tent. The number of people contributing
to the parts that I've rewritten has gone up SIGNIFICANTLY because people who
have these skills realize that someone is paying attention. As a side note, I
personally think it's great if the Java folks feel uncomfortable that code that
they don't understand is in the system.
(*) - while working on releasedocmaker, I heard two conflicting things: "that
API is deprecated you should use xyz" and "oh, make sure this works with python
vx.x". Guess what? I can't use the non-deprecated API in vx.x. So deprecated
APIs here we come, which now means I'm continually answering the question of
"why does this code use method y?".
> Enable parallel JUnit tests in pre-commit.
> ------------------------------------------
>
> Key: HADOOP-11984
> URL: https://issues.apache.org/jira/browse/HADOOP-11984
> Project: Hadoop Common
> Issue Type: Improvement
> Components: scripts
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-11984.001.patch, HADOOP-11984.002.patch,
> HADOOP-11984.003.patch, HADOOP-11984.004.patch
>
>
> HADOOP-9287 and related issues implemented the parallel-tests Maven profile
> for running JUnit tests in multiple concurrent processes. This issue
> proposes to activate that profile during pre-commit to speed up execution.
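As a sketch of what activating that profile looks like in a local run (the profile name comes from HADOOP-9287; the `testsThreadCount` property and the thread count of 4 are illustrative assumptions, not values taken from the patches above):

```shell
# Run the JUnit suite with the parallel-tests profile enabled.
# -Pparallel-tests activates the Maven profile added by HADOOP-9287;
# -DtestsThreadCount caps how many forked test JVMs run concurrently
# (4 is an arbitrary example value, not a project default).
mvn test -Pparallel-tests -DtestsThreadCount=4
```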
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)