[
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236540#comment-14236540
]
Nicholas Chammas commented on SPARK-3431:
-----------------------------------------
Here's an example failure I don't understand.
I fire up {{sbt/sbt}} with {{SparkBuild.scala}} at [this
version|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala]:
{code}
def groupBySuite(tests: Seq[TestDefinition], javaOptions: Seq[String]) = {
  tests groupBy (_.name.split('.').slice(0,4).mkString(".")) map {
    case (suite, tests) =>
      new Group(
        name = suite,
        tests = tests,
        // runPolicy = Tests.InProcess)
        runPolicy = SubProcess(javaOptions = javaOptions))
  } toSeq
}
<snipped>
testGrouping in Test <<= (definedTests in Test, javaOptions in Test) map
  groupBySuite,
{code}
Then I run this at the SBT prompt:
{code}
testOnly org.apache.spark.sql.hive.execution.HiveQuerySuite
{code}
I get a lot of errors, but this one stands out:
{code}
21:53:56.662 WARN org.apache.spark.sql.hive.execution.HiveQuerySuite: Running query 1/1 with hive.
java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory "/path/to/my/copy/of/spark"): error=2, No such file or directory
{code}
If I comment out [the {{testGrouping in Test}}
line|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L429],
the test runs fine.
So it smells like the forked JVMs are not being passed the
[configured
paths|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L403-L418].
There are some related posts about this [on Stack
Overflow|http://stackoverflow.com/questions/18002205/sbt-test-only-not-picking-up-jvm-option-when-forking-a-jvm-for-tests]
and [SBT's issue tracker|https://github.com/sbt/sbt/issues/975].
I'm not sure how to proceed with SBT, or whether I've identified a legitimate
blocker. I may just move on to Maven unless I make some kind of
breakthrough. Any pointers would be appreciated.
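As a side note on the {{groupBySuite}} snippet above, the grouping key it computes can be checked in isolation. The following is a minimal sketch in plain Scala with no SBT dependency; {{GroupKeyDemo}}, {{groupKey}}, and the second test name are hypothetical and exist only for illustration. It applies the same {{split}} / {{slice(0,4)}} / {{mkString}} expression to fully qualified test names, which suggests that tests are grouped by their first four package segments rather than by individual suite.

```scala
object GroupKeyDemo {
  // Hypothetical helper: the same expression used as the groupBy key
  // in the groupBySuite snippet, extracted so it can be run on its own.
  def groupKey(testName: String): String =
    testName.split('.').slice(0, 4).mkString(".")

  def main(args: Array[String]): Unit = {
    val names = Seq(
      // The suite run with testOnly in this comment.
      "org.apache.spark.sql.hive.execution.HiveQuerySuite",
      // Hypothetical suite name, used only to show two names sharing a key.
      "org.apache.spark.sql.example.ExampleSuite"
    )

    // Group the names exactly as groupBySuite would key them.
    val groups: Map[String, Seq[String]] = names.groupBy(groupKey)

    groups.foreach { case (key, members) =>
      println(s"$key -> ${members.mkString(", ")}")
    }

    // Both names above collapse into the single key "org.apache.spark.sql".
    assert(groupKey(names.head) == "org.apache.spark.sql")
    assert(groups.size == 1)
  }
}
```

If that reading is right, each forked JVM would cover a whole package prefix such as {{org.apache.spark.sql}}, not one suite. This is an observation about the key only and is not meant as an explanation for the {{/usr/bin/hadoop}} failure.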
> Parallelize execution of tests
> ------------------------------
>
> Key: SPARK-3431
> URL: https://issues.apache.org/jira/browse/SPARK-3431
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Nicholas Chammas
>
> Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common
> strategy to cut test time down is to parallelize the execution of the tests.
> Doing that may in turn require some prerequisite changes to be made to how
> certain tests run.