FWIW, when we did this work in HBase, we categorized the tests. Some
tests can share a single JVM, while others need to be isolated in
their own JVM. Either way, surefire can still run them in parallel by
starting and stopping several JVMs.

I think we need to do this as well. Perhaps the test naming hierarchy can
be used to group non-parallelizable tests in the same JVM.

For example, here are some Hive tests from our project:

org.apache.spark.sql.hive.StatisticsSuite
org.apache.spark.sql.hive.execution.HiveQuerySuite
org.apache.spark.sql.QueryTest
org.apache.spark.sql.parquet.HiveParquetSuite

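If we group tests by the first 5 dot-separated parts of their name (e.g.
org.apache.spark.sql.hive), then we’d have the first 2 tests run in the
same JVM, and the next 2 tests each run in their own JVM, since their
prefixes (org.apache.spark.sql.QueryTest and org.apache.spark.sql.parquet)
don’t match anything else in the list.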
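
To make the grouping rule concrete, here is a rough sketch in plain Python.
It is not the sbt code from my branch, only an illustration of the
prefix-based grouping described above; the names group_by_prefix and parts
are made up for this example.

```python
from collections import defaultdict


def group_by_prefix(test_names, parts=5):
    """Group fully qualified test names by their first `parts`
    dot-separated components. Each group would share one JVM."""
    groups = defaultdict(list)
    for name in test_names:
        # Names with fewer than `parts` components are used whole as the key.
        key = ".".join(name.split(".")[:parts])
        groups[key].append(name)
    return dict(groups)


tests = [
    "org.apache.spark.sql.hive.StatisticsSuite",
    "org.apache.spark.sql.hive.execution.HiveQuerySuite",
    "org.apache.spark.sql.QueryTest",
    "org.apache.spark.sql.parquet.HiveParquetSuite",
]

groups = group_by_prefix(tests)
for prefix, members in groups.items():
    print(prefix, "->", members)
```

With the four tests above this yields three groups: one keyed on
org.apache.spark.sql.hive holding the first two suites, and one each for
QueryTest and HiveParquetSuite.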

I’m new to this stuff so I’m not sure if I’m going about this in the right
way, but you can see my attempt with this approach on GitHub
<https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397>,
as well as the related discussion on JIRA
<https://issues.apache.org/jira/browse/SPARK-3431>.

If anyone has more feedback on this, I’d love to hear it (either on this
thread or in the JIRA issue).

Nick

On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Nick,
>>
>> Would you like to file a ticket to track this?
>>
>
> SPARK-3431 <https://issues.apache.org/jira/browse/SPARK-3431>:
> Parallelize execution of tests
> Sub-task: SPARK-3432 <https://issues.apache.org/jira/browse/SPARK-3432>:
> Fix logging of unit test execution time
>
> Nick
>
