[ 
https://issues.apache.org/jira/browse/SPARK-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964791#comment-13964791
 ] 

Michael Armbrust commented on SPARK-1455:
-----------------------------------------

This is a great idea.  One possible modification: if there are changes in Spark 
Core but not in Spark SQL, it is probably safe to skip some of the more 
expensive test cases, specifically all of the Hive compatibility query tests.  
Since all the query operators just use mapPartitions, I think the remaining 
query tests would still catch changes to Spark Core that would break Spark SQL.

> Determine which test suites to run based on code changes
> --------------------------------------------------------
>
>                 Key: SPARK-1455
>                 URL: https://issues.apache.org/jira/browse/SPARK-1455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>            Reporter: Patrick Wendell
>             Fix For: 1.1.0
>
>
> Right now we run the entire set of tests for every change. This means the 
> tests take a long time. Our pull request builder checks out the merge branch 
> from git, so we could do a diff and figure out what source files were 
> changed, and run a more isolated set of tests. We should just run tests in a 
> way that reflects the inter-dependencies of the project. E.g.:
> - If Spark Core is modified, we should run all tests
> - If just SQL is modified, we should run only the SQL tests
> - If just Streaming is modified, we should run only the Streaming tests
> - If just PySpark is modified, we should run only the PySpark tests
> And so on. I think this would reduce the round-trip time of the tests a lot, 
> and it should be pretty easy to accomplish with some scripting foo.
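A minimal sketch of the selection logic the issue describes, mapping changed 
file paths (e.g. from a git diff against the merge base) to test modules. The 
directory prefixes and module names below are illustrative assumptions, not 
the actual Spark build layout:

```python
def modules_to_test(changed_files):
    """Return the set of test modules to run for a list of changed paths.

    Hypothetical prefix-to-module mapping; any path outside a known
    submodule is treated as a Spark Core change, which requires running
    the full test suite.
    """
    mapping = {
        "sql/": "sql",
        "streaming/": "streaming",
        "python/": "pyspark",
    }
    modules = set()
    for path in changed_files:
        for prefix, module in mapping.items():
            if path.startswith(prefix):
                modules.add(module)
                break
        else:
            # No submodule prefix matched (e.g. core/), so fall back
            # to running everything.
            return {"all"}
    return modules
```

For example, a pull request touching only `sql/` would map to just the SQL 
tests, while any change under `core/` would still trigger the full run, which 
also accommodates Michael's suggestion above.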



--
This message was sent by Atlassian JIRA
(v6.2#6252)