[ https://issues.apache.org/jira/browse/SPARK-14765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-14765:
----------------------------------
    Description: 
h4. Background
Yesterday, when I exposed a function to Python/R, only the Python and R modules 
were changed. The first successful build took 25 minutes, but the next one 
took 125 minutes.

* [25 minutes|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56291/consoleFull]
{code}
[info] Found the following changed modules: pyspark-sql, sparkr
{code}

* [125 minutes|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56298/]
{code}
[info] Found the following changed modules: pyspark-sql, sparkr, mllib, hive, sql, root
{code}

h4. Problem
The `identify_changed_files_from_git_commits` function in `run-tests.py` simply 
runs `git diff` against master. This means that **newly updated master code** 
is also counted as changed files in the PR.

h4. How to fix
Add a `git rebase` command before running `git diff`.
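
A minimal sketch of the fix described above, assuming a hypothetical helper named `changed_files_after_rebase` (it is not the actual function in `run-tests.py`) and assuming the PR branch is currently checked out. It rebases onto the target branch first, so the following `git diff` should list only the files touched by the PR. The `run` parameter is an illustrative assumption that lets the helper be exercised without a real repository.

```python
import subprocess


def changed_files_after_rebase(target_branch, run=subprocess.check_output):
    """Return the files changed by the PR only.

    Hypothetical helper for illustration; not the code in run-tests.py.
    """
    # Replay the PR commits on top of the current target branch, so that
    # commits newly merged into the target no longer show up in the diff.
    run(["git", "rebase", target_branch], universal_newlines=True)
    # After the rebase, HEAD differs from the target only by the PR's changes.
    raw_output = run(
        ["git", "diff", "--name-only", target_branch, "HEAD"],
        universal_newlines=True,
    )
    # Drop the empty entry produced by the trailing newline.
    return [name for name in raw_output.split("\n") if name]
```

If the rebase hits conflicts, `subprocess.check_output` raises `CalledProcessError`, so a caller would still need to decide how to handle that case.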

  was:
h5. Background
Yesterday, when I exposed a function to Python/R, only the Python and R modules 
were changed. The first successful build took 25 minutes, but the next one 
took 125 minutes.

* [25 minutes|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56291/consoleFull]
{code}
[info] Found the following changed modules: pyspark-sql, sparkr
{code}

* [125 minutes|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56298/]
{code}
[info] Found the following changed modules: pyspark-sql, sparkr, mllib, hive, sql, root
{code}

h5. Problem
The `identify_changed_files_from_git_commits` function in `run-tests.py` simply 
runs `git diff` against master. This means that **newly updated master code** 
is also counted as changed files in the PR.

h5. How to fix
Add a `git rebase` command before running `git diff`.


> Jenkins should run tests only based on the PR contents
> ------------------------------------------------------
>
>                 Key: SPARK-14765
>                 URL: https://issues.apache.org/jira/browse/SPARK-14765
>             Project: Spark
>          Issue Type: Bug
>          Components: Project Infra
>            Reporter: Dongjoon Hyun


