[ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
---------------------------------
    Description: 
This JIRA aims to improve Python test coverage, in particular for the 
{{ExtractPythonUDFs}} rule.
 This rule has caused many regressions or issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by 
{{ExtractPythonUDFs}}, as was done in 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql].
 In practice, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

1. Copy the {{xxx.sql}} file to {{udf/udf-xxx.sql}}.

2. Keep the comments and state that this file was copied from {{xxx.sql}}, for 
now.

3. Run the commands below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
git add .
{code}
4. Insert {{udf(...)}} into each statement. Adding more combinations is not 
required, and there is no strict rule about where to insert it.

5. Run the test again to regenerate the golden file, then inspect the changes:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
git diff
{code}
6. Compare the results with those of the original file, {{xxx.sql}}. If there 
are no notable differences, open a PR.

7. If there are differences, file or find the corresponding JIRA, and skip the 
affected tests with a comment.

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}

9. _If possible_ (not required), when you open a PR, please attach the output 
of {{git diff xxx.sql.out}} between steps 3 and 5 in the PR description, using 
the template below:

{code}
<details><summary>Diff comparing to 'xxx.sql'</summary>
<p>

```diff
...  # here you put 'git diff' results
```
</p>
</details>
{code}
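
The template in step 9 could be filled in with a small shell helper. This is only a sketch under assumptions: the function name {{wrap_diff_in_details}} is hypothetical and not part of Spark. It reads diff text on standard input and prints the collapsible block.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): wrap diff text read from stdin
# in the collapsible PR-description template from step 9.
# $1 is the name of the original test file, e.g. xxx.sql.
wrap_diff_in_details() {
  local original="$1"
  printf "<details><summary>Diff comparing to '%s'</summary>\n" "$original"
  printf '<p>\n\n'
  # \140 is the octal escape for a backtick, so this prints the diff fence.
  printf '\140\140\140diff\n'
  cat                       # pass the diff through unchanged
  printf '\140\140\140\n'
  printf '</p>\n</details>\n'
}

# Example usage (the golden file path is left to the caller):
#   git diff path/to/udf-xxx.sql.out | wrap_diff_in_details xxx.sql
```

The helper assumes the diff text ends with a newline, as {{git diff}} output does, so the closing fence lands on its own line.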

Note that the registered UDFs all return strings, so some differences are 
expected.
Note that this JIRA targets plan-specific cases in general.
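
Steps 1 and 2 of the guide could be scripted roughly as follows. This is a sketch, not an official tool: the function name {{copy_to_udf_test}} is hypothetical, and the wording of the note it prepends is an assumption, since the guide only asks that the origin of the file be stated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark) for steps 1 and 2: copy an input
# file to udf/udf-<name>.sql and prepend a note recording where it came from.
# The wording of the note is an assumption.
# $1: directory holding the *.sql inputs, $2: test name without ".sql".
copy_to_udf_test() {
  local inputs_dir="$1" name="$2"
  local src="${inputs_dir}/${name}.sql"
  local dst="${inputs_dir}/udf/udf-${name}.sql"

  # Fail early if the original test file does not exist.
  [ -f "$src" ] || { echo "missing input file: $src" >&2; return 1; }
  mkdir -p "${inputs_dir}/udf"
  {
    printf -- '-- This file was copied from %s.sql\n' "$name"
    cat "$src"              # original statements and comments, unchanged
  } > "$dst"
  printf '%s\n' "$dst"      # print the new path for the caller
}

# Example usage, run from the Spark source root:
#   copy_to_udf_test sql/core/src/test/resources/sql-tests/inputs xxx
```

The commands in steps 3, 5 and 8 would then be run by hand against the new file.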

  was:
This JIRA aims to improve Python test coverage, in particular for the 
{{ExtractPythonUDFs}} rule.
 This rule has caused many regressions or issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by 
{{ExtractPythonUDFs}}, as was done in 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql].
 In practice, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

1. Copy the {{xxx.sql}} file to {{udf/udf-xxx.sql}}.

2. Keep the comments and state that this file was copied from {{xxx.sql}}, for 
now.

3. Run the commands below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
git add .
{code}
4. Insert {{udf(...)}} into each statement. Adding more combinations is not 
required, and there is no strict rule about where to insert it.

5. Run the test again to regenerate the golden file, then inspect the changes:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
git diff
{code}
6. Compare the results with those of the original file, {{xxx.sql}}. If there 
are no notable differences, open a PR.

7. If there are differences, file or find the corresponding JIRA, and skip the 
affected tests with a comment.

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}

9. When you open a PR, please attach the output of {{git diff xxx.sql.out}} 
between steps 3 and 5 in the PR description, using the template below:

{code}
<details><summary>Diff comparing to 'xxx.sql'</summary>
<p>

```diff
...  # here you put 'git diff' results
```
</p>
</details>
{code}

Note that the registered UDFs all return strings, so some differences are 
expected.
Note that this JIRA targets plan-specific cases in general.


> Convert applicable *.sql tests into UDF integrated test base
> ------------------------------------------------------------
>
>                 Key: SPARK-27921
>                 URL: https://issues.apache.org/jira/browse/SPARK-27921
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)