This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new ded8cdf8d945 [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions
ded8cdf8d945 is described below
commit ded8cdf8d9459e0e5b73c01c8ee41ae54ccd7ac5
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Tue Mar 26 07:35:49 2024 -0700
[SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/45486 that
addresses the review comment at
https://github.com/apache/spark/pull/45486#discussion_r1538753052
by recovering the test coverage for the number of partitions in the Python Data Source.
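For context, a minimal sketch of the kind of partitioned Python data source this test exercises is below. The actual `InMemoryDataSource` fixture is defined in `test_python_datasource.py` itself; this reconstruction is only an assumption about its shape, built on the public `pyspark.sql.datasource` API.

```python
# Hypothetical reconstruction of an in-memory data source; the real
# InMemoryDataSource fixture in test_python_datasource.py may differ.
from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition
from pyspark.sql.functions import spark_partition_id


class InMemoryDataSourceReader(DataSourceReader):
    def __init__(self, options):
        # The "num_partitions" option controls how many input partitions are
        # planned; 3 is an assumed default matching the test's first assertion.
        self.num_partitions = int(options.get("num_partitions", 3))

    def partitions(self):
        # One InputPartition per planned split; Spark calls read() once per split.
        return [InputPartition(i) for i in range(self.num_partitions)]

    def read(self, partition):
        # Emit one (x, y) row per partition.
        yield (partition.value, str(partition.value))


class InMemoryDataSource(DataSource):
    @classmethod
    def name(cls):
        return "memory"

    def schema(self):
        return "x INT, y STRING"

    def reader(self, schema):
        return InMemoryDataSourceReader(self.options)


# Usage mirroring the recovered assertions:
# spark.dataSource.register(InMemoryDataSource)
# df = spark.read.format("memory").option("num_partitions", 2).load()
# assert df.select(spark_partition_id()).distinct().count() == 2
```

Counting distinct `spark_partition_id()` values works as a partition-count check because each task reports its own partition id, so the recovered assertions verify how many partitions Spark actually planned without inspecting internals.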
### Why are the changes needed?
To restore the test coverage.
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
Fixed the unit test; CI in this PR should verify it.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45720 from HyukjinKwon/SPARK-47367-folliwup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/sql/tests/test_python_datasource.py | 3 +++
1 file changed, 3 insertions(+)
diff --git a/python/pyspark/sql/tests/test_python_datasource.py b/python/pyspark/sql/tests/test_python_datasource.py
index f69e1dee1285..d028a210b007 100644
--- a/python/pyspark/sql/tests/test_python_datasource.py
+++ b/python/pyspark/sql/tests/test_python_datasource.py
@@ -28,6 +28,7 @@ from pyspark.sql.datasource import (
     WriterCommitMessage,
     CaseInsensitiveDict,
 )
+from pyspark.sql.functions import spark_partition_id
 from pyspark.sql.types import Row, StructType
 from pyspark.testing.sqlutils import (
     have_pyarrow,
@@ -236,10 +237,12 @@ class BasePythonDataSourceTestsMixin:
         self.spark.dataSource.register(InMemoryDataSource)
         df = self.spark.read.format("memory").load()
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 3)
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1"), Row(x=2, y="2")])
         df = self.spark.read.format("memory").option("num_partitions", 2).load()
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")])
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2)

     def _get_test_json_data_source(self):
         import json
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]