This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b9ff737e3d6 [SPARK-41553][PS][PYTHON][CORE] Fix the documentation for `num_files`
b9ff737e3d6 is described below
commit b9ff737e3d68bcc9e788185eab8e320c1742ca41
Author: Bjørn <[email protected]>
AuthorDate: Sat Dec 31 10:26:07 2022 +0900
[SPARK-41553][PS][PYTHON][CORE] Fix the documentation for `num_files`
### What changes were proposed in this pull request?
"num_files: the number of files to be written ..."
`files` is changed to `partitions`
"num_files: the number of partitions to be written ..."
And there is added a text "This is deprecated. Use
`DataFrame.spark.repartition` instead."
With this PR users can find the right documentation before they get an
error message that they should use something other than what they have used.
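For context, a minimal before/after sketch of the pattern the updated docs point to (the path and DataFrame contents are illustrative, not part of this PR):

```python
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Deprecated: despite its name, num_files sets the number of partitions.
psdf.to_csv("/tmp/out_csv", num_files=1)

# Recommended: repartition explicitly, then write.
psdf.spark.repartition(1).to_csv("/tmp/out_csv")
```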
### Why are the changes needed?
`num_files` has been deprecated and might be removed in a future version; the
deprecation warning says "Use `DataFrame.spark.repartition` instead."
The `num_files` argument doesn't control the number of files; it specifies the
number of partitions to write. See https://github.com/apache/spark/pull/33379
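To make the distinction concrete, a small sketch (illustrative, not part of the patch) showing that the knob is really the partition count:

```python
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": range(6)})

# Each Spark partition becomes one part-... file in the output directory,
# so controlling partitions is what actually controls the file count.
repartitioned = psdf.spark.repartition(3)
print(repartitioned.to_spark().rdd.getNumPartitions())  # 3
repartitioned.to_json("/tmp/out_json")  # directory gets three part-... files
```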
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA
Closes #39098 from bjornjorgensen/num_files-to-repartition.
Lead-authored-by: Bjørn <[email protected]>
Co-authored-by: Bjørn Jørgensen <[email protected]>
Co-authored-by: Bjørn Jørgensen <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/generic.py | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c2a32f37767..9f218ac9a9f 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -671,8 +671,9 @@ class Frame(object, metaclass=ABCMeta):
.. note:: pandas-on-Spark writes CSV files into the directory, `path`, and writes
    multiple `part-...` files in the directory when `path` is specified.
- This behavior was inherited from Apache Spark. The number of files can
- be controlled by `num_files`.
+ This behavior was inherited from Apache Spark. The number of partitions can
+ be controlled by `num_files`. This is deprecated.
+ Use `DataFrame.spark.repartition` instead.
Parameters
----------
@@ -694,8 +695,8 @@ class Frame(object, metaclass=ABCMeta):
escapechar: str, default None
String of length 1. Character used to escape `sep` and `quotechar`
when appropriate.
- num_files: the number of files to be written in `path` directory when
- this is a path.
+ num_files: the number of partitions to be written in `path` directory when
+ this is a path. This is deprecated. Use `DataFrame.spark.repartition` instead.
mode: str
Python write mode, default 'w'.
@@ -900,8 +901,9 @@ class Frame(object, metaclass=ABCMeta):
.. note:: pandas-on-Spark writes JSON files into the directory, `path`, and writes
    multiple `part-...` files in the directory when `path` is specified.
- This behavior was inherited from Apache Spark. The number of files can
- be controlled by `num_files`.
+ This behavior was inherited from Apache Spark. The number of partitions can
+ be controlled by `num_files`. This is deprecated.
+ Use `DataFrame.spark.repartition` instead.
.. note:: output JSON format is different from pandas'. It always uses `orient='records'`
    for its output. This behavior might have to change soon.
@@ -927,8 +929,8 @@ class Frame(object, metaclass=ABCMeta):
A string representing the compression to use in the output file,
only used when the first argument is a filename. By default, the
compression is inferred from the filename.
- num_files: the number of files to be written in `path` directory when
- this is a path.
+ num_files: the number of partitions to be written in `path` directory when
+ this is a path. This is deprecated. Use `DataFrame.spark.repartition` instead.
mode: str
Python write mode, default 'w'.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]