This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b9ff737e3d6 [SPARK-41553][PS][PYTHON][CORE] Fix the documentation for `num_files`
b9ff737e3d6 is described below

commit b9ff737e3d68bcc9e788185eab8e320c1742ca41
Author: Bjørn <[email protected]>
AuthorDate: Sat Dec 31 10:26:07 2022 +0900

    [SPARK-41553][PS][PYTHON][CORE] Fix the documentation for `num_files`
    
    ### What changes were proposed in this pull request?
    
    "num_files: the number of files to be written ..."
    `files` is changed to `partitions`
    "num_files: the number of partitions to be written ..."
    
    The text "This is deprecated. Use `DataFrame.spark.repartition` instead." is also added.
    
    With this PR, users can find the correct documentation before they get a
    message telling them to use something other than what they have used.
    
    ### Why are the changes needed?
    
    `num_files` has been deprecated and might be removed in a future version.
    Use `DataFrame.spark.repartition` instead.
    
    The `num_files` argument doesn't control the number of files; it specifies the
    number of partitions. See https://github.com/apache/spark/pull/33379
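
    As a rough, non-authoritative sketch of the migration the new docstring points to
    (the DataFrame name `psdf` and the output path below are made up for illustration):

    ```python
    import pyspark.pandas as ps

    # Illustrative pandas-on-Spark DataFrame.
    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Deprecated style: `num_files` only sets the number of output partitions.
    # psdf.to_csv("/tmp/example_out", num_files=1)

    # Recommended style: repartition explicitly, then write.
    psdf.spark.repartition(1).to_csv("/tmp/example_out")
    ```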
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Pass GA
    
    Closes #39098 from bjornjorgensen/num_files-to-repartition.
    
    Lead-authored-by: Bjørn <[email protected]>
    Co-authored-by: Bjørn Jørgensen <[email protected]>
    Co-authored-by: Bjørn Jørgensen <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/pandas/generic.py | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c2a32f37767..9f218ac9a9f 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -671,8 +671,9 @@ class Frame(object, metaclass=ABCMeta):
 
         .. note:: pandas-on-Spark writes CSV files into the directory, `path`, and writes
             multiple `part-...` files in the directory when `path` is specified.
-            This behavior was inherited from Apache Spark. The number of files can
-            be controlled by `num_files`.
+            This behavior was inherited from Apache Spark. The number of partitions can
+            be controlled by `num_files`. This is deprecated.
+            Use `DataFrame.spark.repartition` instead.
 
         Parameters
         ----------
@@ -694,8 +695,8 @@ class Frame(object, metaclass=ABCMeta):
         escapechar: str, default None
             String of length 1. Character used to escape `sep` and `quotechar`
             when appropriate.
-        num_files: the number of files to be written in `path` directory when
-            this is a path.
+        num_files: the number of partitions to be written in `path` directory when
+            this is a path. This is deprecated. Use `DataFrame.spark.repartition` instead.
         mode: str
             Python write mode, default 'w'.
 
@@ -900,8 +901,9 @@ class Frame(object, metaclass=ABCMeta):
 
         .. note:: pandas-on-Spark writes JSON files into the directory, `path`, and writes
             multiple `part-...` files in the directory when `path` is specified.
-            This behavior was inherited from Apache Spark. The number of files can
-            be controlled by `num_files`.
+            This behavior was inherited from Apache Spark. The number of partitions can
+            be controlled by `num_files`. This is deprecated.
+            Use `DataFrame.spark.repartition` instead.
 
         .. note:: output JSON format is different from pandas'. It always uses `orient='records'`
             for its output. This behavior might have to change soon.
@@ -927,8 +929,8 @@ class Frame(object, metaclass=ABCMeta):
             A string representing the compression to use in the output file,
             only used when the first argument is a filename. By default, the
             compression is inferred from the filename.
-        num_files: the number of files to be written in `path` directory when
-            this is a path.
+        num_files: the number of partitions to be written in `path` directory when
+            this is a path. This is deprecated. Use `DataFrame.spark.repartition` instead.
         mode: str
             Python write mode, default 'w'.
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
