This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d40447b2234 [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation
d40447b2234 is described below

commit d40447b2234a21087ab83bf50a0388ce525caaa4
Author: itholic <haejoon....@databricks.com>
AuthorDate: Mon Aug 28 13:14:08 2023 +0900

    [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation
    
    ### What changes were proposed in this pull request?
    
    This PR introduces a new script named `errors_doc_gen.py` that automatically generates an RST documentation file containing the error classes defined in `pyspark.errors.error_classes`.
    
    ### Why are the changes needed?
    
    With the increasing number of error classes in PySpark, it becomes imperative to have centralized documentation that provides an overview of these errors for both developers and users. This automated script ensures that the documentation is always up-to-date with the error classes defined in the source code, reducing the manual effort required to update the documentation each time a new error class is introduced or an existing one is modified.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this script is primarily for internal use by developers to generate documentation. End-users will benefit from the updated documentation that this script produces, but they will not directly interact with the script itself.
    
    ### How was this patch tested?
    
    The script was executed locally, and the generated RST file was verified for accuracy and completeness. Furthermore, the generated RST was rendered using Sphinx to ensure it produces the expected visual output.
    
    <img width="1395" alt="Screenshot 2023-08-24 at 10 19 52 PM" src="https://github.com/apache/spark/assets/44108233/00142374-2fba-43c1-9034-c5db6989cffa">
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #42658 from itholic/err_doc_automation.
    
    Authored-by: itholic <haejoon....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/docs/source/conf.py                | 10 ++++
 python/docs/source/development/errors.rst | 92 -------------------------------
 python/pyspark/errors_doc_gen.py          | 61 ++++++++++++++++++++
 3 files changed, 71 insertions(+), 92 deletions(-)
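
The testing notes above say the generated RST file was checked for accuracy and rendered with Sphinx. A minimal sketch of one automated check along those lines follows. It is an assumption for illustration, not the author's test: the helper name `find_bad_underlines` and the sample strings are hypothetical, and the equal-length rule simply mirrors the convention stated in the script's own comment.

```python
def find_bad_underlines(rst_text, underline_char="-"):
    """Return section titles whose underline length differs from the title length."""
    lines = rst_text.splitlines()
    bad = []
    # Walk adjacent line pairs: (candidate title, candidate underline).
    for title, underline in zip(lines, lines[1:]):
        is_underline = bool(underline) and set(underline) == {underline_char}
        if is_underline and title.strip() and len(title) != len(underline):
            bad.append(title)
    return bad


# Hypothetical sample input shaped like the generated sections.
good_rst = "NOT_BOOL\n" + "-" * len("NOT_BOOL") + "\n\nsome message\n"
bad_rst = "NOT_BOOL\n------\n\nsome message\n"

print(find_bad_underlines(good_rst))
print(find_bad_underlines(bad_rst))
```

A check like this only covers the title layout; rendering with Sphinx, as described in the commit message, remains the way to confirm the visual output.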

diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index e4ca39d4f0c..9f0fd3ec7f3 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -31,6 +31,16 @@ output_rst_file_path = (
 )
 generate_supported_api(output_rst_file_path)
 
+# generate development/errors.rst
+from pyspark.errors_doc_gen import generate_errors_doc
+
+output_rst_file_path = (
+    "%s/development/errors.rst"
+    % os.path.dirname(os.path.abspath(__file__))
+)
+generate_errors_doc(output_rst_file_path)
+
+
 # Remove previously generated rst files. Ignore errors just in case it stops
 # generating whole docs.
 gen_rst_dirs = ["reference/api", "reference/pyspark.pandas/api",
diff --git a/python/docs/source/development/errors.rst b/python/docs/source/development/errors.rst
deleted file mode 100644
index f48c224dbdc..00000000000
--- a/python/docs/source/development/errors.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-..  Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-..    http://www.apache.org/licenses/LICENSE-2.0
-
-..  Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
-
-========================
-Error classes in PySpark
-========================
-
-This is a list of common, named error classes returned by PySpark which are defined at `error_classes.py <https://github.com/apache/spark/blob/master/python/pyspark/errors/error_classes.py>`_.
-
-When writing PySpark errors, developers must use an error class from the list. If an appropriate error class is not available, add a new one into the list. For more information, please refer to `Contributing Error and Exception <https://spark.apache.org/docs/latest/api/python/development/contributing.html#contributing-error-and-exception>`_.
-
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| Error class                                                | Error message   
                                                                                
             |
-+============================================================+==============================================================================================================+
-| ARGUMENT_REQUIRED                                          | Argument 
`<arg_name>` is required when <condition>.                                      
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| COLUMN_IN_LIST                                             | `<func_name>` 
does not allow a Column in a list.                                              
               |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| DISALLOWED_TYPE_FOR_CONTAINER                              | Argument 
`<arg_name>`(`type`: <arg_type>) should only contain a type in 
[<allowed_types>], got <return_type> |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN                 | Function 
`<func_name>` should return Column, got <return_type>.                          
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_BOOL                                                   | Argument 
`<arg_name>` should be a bool, got <arg_type>.                                  
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE   | Argument 
`<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>.  
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR                    | Argument 
`<arg_name>` should be a bool, dict, float, int or str, got <arg_type>.         
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_BOOL_OR_LIST                                           | Argument 
`<arg_name>` should be a bool or list, got <arg_type>.                          
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_BOOL_OR_STR                                            | Argument 
`<arg_name>` should be a bool or str, got <arg_type>.                           
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN                                                 | Argument 
`<arg_name>` should be a Column, got <arg_type>.                                
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN_OR_DATATYPE_OR_STR                              | Argument 
`<arg_name>` should be a Column, str or DataType, but got <arg_type>.           
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR                  | Argument 
`<arg_name>` should be a column, float, integer, list or string, got 
<arg_type>.                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN_OR_INT                                          | Argument 
`<arg_name>` should be a Column or int, got <arg_type>.                         
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN_OR_INT_OR_STR                                   | Argument 
`<arg_name>` should be a Column, int or str, got <arg_type>.                    
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_COLUMN_OR_STR                                          | Argument 
`<arg_name>` should be a Column or str, got <arg_type>.                         
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_DATAFRAME                                              | Argument 
`<arg_name>` should be a DataFrame, got <arg_type>.                             
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_DATATYPE_OR_STR                                        | Argument 
`<arg_name>` should be a DataType or str, got <arg_type>.                       
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_DICT                                                   | Argument 
`<arg_name>` should be a dict, got <arg_type>.                                  
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_EXPRESSION                                             | Argument 
<arg_name> should be a Expression, got <arg_type>.                              
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_FLOAT_OR_INT                                           | Argument 
<arg_name> should be a float or int, got <arg_type>.                            
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_FLOAT_OR_INT_OR_LIST_OR_STR                            | Argument 
<arg_name> should be a float, int, list or str, got <arg_type>.                 
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_INT                                                    | Argument 
<arg_name> should be an int, got <arg_type>.                                    
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_ITERABLE                                               | <objectName> is 
not iterable.                                                                   
             |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_LIST_OR_STR_OR_TUPLE                                   | Argument 
<arg_name> should be a list, str or tuple, got <arg_type>.                      
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_LIST_OR_TUPLE                                          | Argument 
<arg_name> should be a list or tuple, got <arg_type>.                           
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_SAME_TYPE                                              | Argument 
<arg_name1> and <arg_name2> should be the same type, got <arg_type1> and 
<arg_type2>.               |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_STR                                                    | Argument 
<arg_name> should be a str, got <arg_type>.                                     
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| NOT_WINDOWSPEC                                             | Argument 
<arg_name> should be a WindowSpec, got <arg_type>.                              
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| UNSUPPORTED_NUMPY_ARRAY_SCALAR                             | The type of 
array scalar '<dtype>' is not supported.                                        
                 |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION           | Function 
<func_name> should use only POSITIONAL or POSITIONAL OR KEYWORD arguments.      
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION                   | Function 
<func_name> should take between 1 and 3 arguments, but provided function takes 
<num_args>.          |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
-| WRONG_NUM_COLUMNS                                          | Function 
<func_name> should take at least <num_cols> columns.                            
                    |
-+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
diff --git a/python/pyspark/errors_doc_gen.py b/python/pyspark/errors_doc_gen.py
new file mode 100644
index 00000000000..e9b229062ba
--- /dev/null
+++ b/python/pyspark/errors_doc_gen.py
@@ -0,0 +1,61 @@
+import re
+from pyspark.errors.error_classes import ERROR_CLASSES_MAP
+
+
+def generate_errors_doc(output_rst_file_path: str) -> None:
+    """
+    Generates a reStructuredText (RST) documentation file for PySpark error classes.
+
+    This function fetches error classes defined in `pyspark.errors.error_classes`
+    and writes them into an RST file. The generated RST file provides an overview
+    of common, named error classes returned by PySpark.
+
+    Parameters
+    ----------
+    output_rst_file_path : str
+        The file path where the RST documentation will be written.
+
+    Notes
+    -----
+    The generated RST file can be rendered using Sphinx to visualize the documentation.
+    """
+    header = """..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+========================
+Error classes in PySpark
+========================
+
+This is a list of common, named error classes returned by PySpark which are defined at `error_classes.py <https://github.com/apache/spark/blob/master/python/pyspark/errors/error_classes.py>`_.
+
+When writing PySpark errors, developers must use an error class from the list. If an appropriate error class is not available, add a new one into the list. For more information, please refer to `Contributing Error and Exception <https://spark.apache.org/docs/latest/api/python/development/contributing.html#contributing-error-and-exception>`_.
+"""  # noqa
+    with open(output_rst_file_path, "w") as f:
+        f.write(header + "\n\n")
+        for error_key, error_details in ERROR_CLASSES_MAP.items():
+            f.write(error_key + "\n")
+            # The length of the error class name and underline must be the same
+            # to satisfy the RST format.
+            f.write("-" * len(error_key) + "\n\n")
+            messages = error_details["message"]
+            for message in messages:
+                # Escape parentheses with a backslash when they follow a backtick.
+                message = re.sub(r"`(\()", r"`\\\1", message)
+                f.write(message + "\n")
+            # Add 2 new lines between the descriptions of each error class
+            # to improve the readability of the generated RST file.
+            f.write("\n\n")
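
The section-writing loop in the new file can be tried without a PySpark installation. The sketch below repeats the same steps (title, dash underline of equal length, escaped messages, two blank lines) against a small in-memory map and returns a string instead of writing a file. `SAMPLE_ERROR_CLASSES` and `render_error_sections` are hypothetical names for illustration; the sample entries reuse two messages from the removed table, and the assumption that each entry holds a `"message"` list of strings is inferred from how the loop iterates over it.

```python
import re

# Hypothetical stand-in for pyspark.errors.error_classes.ERROR_CLASSES_MAP,
# assuming each entry is a dict with a "message" list of strings.
SAMPLE_ERROR_CLASSES = {
    "ARGUMENT_REQUIRED": {
        "message": ["Argument `<arg_name>` is required when <condition>."]
    },
    "DISALLOWED_TYPE_FOR_CONTAINER": {
        "message": [
            "Argument `<arg_name>`(`type`: <arg_type>) should only contain "
            "a type in [<allowed_types>], got <return_type>"
        ]
    },
}


def render_error_sections(error_classes):
    """Return the RST body: one dash-underlined section per error class."""
    parts = []
    for error_key, error_details in error_classes.items():
        parts.append(error_key + "\n")
        # Underline has the same length as the title.
        parts.append("-" * len(error_key) + "\n\n")
        for message in error_details["message"]:
            # Insert a backslash between a backtick and a following "(".
            message = re.sub(r"`(\()", r"`\\\1", message)
            parts.append(message + "\n")
        # Two blank lines between sections.
        parts.append("\n\n")
    return "".join(parts)


rendered = render_error_sections(SAMPLE_ERROR_CLASSES)
print(rendered)
```

In the second sample message, the regex turns the sequence backtick plus opening parenthesis into backtick, backslash, opening parenthesis, which is the escaping the script's comment describes.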


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
