[ 
https://issues.apache.org/jira/browse/SPARK-40005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40005.
----------------------------------
      Assignee: Hyukjin Kwon
    Resolution: Done

> Self-contained examples with parameter descriptions in PySpark documentation
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-40005
>                 URL: https://issues.apache.org/jira/browse/SPARK-40005
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>
> This JIRA aims to improve PySpark documentation in:
> - {{pyspark}}
> - {{pyspark.sql}}
> - {{pyspark.sql.streaming}}
> We should:
> - Make the examples self-contained, e.g., 
> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
> - Document {{Parameters}}, as in 
> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
>  Many PySpark APIs are missing parameter descriptions, e.g., 
> [DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union]
> If a file is large, e.g., dataframe.py, we should split the work on it 
> into separate subtasks and improve the documentation in each.
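
For illustration only, here is a minimal sketch of the docstring layout the description asks for: a Parameters section plus an example that builds its own input. The function union_rows is a hypothetical pure-Python stand-in, not the actual DataFrame.union docstring or any Spark API; it is used so that the example runs without a Spark session.

```python
import doctest


def union_rows(rows, other):
    """Return a new list with the rows of ``rows`` followed by those of ``other``.

    Hypothetical helper used only to show the docstring layout; it is not
    part of PySpark.

    Parameters
    ----------
    rows : list of tuple
        Rows of the first dataset.
    other : list of tuple
        Rows to append after ``rows``.

    Returns
    -------
    list of tuple
        The combined rows. Duplicates are kept.

    Examples
    --------
    The example creates its own input, so it can be copied and run as-is.

    >>> left = [(1, "a"), (2, "b")]
    >>> right = [(2, "b"), (3, "c")]
    >>> union_rows(left, right)
    [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
    """
    return list(rows) + list(other)


if __name__ == "__main__":
    # Execute the docstring example as a test.
    print(doctest.testmod())
```

Because the example defines left and right itself, a reader does not need any setup from elsewhere on the page, which is the property the pandas pages linked above have.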


