[ 
https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32082:
---------------------------------
    Description: 
The importance of Python and PySpark has grown radically in the last few years. 
PySpark downloads have reached [more than 1.3 million _every 
week_|https://pypistats.org/packages/pyspark], counting _only_ those from 
PyPI. Nevertheless, PySpark is still not very Pythonic. For example, it exposes 
many JVM error messages, and the API documentation is poorly written.

This epic ticket aims to improve the usability of PySpark and make it more 
Pythonic. To be more explicit:

* Better usability in PySpark
   * User-facing error messages and warnings
   * Documentation
   * User guide
   * Better examples and API documentation, e.g. 
[Koalas|https://koalas.readthedocs.io/en/latest/] and 
[pandas|https://pandas.pydata.org/docs/]
* Interoperability with other Python libraries
   * Visualization
   * Compatibility with other libraries such as NumPy universal functions or 
pandas, possibly by leveraging Koalas
* Pandas UDF enhancements and type hints
* PySpark with Hadoop 3 support on PyPI


  was:The importance of Python and PySpark has grown radically in the last few 
years. This epic ticket aims to improve the usability of PySpark and make it 
more Pythonic.


> Project Zen: Improving Python usability
> ---------------------------------------
>
>                 Key: SPARK-32082
>                 URL: https://issues.apache.org/jira/browse/SPARK-32082
>             Project: Spark
>          Issue Type: Epic
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major


