[
https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinrong Meng updated SPARK-39405:
---------------------------------
Summary: NumPy input support in PySpark SQL (was: NumPy support in SQL)
> NumPy input support in PySpark SQL
> ----------------------------------
>
> Key: SPARK-39405
> URL: https://issues.apache.org/jira/browse/SPARK-39405
> Project: Spark
> Issue Type: Umbrella
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is
> very commonly used, especially in the data science world. For example, pandas
> is backed by NumPy, and tensors also support interchangeable conversion
> from/to NumPy arrays.
>
> However, PySpark supports only Python built-in types, with the exceptions of
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”.
>
> This issue has been raised multiple times internally and externally, see also
> SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>
> With NumPy support in SQL, we expect wider adoption among data scientists and
> newcomers, who can leverage their existing NumPy background and codebase.
>
> See more:
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)