[jira] [Updated] (SPARK-39405) NumPy input support in PySpark SQL

Xinrong Meng (Jira) Wed, 26 Oct 2022 14:32:17 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-39405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xinrong Meng updated SPARK-39405:
---------------------------------
    Summary: NumPy input support in PySpark SQL  (was: NumPy support in SQL)

> NumPy input support in PySpark SQL
> ----------------------------------
>
>                 Key: SPARK-39405
>                 URL: https://issues.apache.org/jira/browse/SPARK-39405
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Xinrong Meng
>            Priority: Major
>
> NumPy is the fundamental package for scientific computing with Python. It is 
> very commonly used, especially in the data science world. For example, pandas 
> is backed by NumPy, and tensors also support interchangeable conversion 
> from/to NumPy arrays.
>  
> However, PySpark only supports Python built-in types with the exception of 
> “SparkSession.createDataFrame(pandas.DataFrame)” and “DataFrame.toPandas”. 
>  
> This issue has been raised multiple times, both internally and externally; 
> see also SPARK-2012, SPARK-37697, SPARK-31776, and SPARK-6857.
>  
> With NumPy support in SQL, we expect wider adoption among data scientists 
> and newcomers, who can leverage their existing background and codebase 
> with NumPy.
>  
> See more: 
> [https://docs.google.com/document/d/1WsBiHoQB3UWERP47C47n_frffxZ9YIoGRwXSwIeMank/edit#]
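
For illustration only (this is not taken from the issue or its design doc): since the description says PySpark accepts only Python built-in types apart from the pandas paths, one workaround available today is to convert an ndarray to built-ins before handing it to Spark. The helper name `ndarray_to_rows` is hypothetical, and the sketch assumes NumPy is installed and handles only 1-D and 2-D arrays.

```python
import numpy as np


def ndarray_to_rows(arr):
    """Convert a 1-D or 2-D ndarray into a list of tuples of Python built-ins.

    Hypothetical helper for illustration; not part of PySpark or NumPy.
    """
    arr = np.asarray(arr)
    if arr.ndim == 1:
        # ndarray.tolist() returns Python scalars (int, float, ...), not NumPy scalars.
        return [(value,) for value in arr.tolist()]
    if arr.ndim == 2:
        return [tuple(row) for row in arr.tolist()]
    raise ValueError("only 1-D and 2-D arrays are handled in this sketch")


# A mixed int/float literal is stored as float64, so every value comes back as float.
rows = ndarray_to_rows(np.array([[1, 2.5], [3, 4.5]]))
print(rows)
```

Given an existing SparkSession bound to `spark` (an assumption here), the result could then be passed as `spark.createDataFrame(rows, ["a", "b"])`, which takes a list of tuples plus column names.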



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

