[ 
https://issues.apache.org/jira/browse/SPARK-54337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043166#comment-18043166
 ] 

Tim Swast commented on SPARK-54337:
-----------------------------------

Thanks, Devin! Let me know if you need any help with this. My team has some 
experience implementing `__dataframe__` on BigQuery DataFrames.

> Expose __dataframe__ interchange protocol on pyspark RDD, SQL DataFrame, and 
> pandas DataFrame APIs
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-54337
>                 URL: https://issues.apache.org/jira/browse/SPARK-54337
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 4.0.1
>            Reporter: Tim Swast
>            Priority: Major
>
> The `__dataframe__` interchange protocol 
> ([https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html]) 
> enables easy integration with many packages across the Python data ecosystem. 
> This is especially true for visualization packages such as matplotlib and 
> Microsoft's Data Wrangler 
> ([https://github.com/microsoft/vscode-data-wrangler/issues/555#issuecomment-3215797533]).
> I believe this API would be useful across all dataframe-like objects, 
> including:
>  * RDD 
> [https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.html]
>  * sql.DataFrame 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame]
>  * pandas.DataFrame 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/frame.html]
> Implementation-wise, this would likely download all data into memory. For 
> example, in BigQuery DataFrames, we expose this API by first serializing to 
> Arrow. 
> https://github.com/googleapis/python-bigquery-dataframes/blob/20ab469d29767a2f04fe02aa66797893ecd1c539/bigframes/core/interchange.py#L88
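As a rough illustration of the "serialize to Arrow first" approach, here is a minimal sketch. It assumes Spark 4.0 or later, where `pyspark.sql.DataFrame.toArrow()` collects the result into a `pyarrow.Table`, and a recent pyarrow, whose `Table.__dataframe__` already implements the interchange protocol. `SparkInterchangeWrapper` is a hypothetical name, not a PySpark API; an actual patch would more likely add the method to the DataFrame classes directly.

```python
from typing import Any


class SparkInterchangeWrapper:
    """Hypothetical wrapper exposing __dataframe__ for a Spark DataFrame.

    Assumes the wrapped object has a ``toArrow()`` method returning a
    ``pyarrow.Table`` (as ``pyspark.sql.DataFrame.toArrow`` does in Spark 4.0+).
    """

    def __init__(self, spark_df: Any) -> None:
        # Keep a reference only; nothing is collected until a consumer asks.
        self._spark_df = spark_df

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True) -> Any:
        # Collect the full result into local memory as an Arrow table.
        table = self._spark_df.toArrow()
        # Delegate to pyarrow's own interchange implementation, forwarding
        # the protocol's keyword arguments unchanged.
        return table.__dataframe__(nan_as_null=nan_as_null, allow_copy=allow_copy)
```

A consumer could then call something like `pandas.api.interchange.from_dataframe(SparkInterchangeWrapper(spark_df))`. Because each `__dataframe__` call re-runs `toArrow()`, a real implementation might want to cache the Arrow table or warn on large results.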



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
