[
https://issues.apache.org/jira/browse/SPARK-54337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043166#comment-18043166
]
Tim Swast commented on SPARK-54337:
-----------------------------------
Thanks, Devin! Let me know if you need any help with this. My team has some
experience implementing `__dataframe__` on BigQuery DataFrames.
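For reference, here's a rough sketch of what this could look like for
sql.DataFrame, assuming Spark 4.0's `DataFrame.toArrow()` and a pyarrow
release that implements `Table.__dataframe__`. This is only an illustration,
not a proposed patch, and it collects everything to the driver:

    # Illustrative only: expose __dataframe__ on pyspark.sql.DataFrame by
    # serializing to Arrow on the driver and delegating to pyarrow's
    # interchange object. Assumes Spark 4.0+ (DataFrame.toArrow) and a
    # pyarrow version with Table.__dataframe__ (>= 11.0).
    import pyarrow as pa
    from pyspark.sql import DataFrame as SparkDataFrame

    def _dataframe_interchange(self, nan_as_null=False, allow_copy=True):
        # Collect the distributed rows into an in-memory Arrow table.
        arrow_table: pa.Table = self.toArrow()
        # Delegate the interchange protocol to pyarrow's implementation.
        return arrow_table.__dataframe__(nan_as_null=nan_as_null,
                                         allow_copy=allow_copy)

    # Monkeypatched here only to demonstrate the idea; a real change would
    # add the method in pyspark itself.
    SparkDataFrame.__dataframe__ = _dataframe_interchange

With something like that in place, a consumer such as pandas should be able
to call `pandas.api.interchange.from_dataframe(spark_df)` with no
Spark-specific code path.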
> Expose __dataframe__ interchange protocol on pyspark RDD, SQL DataFrame, and
> pandas DataFrame APIs
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-54337
> URL: https://issues.apache.org/jira/browse/SPARK-54337
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Affects Versions: 4.0.1
> Reporter: Tim Swast
> Priority: Major
>
> The `__dataframe__` interchange protocol
> ([https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html])
> enables easy integration with many packages across the Python data ecosystem.
> This is especially true for visualization packages such as matplotlib and
> Microsoft's Data Wrangler
> ([https://github.com/microsoft/vscode-data-wrangler/issues/555#issuecomment-3215797533])
> I believe this API would be useful across all dataframe-like objects,
> including:
> * RDD
> [https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.html]
> * sql.DataFrame
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame]
> * pandas.DataFrame
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/frame.html]
> Implementation-wise, this would likely require downloading all of the data
> into memory. For example, in BigQuery DataFrames, we expose this API by
> first serializing to Arrow:
> https://github.com/googleapis/python-bigquery-dataframes/blob/20ab469d29767a2f04fe02aa66797893ecd1c539/bigframes/core/interchange.py#L88