Peng Zhong created SPARK-43474:
----------------------------------
Summary: Add support to create DataFrame Reference in Spark connect
Key: SPARK-43474
URL: https://issues.apache.org/jira/browse/SPARK-43474
Project: Spark
Issue Type: Task
Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Peng Zhong
Add support in Spark Connect for caching a DataFrame on the server side. The
client can then create a reference to that DataFrame from its cache key.
This functionality will be used by streaming foreachBatch(). The server needs
to call the user function, which takes a DataFrame as its argument, for every
batch. With this feature, the server can simply cache the batch DataFrame and
pass its ID back to the client, which then creates a DataFrame reference from
that ID. When transforming the plan, the server replaces the reference with the
cached DataFrame.
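The cache-and-reference flow described above can be sketched roughly as
follows. This is a minimal, self-contained model, not Spark Connect's actual
implementation: SessionCache, CachedDataFrameRef, Filter, transform and
run_foreach_batch are all hypothetical names, a "DataFrame" is modeled as a
plain list of row dicts, and a plan is a tiny tree of Python objects instead of
protobuf relations. It only illustrates the idea that the client holds a key,
and the server swaps the key for the cached data while transforming the plan.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical stand-ins; these are NOT Spark Connect classes or messages.
Rows = List[dict]  # a "DataFrame" modeled as a list of row dicts


@dataclass(frozen=True)
class CachedDataFrameRef:
    """Client-side placeholder naming a server-cached DataFrame by its key."""
    cache_key: str


@dataclass(frozen=True)
class Filter:
    """A single example transformation the client can build on a reference."""
    child: "Plan"
    predicate: Callable[[dict], bool]


Plan = Union[CachedDataFrameRef, Filter, Rows]


class SessionCache:
    """Hypothetical server-side cache mapping a generated key to a DataFrame."""

    def __init__(self) -> None:
        self._frames: Dict[str, Rows] = {}

    def put(self, df: Rows) -> str:
        key = str(uuid.uuid4())
        self._frames[key] = df
        return key

    def get(self, key: str) -> Rows:
        return self._frames[key]

    def remove(self, key: str) -> None:
        self._frames.pop(key, None)

    def __len__(self) -> int:
        return len(self._frames)


def transform(plan: Plan, cache: SessionCache) -> Rows:
    """Server-side plan transformation: a reference is replaced by cached data."""
    if isinstance(plan, CachedDataFrameRef):
        return cache.get(plan.cache_key)
    if isinstance(plan, Filter):
        return [row for row in transform(plan.child, cache) if plan.predicate(row)]
    return plan  # already concrete rows


def run_foreach_batch(batches, user_fn, cache: SessionCache) -> List[Rows]:
    """Simulated foreachBatch loop: cache each batch, hand out only its key."""
    results = []
    for batch_id, batch in enumerate(batches):
        key = cache.put(batch)                   # server caches the batch
        ref = CachedDataFrameRef(key)            # client builds a reference
        plan = user_fn(ref, batch_id)            # user code composes on the ref
        results.append(transform(plan, cache))   # server resolves the ref
        cache.remove(key)                        # drop the entry after the batch
    return results


cache = SessionCache()
batches = [[{"v": 1}, {"v": 5}], [{"v": 7}]]
out = run_foreach_batch(batches, lambda df, _id: Filter(df, lambda r: r["v"] > 2), cache)
print(out)
```

In this sketch the user function never receives the rows themselves, only a
reference it can build further operations on; evicting each entry once its
batch completes is an assumption made here to keep the cache bounded.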
--
This message was sent by Atlassian Jira
(v8.20.10#820010)