Jungtaek Lim created SPARK-37458:
------------------------------------
Summary: Remove unnecessary object serialization on foreachBatch
Key: SPARK-37458
URL: https://issues.apache.org/jira/browse/SPARK-37458
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Jungtaek Lim
Currently, ForeachBatchSink builds the Dataset[T] it passes to the user
function by converting RDD[InternalRow] to RDD[T] and wrapping the result in
ExternalRDD. This adds a SerializeFromObject node to the plan, which is not
actually required.
We can use LogicalRDD instead, which removes SerializeFromObject from the plan.
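
The extra node described above can be observed with public APIs alone. The
sketch below is a batch analogue, not the ForeachBatchSink code path itself: it
assumes Spark 3.x on the classpath, and the object name, app name, and sample
data are hypothetical. It builds a Dataset from an RDD of objects (the
ExternalRDD route) and checks the analyzed plan for SerializeFromObject. The
LogicalRDD route is not shown because it relies on Spark-internal APIs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.SerializeFromObject

// Hypothetical demo object; not part of Spark.
object SerializeFromObjectDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("serialize-from-object-demo")
      .getOrCreate()
    import spark.implicits._

    // An RDD of JVM objects (assumed sample data).
    val rdd = spark.sparkContext.parallelize(Seq(1L, 2L, 3L))

    // createDataset(RDD[T]) wraps the RDD in ExternalRDD, so the rows must be
    // serialized from objects into Spark's internal row format.
    val ds = spark.createDataset(rdd)

    // Print the analyzed logical plan and look for the serialization node.
    val analyzed = ds.queryExecution.analyzed
    println(analyzed.treeString)

    val hasSerialize =
      analyzed.collect { case s: SerializeFromObject => s }.nonEmpty
    println(s"contains SerializeFromObject: $hasSerialize")

    spark.stop()
  }
}
```

The same check on queryExecution.analyzed could be run on the Dataset received
inside a foreachBatch function to compare plans before and after the proposed
change.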