sivabalan narayanan created HUDI-5080:
-----------------------------------------
Summary: UnpersistRdds unpersist all rdds in the spark context
Key: HUDI-5080
URL: https://issues.apache.org/jira/browse/HUDI-5080
Project: Apache Hudi
Issue Type: Bug
Components: writer-core
Reporter: sivabalan narayanan
In SparkRDDWriteClient, we have a method that unpersists persisted RDDs to free
up the space they occupy:
[https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L584]
The problem is that it unpersists every persisted RDD in the given Spark
context, not only the ones persisted by this write client. This affects async
compaction and any other async table services that are running. If multiple
streams are writing to different tables, the impact is even larger.
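
To illustrate the difference between the two behaviours, here is a minimal sketch in plain Java with no Spark dependency. Everything in it is hypothetical: `contextPersisted` stands in for the persisted-RDD registry of a shared Spark context, and `Writer`, `unpersistAllInContext` and `unpersistOwned` are illustrative names, not Hudi or Spark APIs. It only shows one possible direction (each writer tracks the ids it persisted and releases only those), not the actual fix.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ScopedUnpersistSketch {

    // Hypothetical stand-in for the persisted-RDD registry of ONE shared
    // Spark context: RDD id -> descriptive name.
    static final Map<Integer, String> contextPersisted = new ConcurrentHashMap<>();

    // Hypothetical writer; several writers share the same context registry.
    static class Writer {
        // Ids of the entries this writer persisted itself.
        private final Set<Integer> ownedIds = ConcurrentHashMap.newKeySet();

        void persist(int rddId, String name) {
            contextPersisted.put(rddId, name);
            ownedIds.add(rddId);
        }

        // Behaviour described in the issue: drops everything in the context,
        // including entries that belong to other writers or table services.
        void unpersistAllInContext() {
            contextPersisted.clear();
            ownedIds.clear();
        }

        // Scoped alternative: drops only what this writer persisted.
        void unpersistOwned() {
            for (Integer id : ownedIds) {
                contextPersisted.remove(id);
            }
            ownedIds.clear();
        }
    }

    public static void main(String[] args) {
        Writer ingest = new Writer();
        Writer compactor = new Writer();
        ingest.persist(1, "ingest-records");
        compactor.persist(2, "compaction-input");

        // Scoped cleanup leaves the compactor's entry in place.
        ingest.unpersistOwned();
        System.out.println("after scoped cleanup: " + contextPersisted.keySet());

        // Context-wide cleanup also removes the compactor's entry.
        ingest.persist(3, "ingest-records-2");
        ingest.unpersistAllInContext();
        System.out.println("after context-wide cleanup: " + contextPersisted.keySet());
    }
}
```

In the sketch, the context-wide variant empties the registry even though the second writer never asked for cleanup, which mirrors the impact on async table services described above.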