[jira] [Updated] (HUDI-5080) UnpersistRdds unpersist all rdds in the spark context

sivabalan narayanan (Jira) Sat, 22 Oct 2022 17:17:05 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


sivabalan narayanan updated HUDI-5080:
--------------------------------------
    Description: 
In SparkRDDWriteClient, we have a method that cleans up persisted RDDs to free 
up the space they occupy. 

[https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L584]

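The issue is that it cleans up all persisted RDDs in the given Spark context, 
not only the ones persisted by this write client. This will impact async 
compaction or any other async table services running in the same context, 
because their cached RDDs are dropped as well. 

If multiple streams are writing to different tables through the same Spark 
context, the impact is even larger, since each writer can drop the others' 
cached RDDs. 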
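
To illustrate the difference between context-wide and scoped cleanup, here is a 
minimal sketch in plain Java. It is a toy model only and does not use Spark or 
Hudi classes: SharedContext, Writer, releaseAll and releaseOwned are 
hypothetical names standing in for the shared Spark context and a write client, 
and tracking owned RDD ids is just one possible approach, not necessarily the 
fix that was applied. 

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of writers sharing one context; not Hudi or Spark code. */
public class ScopedUnpersistSketch {

    /** Stands in for the context-wide registry of persisted RDDs (id -> owner). */
    static final class SharedContext {
        private final Map<Integer, String> persisted = new ConcurrentHashMap<>();

        void persist(int rddId, String owner) { persisted.put(rddId, owner); }

        void unpersist(int rddId) { persisted.remove(rddId); }

        /** Returns a snapshot of the ids that are currently persisted. */
        Set<Integer> persistedIds() { return new HashSet<>(persisted.keySet()); }
    }

    /** Hypothetical writer that remembers only the RDD ids it persisted itself. */
    static final class Writer {
        private final SharedContext ctx;
        private final String name;
        private final Set<Integer> ownedIds = ConcurrentHashMap.newKeySet();

        Writer(SharedContext ctx, String name) {
            this.ctx = ctx;
            this.name = name;
        }

        void persist(int rddId) {
            ctx.persist(rddId, name);
            ownedIds.add(rddId);
        }

        /** Behaviour described in the issue: drops everything in the context. */
        void releaseAll() {
            ctx.persistedIds().forEach(ctx::unpersist);
            ownedIds.clear();
        }

        /** Scoped alternative: drops only what this writer persisted. */
        void releaseOwned() {
            ownedIds.forEach(ctx::unpersist);
            ownedIds.clear();
        }
    }

    public static void main(String[] args) {
        SharedContext ctx = new SharedContext();
        Writer ingest = new Writer(ctx, "ingest");
        Writer compactor = new Writer(ctx, "async-compaction");
        ingest.persist(1);
        compactor.persist(2);

        // Scoped cleanup leaves the compactor's RDD 2 in place.
        ingest.releaseOwned();
        System.out.println(ctx.persistedIds());

        // Context-wide cleanup also removes RDD 2, which the compactor still needs.
        ingest.persist(3);
        ingest.releaseAll();
        System.out.println(ctx.persistedIds());
    }
}
```

In this model the scoped variant leaves the other writer's entry untouched, 
while the context-wide variant empties the registry for everyone. 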

 

This also needs to be fixed in DeltaSync. 

[https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L345]

 

  was:
In SparkRDDWriteClient, we have a method that cleans up persisted RDDs to free 
up the space they occupy. 

[https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L584]

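The issue is that it cleans up all persisted RDDs in the given Spark context, 
not only the ones persisted by this write client. This will impact async 
compaction or any other async table services running in the same context, 
because their cached RDDs are dropped as well. 

If multiple streams are writing to different tables through the same Spark 
context, the impact is even larger, since each writer can drop the others' 
cached RDDs. 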



> UnpersistRdds unpersist all rdds in the spark context
> -----------------------------------------------------
>
>                 Key: HUDI-5080
>                 URL: https://issues.apache.org/jira/browse/HUDI-5080
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.12.2
>
>
> In SparkRDDWriteClient, we have a method that cleans up persisted RDDs to 
> free up the space they occupy. 
> [https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L584]
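> The issue is that it cleans up all persisted RDDs in the given Spark context, 
> not only the ones persisted by this write client. This will impact async 
> compaction or any other async table services running in the same context, 
> because their cached RDDs are dropped as well. 
> If multiple streams are writing to different tables through the same Spark 
> context, the impact is even larger, since each writer can drop the others' 
> cached RDDs. 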
>  
> This also needs to be fixed in DeltaSync. 
> [https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L345]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
