[GitHub] [hudi] conanxjp commented on issue #3324: [SUPPORT]Slow Performance With Spark Structured Streaming

GitBox Mon, 13 Sep 2021 17:16:41 -0700


conanxjp commented on issue #3324:
URL: https://github.com/apache/hudi/issues/3324#issuecomment-918685414



   @nsivabalan Sorry for the delay, here are some updates.
   
   The weird behavior I reported may be caused by an Amazon build of Spark. 
Depending on the versions, it can sometimes be triggered by a combination of 
Hudi and Amazon Spark.
   
   As for the MOR table, I did give it a try, along with the clustering 
feature. Compaction for MOR doesn't have a good use case in our streaming app, 
because the app deduplicates on the fly and records, once delivered, are never 
modified, at least not by the streaming app itself. We do have external batch 
modification jobs that run occasionally, but we are required not to interrupt 
the running streaming app, as the freshest data is always in use. With Hudi 
commits, it seems we can't run parallel Hudi jobs committing to the same 
location, even though the streaming app and the external modification jobs 
touch different partitions of the table. I'm not sure whether there is any way 
to achieve this.


