Rajesh created HUDI-3775:
----------------------------

             Summary: Allow for offline compaction of MOR tables via spark 
streaming
                 Key: HUDI-3775
                 URL: https://issues.apache.org/jira/browse/HUDI-3775
             Project: Apache Hudi
          Issue Type: Improvement
          Components: compaction, spark
            Reporter: Rajesh


Currently, there is no way to keep compaction from consuming a large share of 
resources when it runs inline or async for MOR tables written via Spark 
Streaming. Delta Streamer can split resources between ingestion and async 
compaction, but Spark Streaming has no such option.

Introducing a flag to turn off automatic compaction, and allowing users to run 
compaction in a separate process, will decouple the two concerns.
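
As a rough illustration of the ingestion side, the sketch below builds the 
writer options a streaming job might pass to the Hudi sink with compaction 
switched off. The helper name streaming_write_options is hypothetical. The two 
compaction keys are existing Hudi options used here as a stand-in for the 
proposed flag; whether they are enough to disable compaction for a Spark 
Streaming sink is the open question in this issue, so treat it as an 
assumption rather than confirmed behavior.

```python
def streaming_write_options(table_name: str, disable_compaction: bool = True) -> dict:
    """Build Hudi writer options for a MOR table (hypothetical helper)."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    }
    if disable_compaction:
        # Assumption: these existing keys stand in for the flag proposed here.
        # Compaction would then be run by a separate, independently sized job.
        options["hoodie.compact.inline"] = "false"
        options["hoodie.datasource.compaction.async.enable"] = "false"
    return options


# Example: options for an ingestion-only streaming writer.
opts = streaming_write_options("events_mor")
```

The resulting dict could be handed to the stream writer via .options(**opts), 
leaving compaction to whatever separate process the user schedules.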

This will also allow users to size the cluster for ingestion alone and handle 
compaction separately, without blocking ingestion. We will need to look into 
documenting best practices for running offline compaction.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
