[ 
https://issues.apache.org/jira/browse/HUDI-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482216#comment-17482216
 ] 

Volodymyr Burenin commented on HUDI-2189:
-----------------------------------------

There is a strong use case for this. We run DeltaStreamer under our own 
scheduler (which may be open-sourced later). It schedules all ingestion jobs, 
taking into account the amount of incoming data in the Kafka queue, latency 
requirements, etc. It also watches the number of partitions and trims them 
when necessary. So far that happens surgically, by modifying the metastore and 
removing data from storage. The tables are gigantic and the data in them goes 
stale very fast, becoming basically useless after 10-14 days. It needs to be 
trimmed, otherwise the cost of keeping it gets too high.
I would strongly recommend providing a way to tell DeltaStreamer which 
partitions need to be dropped, via CLI or via a properties file; anything 
works, since the scheduler generates all these parameters dynamically.
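
As a rough illustration of what the scheduler would generate, here is a minimal 
Python sketch that picks date partitions older than the retention window and 
formats them as one properties line. It assumes partition paths are ISO dates 
such as "2022-01-05"; the function names and the "partitions.to.drop" key are 
made up for the example and are not existing Hudi or DeltaStreamer options.

```python
from datetime import date, timedelta


def expired_partitions(partitions, today, retention_days=14):
    """Return the partitions whose date falls before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for p in partitions:
        # Assumption: each partition path is an ISO date like "2022-01-05".
        if date.fromisoformat(p) < cutoff:
            expired.append(p)
    return sorted(expired)


def to_property_line(partitions, key="partitions.to.drop"):
    # "partitions.to.drop" is a hypothetical key, used only for illustration.
    return f"{key}={','.join(partitions)}"


if __name__ == "__main__":
    parts = ["2022-01-01", "2022-01-10", "2022-01-20"]
    drop = expired_partitions(parts, today=date(2022, 1, 26))
    print(to_property_line(drop))
```

The same list could just as well be passed as a CLI argument; the point is 
only that the scheduler can compute it on every run.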

[~harsh1231] [~shivnarayan] [~codope] 

> Delete partition support in HoodieDeltaStreamer 
> ------------------------------------------------
>
>                 Key: HUDI-2189
>                 URL: https://issues.apache.org/jira/browse/HUDI-2189
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: deltastreamer
>            Reporter: Samrat Deb
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
