[
https://issues.apache.org/jira/browse/HUDI-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482216#comment-17482216
]
Volodymyr Burenin commented on HUDI-2189:
-----------------------------------------
There is a strong use case for this. We run DeltaStreamer under our own
scheduler (which may eventually be open-sourced). It schedules all ingestion
jobs, taking into account the amount of incoming data in the Kafka queue,
latency requirements, etc. It also monitors the number of partitions and trims
them when necessary; so far this is done surgically, by modifying the metastore
and removing the data from storage. These tables are gigantic and their data
goes stale very fast, becoming basically useless after 10-14 days. It needs to
be trimmed; otherwise the cost of keeping that data gets too high.
I would strongly recommend providing a way to tell DeltaStreamer which
partitions need to be dropped, via the CLI or a properties file. Either works,
since the scheduler generates all of these parameters dynamically.
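To make the request concrete, here is a rough sketch of what the scheduler
side could look like. It is only an illustration under assumptions: partitions
are assumed to be named by ISO date (YYYY-MM-DD), the 14-day retention is taken
from the upper end of the range above, and the property key
"example.partitions.to.drop" is hypothetical, not an existing DeltaStreamer
option.

```python
from datetime import date, timedelta


def expired_partitions(partitions, today, retention_days=14):
    """Return the partitions that fall outside the retention window.

    Assumes each partition is named by an ISO date string (YYYY-MM-DD).
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(p for p in partitions if date.fromisoformat(p) < cutoff)


def render_property(to_drop, key="example.partitions.to.drop"):
    """Render a single properties-file line.

    The key is a hypothetical placeholder, not a real DeltaStreamer config.
    """
    return f"{key}={','.join(to_drop)}"


if __name__ == "__main__":
    known = ["2022-01-01", "2022-01-10", "2022-01-20"]
    to_drop = expired_partitions(known, today=date(2022, 1, 26))
    print(render_property(to_drop))
```

The scheduler would regenerate this line on every run and pass it along with
the rest of the job parameters, instead of editing the metastore and storage
directly.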
[~harsh1231] [~shivnarayan] [~codope]
> Delete partition support in HoodieDeltaStreamer
> ------------------------------------------------
>
> Key: HUDI-2189
> URL: https://issues.apache.org/jira/browse/HUDI-2189
> Project: Apache Hudi
> Issue Type: Task
> Components: deltastreamer
> Reporter: Samrat Deb
> Assignee: sivabalan narayanan
> Priority: Critical
> Fix For: 0.11.0
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>