> I think current design of the load/drop rules is flexible and able to deal 
> with almost all scenarios.

Yeah, I think that was the rationale of the current design, but it doesn't 
explain why `PeriodLoadRule` doesn't consider future data or why 
`PeriodDropRule` is designed the way it is. For `PeriodLoadRule`, I think we 
simply didn't consider future data at that time. For `PeriodDropRule`, I have 
no idea.

> I prefer the second way and want to modify current PeriodDropRule not add a 
> new one because the current one is very impractical, IMO no people would like 
> to use such drop rule.

I would say it's better to add a new rule because it's much easier. Modifying 
an existing rule in an incompatible way would take a long time, because we 
would need to discuss it and check as many of its use cases as possible. Please 
note that modifying `PeriodLoadRule` to include future data is compatible with 
the current behavior. If `PeriodDropRule` turns out not to be useful, we can 
deprecate and remove it in the future.
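
To illustrate the `PeriodLoadRule` point, here is a rough Python sketch, not the actual Druid code. It assumes the rule matches any segment whose interval overlaps `[now - period, now)`, and it uses a hypothetical `include_future` parameter to stand for the proposed extension. With the flag off, the result is the same as today, which is the sense in which the change is compatible.

```python
from datetime import datetime, timedelta


def load_applies(seg_start, seg_end, now, period, include_future=False):
    """Assumed semantics: match segments overlapping [now - period, now).

    `include_future` is a hypothetical flag; when set, the window has no
    upper bound, so segments that start after `now` also match.
    """
    window_start = now - period
    if include_future:
        return seg_end > window_start
    return seg_start < now and seg_end > window_start


now = datetime(2018, 6, 15)
month = timedelta(days=30)  # stand-in for an ISO period such as P1M

recent = (datetime(2018, 6, 1), datetime(2018, 6, 2))
future = (datetime(2018, 7, 1), datetime(2018, 7, 2))
old = (datetime(2018, 1, 1), datetime(2018, 1, 2))
```

Under these assumptions, `recent` matches either way, `old` never matches, and `future` matches only when `include_future=True`.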

That said, I know of a production use case where `PeriodDropRule` _might_ be 
useful. They are ingesting data from a single Kafka stream into two different 
Druid dataSources. One stores recent data (say, the last month) while the 
other stores older data. In such cases, they can use `PeriodDropRule` on the 
old dataSource to drop the last month's data and avoid potential duplicate 
data.
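
To make the difference concrete, here is a rough Python sketch, not the actual Druid code. It assumes the current `PeriodDropRule` matches segments that fall entirely inside `[now - period, now]`, and it contrasts that with one plausible shape for a new rule that drops data older than the period. `drop_before_applies` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta


def period_drop_applies(seg_start, seg_end, now, period):
    """Assumed current behavior: match segments inside [now - period, now]."""
    window_start = now - period
    return window_start <= seg_start and seg_end <= now


def drop_before_applies(seg_start, seg_end, now, period):
    """Hypothetical new rule: match segments ending at or before now - period."""
    return seg_end <= now - period


now = datetime(2018, 6, 15)
month = timedelta(days=30)  # stand-in for an ISO period such as P1M

recent = (datetime(2018, 6, 1), datetime(2018, 6, 2))
old = (datetime(2018, 1, 1), datetime(2018, 1, 2))
```

In the two-dataSource setup above, the old dataSource would want `period_drop_applies` to match `recent` so that last month's data is dropped there, while a retention-style rule such as `drop_before_applies` matches `old` instead.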

So, I would say adding a new rule is more welcome, and keeping the current 
`PeriodDropRule` is fine as long as it causes no real problem. What do you think?

[ Full content available at: 
https://github.com/apache/incubator-druid/issues/5869 ]