In conclusion, I think the current design of the load/drop rules is flexible
and able to handle almost all scenarios. However, the current `PeriodDropRule`
is impractical because it always drops the most recent data. So if people want
to retain 30 days of data, they cannot use `PeriodDropRule`; instead they have
to configure: load 30 days, then drop forever. And because the `drop forever`
rule also drops segments whose intervals lie in the future, the following can
happen:
> 2. The user loads some data from slightly in the future (maybe some clocks 
> are running a bit fast or slow) using streaming ingestion. This creates a 
> segment with an interval that is in the future.
> 3. The coordinator disables the segment immediately upon noticing it (since 
> it is not within the last 30 days).
> 4. The Kafka tasks time out during handoff (because the segments are never 
> loaded).
> 5. And after that timeout, the data that was slightly in the future is still 
> not available!
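To make the failure concrete, here is a minimal Python sketch of ordered rule
evaluation. It assumes the Coordinator's behavior of checking rules in order
and letting the first matching rule decide. The helper names
(`load_by_period`, `drop_forever`, `evaluate`) and the simplified overlap
check are illustrative assumptions, not Druid's actual classes. A segment that
starts a few minutes in the future does not overlap `[now - 30 days, now)`, so
it falls through to the drop-forever rule.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Interval = Tuple[datetime, datetime]  # half-open [start, end)
Rule = Tuple[str, Callable[[Interval, datetime], bool]]  # (action, predicate)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


def load_by_period(period: timedelta) -> Rule:
    # Simplified period load rule (assumption): match segments overlapping
    # [now - period, now). A segment entirely in the future never matches.
    return ("load", lambda seg, now: overlaps(seg, (now - period, now)))


def drop_forever() -> Rule:
    # Matches every segment, including ones whose interval is in the future.
    return ("drop", lambda seg, now: True)


def evaluate(rules: List[Rule], seg: Interval, now: datetime) -> str:
    """Return the action of the first rule whose predicate matches."""
    for action, applies in rules:
        if applies(seg, now):
            return action
    return "unmatched"


if __name__ == "__main__":
    now = datetime(2018, 6, 20, 12, 0)
    retain_30_days = [load_by_period(timedelta(days=30)), drop_forever()]
    recent = (now - timedelta(days=1), now - timedelta(hours=23))
    future = (now + timedelta(minutes=5), now + timedelta(hours=1))
    print(evaluate(retain_30_days, recent, now))  # load
    print(evaluate(retain_30_days, future, now))  # drop
```

In this model the future segment is disabled even though it is well inside the
intended 30-day retention, which is the handoff failure described in the quote.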

I think there are two ways to solve this:
1. Make period load rules include the future by default.
2. Add a new drop rule, or modify the current `PeriodDropRule`, to support
"drop everything older than a period". Then, to retain 30 days of data, people
can configure: drop data older than 30 days, then load forever.
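Under a simplified first-match model of rule evaluation, the second option can
be sketched as below. `drop_before_period` is a hypothetical rule used only for
illustration: it matches segments that end at or before `now - period`, and
that exact boundary choice is an assumption. With the chain "drop older than
30 days, load forever", old data is dropped while recent and slightly-future
segments are loaded.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Interval = Tuple[datetime, datetime]  # half-open [start, end)
Rule = Tuple[str, Callable[[Interval, datetime], bool]]  # (action, predicate)


def drop_before_period(period: timedelta) -> Rule:
    # Hypothetical rule: match segments that end at or before now - period,
    # i.e. data lying entirely before the retention window.
    return ("drop", lambda seg, now: seg[1] <= now - period)


def load_forever() -> Rule:
    # Matches every remaining segment, including future ones.
    return ("load", lambda seg, now: True)


def evaluate(rules: List[Rule], seg: Interval, now: datetime) -> str:
    """Return the action of the first rule whose predicate matches."""
    for action, applies in rules:
        if applies(seg, now):
            return action
    return "unmatched"


if __name__ == "__main__":
    now = datetime(2018, 6, 20, 12, 0)
    retain_30_days = [drop_before_period(timedelta(days=30)), load_forever()]
    old = (now - timedelta(days=40), now - timedelta(days=39))
    recent = (now - timedelta(days=1), now - timedelta(hours=23))
    future = (now + timedelta(minutes=5), now + timedelta(hours=1))
    for seg in (old, recent, future):
        print(evaluate(retain_30_days, seg, now))  # prints drop, load, load
```

Because the only drop rule in this chain is bounded on the old side, no future
segment can match it, so in this model segments with slightly-ahead timestamps
are loaded and handoff can complete.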

I prefer the second way, and I would rather modify the current `PeriodDropRule`
than add a new one, because the current rule is so impractical that, IMO,
nobody would want to use it.

[ Full content available at: 
https://github.com/apache/incubator-druid/issues/5869 ]