repnop commented on issue #8276: URL: https://github.com/apache/druid/issues/8276#issuecomment-665066785
Hi, I'm having a similar problem and came across this issue, except that in my case it isn't the data that is duplicated, only the timestamp. Being very new to Druid, I can't tell what the solution here was.

I'm aggregating values with `longMax` over one-minute intervals, and the duplicate timestamps are a serious problem because I'm trying to compute a rate and graph it. `pandas` adds the two values that share a timestamp together, which produces large spikes that dwarf the rest of the graph and make it pretty unreadable.

I tried raising the intermediate persist interval, but that doesn't seem to help. I also tried enabling compaction, but I suspect I configured it incorrectly, because I don't think the task has run even once. In any case, since the data is real time and we want it to be as up to date as possible, I'm not sure compaction would help here.

The data is consumed from a Kafka topic that, for testing, currently receives a new entry every ~30 seconds.

Any help would be appreciated, thank you!
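Until the duplicates are resolved on the Druid side, one client-side workaround is to merge rows that share a timestamp with `max` instead of letting them be summed. For a `longMax` metric this is consistent: the maximum of several partial maxima for the same minute equals the maximum over all raw values in that minute, whereas summing them double-counts. The minimal sketch below assumes the query results arrive as `(timestamp, value)` pairs. It also assumes the metric is a monotonically increasing counter, so a rate is the difference between consecutive buckets divided by the elapsed seconds. The function names and sample data are hypothetical. In `pandas`, the merge step corresponds roughly to `df.groupby("timestamp", as_index=False)["value"].max()`.

```python
from datetime import datetime, timezone


def merge_duplicate_buckets(rows):
    """Collapse rows sharing a timestamp by keeping the largest value.

    Hypothetical helper: `rows` is any iterable of (timestamp, value) pairs.
    Returns a list of (timestamp, value) pairs sorted by timestamp.
    """
    merged = {}
    for ts, value in rows:
        # Keep the max, which matches longMax semantics, instead of summing.
        merged[ts] = value if ts not in merged else max(merged[ts], value)
    return sorted(merged.items())


def per_second_rate(points):
    """Per-second rate between consecutive buckets.

    Assumes (hypothetically) that the value is a monotonically increasing
    counter and that `points` is sorted with unique timestamps.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        rates.append((t1, (v1 - v0) / dt))
    return rates


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical sample: 12:01 appears twice, as if from two separate segments.
    rows = [
        (datetime(2020, 7, 28, 12, 0, tzinfo=utc), 100),
        (datetime(2020, 7, 28, 12, 1, tzinfo=utc), 160),
        (datetime(2020, 7, 28, 12, 1, tzinfo=utc), 130),
        (datetime(2020, 7, 28, 12, 2, tzinfo=utc), 220),
    ]
    merged = merge_duplicate_buckets(rows)
    for ts, rate in per_second_rate(merged):
        print(ts.isoformat(), rate)
```

Summing the two 12:01 rows would give 290 and produce exactly the kind of spike described above. Taking the max gives 160, so the rate stays at 1.0 per second for both intervals.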
