[GitHub] [druid] repnop commented on issue #8276: KIS tasks in 0.15.1 RC2 sometimes duplicate rows with the same dimension values

GitBox Wed, 29 Jul 2020 00:29:09 -0700


repnop commented on issue #8276:
URL: https://github.com/apache/druid/issues/8276#issuecomment-665066785



   Hi, I'm having a similar problem and came across this issue. In my case the
data isn't duplicated, only the timestamp, and (being very new to Druid) I
can't tell what the solution was here. I'm aggregating values with `longMax`
over one-minute intervals, and the duplicate timestamps are causing a pretty
big issue because I'm trying to determine a rate and graph it: `pandas` adds
the two values with the same timestamp together, and the resulting spikes
dwarf the rest of the graph and make it pretty unreadable.

I tried setting the intermediate persist interval higher to see if that would
help, but it doesn't seem to. I also tried turning on compaction, but I feel
like I must have done that incorrectly since I don't think the task has
actually run once. Since the data is real time, I'm not even sure compaction
helps in my case, because we want the most up-to-date data possible.

The data is being consumed from a Kafka topic which has entries thrown onto it
every ~30 seconds at the moment for testing. Any help would be appreciated,
thank you!
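
To make the symptom concrete, here is a minimal sketch of collapsing the
duplicate timestamps on the client side so that `pandas` takes the max instead
of summing. The column names `timestamp` and `value`, the helper name
`collapse_duplicate_timestamps`, and the sample rows are all made up for
illustration; this only works around the plot and says nothing about why Druid
returns two rows for the same minute.

```python
import pandas as pd


def collapse_duplicate_timestamps(df, time_col="timestamp", value_col="value"):
    """Keep one row per timestamp, taking the max of the value column.

    Mirrors a `longMax` aggregation applied again after the query, so rows
    that share a timestamp are merged rather than summed.
    """
    return (
        df.groupby(time_col, as_index=False)[value_col]
        .max()
        .sort_values(time_col)
        .reset_index(drop=True)
    )


# Hypothetical query result: the 00:01 minute appears twice.
rows = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2020-07-29 00:00:00",
                "2020-07-29 00:01:00",
                "2020-07-29 00:01:00",
                "2020-07-29 00:02:00",
            ]
        ),
        "value": [10, 20, 25, 30],
    }
)

deduped = collapse_duplicate_timestamps(rows)
print(deduped)
```

With the duplicates merged this way, the rate can be computed from one value
per minute instead of from a summed spike.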


