gvsmirnov opened a new issue #8646: Broker returns data outside of requested 
interval
URL: https://github.com/apache/incubator-druid/issues/8646
 
 
   ### Affected Version
   
   The issue was originally discovered in 0.10.1. After upgrading to 0.15 
(latest at the time), the issue can still be reproduced. I have not tried with 
0.16 yet, but I do not see any changes relevant to this problem.
   
   ### Description
   
   The affected deployment has realtime kafka indexing 
(`druid-kafka-indexing-service` extension) running on one middle manager, and a 
bunch of historical nodes. 
   
   #### Steps to reproduce
   
   To reproduce the issue, we make a simple topN query (groupBy also works) 
using an interval that has some data in the realtime kafka indexing peon: 
[query.txt](https://github.com/apache/incubator-druid/files/3703122/query.txt). 
The query's interval is specified as 
`2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z` (`13:00/13:10`) and has a 
`MINUTE` granularity.
   
   #### Expected behavior
   
   I expect all the returned timestamps to be in the requested interval 
(`13:00/13:10`).
   
   #### Actual behavior
   
   The response contains timestamps that are outside of the requested interval, 
e.g. `13:11`, and they often continue all the way to the end of the hour.
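
A minimal sketch of how the expected behavior could be checked mechanically, assuming the usual topN response shape (a JSON list of objects, each carrying a `timestamp` field) and treating the interval as half-open (start inclusive, end exclusive). The helper names and the sample rows are hypothetical and are not taken from the attached query or from a real response:

```python
from datetime import datetime


def parse_ts(text):
    """Parse an ISO-8601 UTC timestamp such as 2019-10-08T13:00:00.000Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_interval(text):
    """Split a 'start/end' interval string into two datetimes."""
    start, end = text.split("/")
    return parse_ts(start), parse_ts(end)


def timestamps_outside(interval, response):
    """Return the timestamps of result rows that fall outside [start, end)."""
    start, end = parse_interval(interval)
    return [
        row["timestamp"]
        for row in response
        if not (start <= parse_ts(row["timestamp"]) < end)
    ]


# Hypothetical rows, shaped like a topN response with the results elided.
sample_response = [
    {"timestamp": "2019-10-08T13:09:00.000Z", "result": []},
    {"timestamp": "2019-10-08T13:11:00.000Z", "result": []},
]
requested = "2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z"
print(timestamps_outside(requested, sample_response))
```

With a correct response the returned list would be empty; in the affected deployment it would list entries such as `13:11`.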
   
   ### Troubleshooting performed
   
   If I use an interval that only has data in the historical nodes (not kafka 
indexing service peons), the issue does not reproduce.
   
   If I query the broker's 
`/druid/v2/datasources/{dataSourceName}/candidates` API, it returns a 
mapping like this:
   
   ```javascript
   [{
     "interval" : "2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z", // OK
     "locations": [/* the indexing node */]
   },
   {
     "interval" : "2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z", // WRONG!
     "locations": [/* the historical nodes */]
   }]
   ```
   As you can see, the interval attributed to the historical nodes is 
incorrect: it extends to the end of the hour instead of stopping at the end 
of the requested interval.
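
The same comparison can be automated. Below is a minimal sketch that flags candidate entries whose interval is not contained in the requested one. It assumes the response has already been parsed from JSON into a list of dicts with an `interval` key, as in the excerpt above; the function names are hypothetical and the `locations` values are left empty because they are elided in this report:

```python
from datetime import datetime


def parse_interval(text):
    """Split an ISO-8601 'start/end' string into two timezone-aware datetimes."""
    start, end = (
        datetime.fromisoformat(part.replace("Z", "+00:00"))
        for part in text.split("/")
    )
    return start, end


def overreaching_candidates(requested, candidates):
    """Return the entries whose interval extends beyond the requested interval."""
    req_start, req_end = parse_interval(requested)
    flagged = []
    for entry in candidates:
        start, end = parse_interval(entry["interval"])
        if start < req_start or end > req_end:
            flagged.append(entry)
    return flagged


requested = "2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z"
candidates = [
    {"interval": "2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z", "locations": []},
    {"interval": "2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z", "locations": []},
]
for entry in overreaching_candidates(requested, candidates):
    print(entry["interval"])  # prints 2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z
```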
   
   If I enable request logging on historicals, I can see that the sub-queries 
(of the original topN query) sent to historicals by the broker already contain 
the invalid interval.
   
   I have also tried adding some extra debug logging to 
`org.apache.druid.client.ServerViewUtil` and can confirm that the 
`org.apache.druid.timeline.TimelineLookup#lookup` call returns some invalid 
items:
   
   ```
   calling lookup("2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z")
   returned list contains:
   TimelineObjectHolder{
       interval=2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z, // WRONG!
       trueInterval=2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z
      // ...
   }
   ```
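
To make the expectation concrete: the `interval` of a returned holder would be expected to be the overlap of the holder's `trueInterval` with the queried interval. The sketch below only illustrates that expectation in Python; it is not Druid's implementation, `clip_to_query` is a hypothetical name, and it assumes all timestamps are UTC with millisecond precision:

```python
from datetime import datetime


def _parse(text):
    """Parse an ISO-8601 UTC timestamp such as 2019-10-08T13:00:00.000Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def _format(moment):
    """Format a UTC datetime with millisecond precision and a trailing Z."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


def clip_to_query(holder_interval, query_interval):
    """Return the overlap of two 'start/end' intervals, or None if they are disjoint."""
    h_start, h_end = (_parse(p) for p in holder_interval.split("/"))
    q_start, q_end = (_parse(p) for p in query_interval.split("/"))
    start, end = max(h_start, q_start), min(h_end, q_end)
    if start >= end:
        return None
    return f"{_format(start)}/{_format(end)}"


expected = clip_to_query(
    "2019-10-08T13:00:00.000Z/2019-10-08T14:00:00.000Z",  # trueInterval from the log
    "2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z",  # interval passed to lookup
)
print(expected)  # prints 2019-10-08T13:00:00.000Z/2019-10-08T13:10:00.000Z
```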
   
   This is as far as I have gotten. Any pointers on further 
troubleshooting are very welcome. Once I get to the bottom of the issue, I will 
be happy to make a pull request with the fix as well.
