clintropolis commented on a change in pull request #7133: Time Ordering On Scans
URL: https://github.com/apache/incubator-druid/pull/7133#discussion_r269836352
##########
File path:
processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java
##########
@@ -81,12 +109,26 @@ public ScanResultValue next()
} else {
// last batch
// single batch length is <= Integer.MAX_VALUE, so this should not overflow
- int left = (int) (limit - count);
+ int numLeft = (int) (limit - count);
count = limit;
- return new ScanResultValue(batch.getSegmentId(), batch.getColumns(), events.subList(0, left));
+ return new ScanResultValue(batch.getSegmentId(), batch.getColumns(), events.subList(0, numLeft));
+ }
+ } else {
+ // Perform single-event ScanResultValue batching at the outer level. Each scan result value from the yielder
+ // in this case will only have one event so there's no need to iterate through events.
+ int batchSize = query.getBatchSize();
+ List<Object> eventsToAdd = new ArrayList<>(batchSize);
+ List<String> columns = new ArrayList<>();
+ while (eventsToAdd.size() < batchSize && !yielder.isDone() && count < limit) {
+ ScanResultValue srv = yielder.get();
+ // Only replace once using the columns from the first event
+ columns = columns.isEmpty() ? srv.getColumns() : columns;
+ eventsToAdd.add(Iterables.getOnlyElement((List<Object>) srv.getEvents()));
+ yielder = yielder.next(null);
+ count++;
}
+ return new ScanResultValue(null, columns, eventsToAdd);
Review comment:
It might be interesting to encode the time interval covered by the events in
this batch as an ISO 8601 interval and use it as the "segmentId", but that also
feels a bit magical and perhaps unexpected. It does seem more useful than
`null`, though, and would allow seeking to the chunks for specific time ranges
within the full result set without reading every individual event. Thoughts,
anyone?
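
To make the idea concrete, here is a rough sketch of how such a label could be
derived from the timestamps of the events in one batch. Everything in it is
hypothetical: `BatchIntervalLabel` and `intervalLabel` are made-up names, not
Druid APIs, and it assumes each event's timestamp is already available as
epoch milliseconds and that the interval end should be exclusive (hence the
extra millisecond).

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Druid: builds an ISO 8601 "start/end"
// label covering every event timestamp in a batch.
final class BatchIntervalLabel {

    // Assumption: timestamps are epoch milliseconds, in any order.
    static String intervalLabel(List<Long> timestampsMillis) {
        if (timestampsMillis.isEmpty()) {
            // Nothing to describe; fall back to null as the PR does today.
            return null;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : timestampsMillis) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        // Assumption: the end is exclusive, so add one millisecond to make
        // the latest event fall inside the interval.
        return Instant.ofEpochMilli(min) + "/" + Instant.ofEpochMilli(max + 1);
    }

    public static void main(String[] args) {
        System.out.println(intervalLabel(Arrays.asList(0L, 59_999L, 30_000L)));
    }
}
```

With time-ordered results the first and last event of a batch would already
give the bounds, so the min/max scan is only there to keep the sketch
independent of ordering.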