siddharthteotia commented on a change in pull request #4790: Support ORDER BY
for DISTINCT queries
URL: https://github.com/apache/incubator-pinot/pull/4790#discussion_r342722262
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/data/table/SimpleIndexedTable.java
##########
@@ -139,6 +144,14 @@ public int size() {
@Override
public Iterator<Record> iterator() {
+ if (_iterator == null) {
Review comment:
The existing implementation seems to be written assuming that clients won't
have to iterate on the table until they call finish() and that is why the
iterator() simply returned the iterator (NULL if client didn't call finish
prior to it). So basically the workflow imposed on the client is:
-- upsert() as many as you want
-- finish()
-- iterator()
-- done
and finish() should only be called once as per the Javadoc of this API.
Now you may still want to simply iterate (over random records) without
calling finish() -- finish() will have a cost of resizing for ORDER BY queries.
We have this need during merge of DistinctAggregationFunction() where we
iterate over 1 ... N-1 indexed tables and merge them into 0th indexed table.
Only after the merge is done, we call finish() on the 0th indexed table since
that has all the records. That way we get the final iterator() and trimmed set.
Now in iterator() function, we can't just simply do
_lookupMap.values().iterator() since if the user called finish() prior to it
and there was order by, then finish() would have created an iterator over
sorted records. So now doing _lookupMap.values().iterator() will give incorrect
order to the caller.
I agree with the fact that user of a data structure should be able to create
an iterator as many times as they want. However, for IndexedTable the creation
of iterator is tied to finish().
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]