siddharthteotia commented on a change in pull request #4790: Support ORDER BY 
for DISTINCT queries
URL: https://github.com/apache/incubator-pinot/pull/4790#discussion_r342722262
 
 

 ##########
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/data/table/SimpleIndexedTable.java
 ##########
 @@ -139,6 +144,14 @@ public int size() {
 
   @Override
   public Iterator<Record> iterator() {
+    if (_iterator == null) {
 
 Review comment:
   The existing implementation seems to be written assuming that clients won't 
have to iterate on the table until they call finish() and that is why the 
iterator() simply returned the iterator (NULL if client didn't call finish 
prior to it). So basically the workflow imposed on the client is:
   
   -- upsert() as many as you want
   -- finish()
   -- iterator()
   -- done 
   
   and finish() should only be called once as per the Javadoc of this API.
   
   Now you may still want to simply iterate (over random records) without 
calling finish() -- finish() will have a cost of resizing for ORDER BY queries. 
We have this need during merge of DistinctAggregationFunction() where we 
iterate over 1 ... N-1 indexed tables and merge them into 0th indexed table.  
Only after the merge is done, we call finish() on the 0th indexed table since 
that has all the records. That way we get the final iterator() and trimmed set.
   
   Now in iterator() function, we can't just simply do 
_lookupMap.values().iterator() since if the user called finish() prior to it 
and there was order by, then finish() would have created an iterator over 
sorted records. So now doing _lookupMap.values().iterator() will give incorrect 
order to the caller. 
   
   I agree with the fact that user of a data structure should be able to create 
an iterator as many times as they want. However, for IndexedTable the creation 
of iterator is tied to finish().  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to