gianm opened a new pull request, #13516:
URL: https://github.com/apache/druid/pull/13516

   Prior to this patch, IndexedTable indexes are sorted IntList. This works 
great when we have a single-column join key: we simply retrieve the list and we 
know what rows match. However, when we have a composite key, we need to merge 
the sorted lists. This is inefficient when one is very dense and others are 
very sparse.
   
   This patch switches from sorted IntList to IntSortedSet, and changes to the 
following intersection algorithm:
   
   1) Initialize the intersection set to the smallest matching set from the 
various parts of the composite key.
   
   2) For each element in that smallest set, check other sets for that element. 
If any do *not* include it, then remove the element from the intersection set.
   
   This way, complexity scales with the size of the smallest set, not the 
largest one.
   
   I tried this out locally on a 3-part key with cardinalities 200, 200, and 
300000. The test query completed in 1 second instead of 45 seconds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to