gianm opened a new pull request, #13516: URL: https://github.com/apache/druid/pull/13516
Prior to this patch, IndexedTable indexes are sorted IntList. This works great when we have a single-column join key: we simply retrieve the list and we know what rows match. However, when we have a composite key, we need to merge the sorted lists. This is inefficient when one is very dense and others are very sparse. This patch switches from sorted IntList to IntSortedSet, and changes to the following intersection algorithm: 1) Initialize the intersection set to the smallest matching set from the various parts of the composite key. 2) For each element in that smallest set, check other sets for that element. If any do *not* include it, then remove the element from the intersection set. This way, complexity scales with the size of the smallest set, not the largest one. I tried this out locally on a 3-part key with cardinalities 200, 200, and 300000. The test query completed in 1 second instead of 45 seconds. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
