gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265085660
##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
}
List<Object[]> container = rightBlock.getContainer();
// put all the rows into corresponding hash collections keyed by the key selector function.
+ int initialHeuristicSize = 16;
for (Object[] row : container) {
- List<Object[]> hashCollection =
- _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+ ArrayList<Object[]> hashCollection =
Review Comment:
I wouldn't worry much about adding a conditional here, given the complexity
of
[HashMap.computeIfAbsent](https://github.com/openjdk/jdk/blob/acf591e856ce4b43303b1578bd64a8c9ab0063ea/src/java.base/share/classes/java/util/HashMap.java#L1195).
> Also do you know if the JDK can do loop unrolling here?
I don't know, but I would guess it doesn't. What we do here is too complex:
we create a new instance that copies some data from an array (in another
loop), then we look up that new object in the map and, if the value is not
there, we call a lambda to create the value for that key. After that we just
add the element to the list.
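
To make the steps above concrete, here is a minimal, self-contained sketch of what each iteration does. It is not the actual Pinot code: `BuildLoopSketch`, `CopiedKey` and `build` are hypothetical names, and `CopiedKey` only approximates a key that copies the selected columns out of the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the build loop discussed above; not the Pinot implementation.
class BuildLoopSketch {
  // Simplified key that copies the selected columns out of the row (assumption).
  static final class CopiedKey {
    private final Object[] _values;

    CopiedKey(Object[] row, int[] columnIndices) {
      _values = new Object[columnIndices.length];
      // The "other loop": one copy per key column, for every row.
      for (int i = 0; i < columnIndices.length; i++) {
        _values[i] = row[columnIndices[i]];
      }
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof CopiedKey && Arrays.equals(_values, ((CopiedKey) o)._values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(_values);
    }
  }

  static Map<CopiedKey, List<Object[]>> build(List<Object[]> rows, int[] columnIndices) {
    Map<CopiedKey, List<Object[]>> table = new HashMap<>();
    for (Object[] row : rows) {
      // 1. Allocate a key and copy the key columns into it.
      CopiedKey key = new CopiedKey(row, columnIndices);
      // 2. Look the key up; 3. run the lambda only if no list exists yet.
      List<Object[]> bucket = table.computeIfAbsent(key, k -> new ArrayList<>());
      // 4. Append the row to the list for that key.
      bucket.add(row);
    }
    return table;
  }
}
```

Each iteration involves an allocation, an inner loop, a hash lookup and a possible lambda call, which is why an extra conditional is unlikely to be the dominant cost.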
We can try to apply some extra optimizations here. For example, we can use a
lightweight version of Key that does not copy the array of keys but instead
holds a reference to the row and to the same `_columnIndices` we use right now,
and uses them to calculate hashCode and equals. That way we wouldn't need to
create heavier instances for each row. The main problems with this approach are
that hashCode and equals will be a bit slower and that we would need to keep a
reference to the original row. But the latter can be further optimized.
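
A rough sketch of that lightweight key, under stated assumptions: `RowRefKeySketch`, `RowRefKey` and `build` are hypothetical names, and the hashing scheme is simply the usual 31-based combination, not something taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a key that references the row instead of copying it.
class RowRefKeySketch {
  static final class RowRefKey {
    private final Object[] _row;          // reference to the original row, no copy
    private final int[] _columnIndices;   // shared across all keys of the same side

    RowRefKey(Object[] row, int[] columnIndices) {
      _row = row;
      _columnIndices = columnIndices;
    }

    @Override
    public int hashCode() {
      // Slower than hashing a pre-copied array: one indirection per key column.
      int h = 1;
      for (int idx : _columnIndices) {
        h = 31 * h + Objects.hashCode(_row[idx]);
      }
      return h;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof RowRefKey)) {
        return false;
      }
      RowRefKey that = (RowRefKey) o;
      if (_columnIndices.length != that._columnIndices.length) {
        return false;
      }
      // Each key reads through its own indices, so keys from different sides can match.
      for (int i = 0; i < _columnIndices.length; i++) {
        if (!Objects.equals(_row[_columnIndices[i]], that._row[that._columnIndices[i]])) {
          return false;
        }
      }
      return true;
    }
  }

  static Map<RowRefKey, List<Object[]>> build(List<Object[]> rows, int[] columnIndices) {
    Map<RowRefKey, List<Object[]>> table = new HashMap<>();
    for (Object[] row : rows) {
      table.computeIfAbsent(new RowRefKey(row, columnIndices), k -> new ArrayList<>()).add(row);
    }
    return table;
  }
}
```

This only works if rows are not mutated after they are inserted, since the key's hash is derived from the live row. Every stored key also keeps its row reachable, which is the extra reference mentioned above.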
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]