agavra opened a new issue, #9998: URL: https://github.com/apache/pinot/issues/9998
Background: imagine we have the two following table schemas: ``` (A) colA INT, colB STRING (B) col1 STRING, col2 INT ``` and I want to issue `SELECT * FROM A JOIN B ON A.colA = B.col2 AND A.colB = B.col1`. In this case, we will need to hash shuffle the rows to match partitions. In Calcite, we generate a hash distribution with the same join keys: ``` leftExchange = LogicalExchange.create(leftInput, RelDistributions.hash(joinInfo.leftKeys)); rightExchange = LogicalExchange.create(rightInput, RelDistributions.hash(joinInfo.rightKeys)); ``` When we do this, Calcite will order the join keys passed into `RelDistributions.hash` to be in ascending order, even if we passed them in with a particular ordering. In the example above, we would have called: ``` RelDistributions.hash(1, 2); // for A, because join keys are in order RelDistributsions.hash(2, 1); // for B, because we want col2 to be the first join key ``` But calcite reorders them to both be `[1, 2]`. This causes a problem when we hash our keys. Imagine we had the following rows: ``` (a) colA: 1, colB: "foo" (b) col1: "foo", col2: 1 ``` If we simply use the column ordering that calcite provides for the hash exchange, we may not hash these two rows into the same partition. #9996 provides a workaround for the solution by using a hash code algorithm that intentionally generates collisions independent of the ordering of the columns (using hash code addition). This is likely an acceptable solution in the long run so long as the distribution of the hash codes is semi-random even after addition (some initial experimentations show that it is). A better solution, however, would be to fix calcite to maintain the join key ordering so that we can just hash the exact keys. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
