soumyakanti3578 commented on PR #6027: URL: https://github.com/apache/hive/pull/6027#issuecomment-3190855413
@kasakrisz Sorry I should have clarified it in more detail. The bug was fixed by this: ``` if (longColVector.isRepeating) { if (longColVector.noNulls || !longColVector.isNull[0]) { lastValue = longColVector.vector[0]; isGroupResultNull = false; } else { -- isGroupResultNull = true; ++ isGroupResultNull = doesRespectNulls() || lastValue == null; } } ``` When `longColVector.isRepeating = true`, it means that the column vector contains the same value for all the rows in the batch and thus we just store the value in the `0th` index. When `if (longColVector.noNulls || !longColVector.isNull[0])` condition is `true`, we can simply pick up the value from the `0th` index because all rows in the batch has the same repeating non null value, so `lastValue` would be the same value too, and hence the group result is not null as indicated by `isGroupResultNull = false`. But when the condition is `false`, i.e., when values in the batch are repeating for the column (`longColVector.isRepeating = true`) AND the value in the first element is null (`!(longColVector.noNulls || !longColVector.isNull[0])`) we were simply setting `isGroupResultNull = true;` as indeed then the value in the column vector is null. This was fine when we were not supporting `IGNORE NULLS`. Now, we must check if we respect nulls with `doesRespectNulls()`. If we do, then the group result is indeed null and we can set it to `true`. Else if we are ignoring nulls, then we must check if lastValue was set by the previous batch or not. If it was set, then we must set `isGroupResultNull = false;` In short, when we are in the `else` block, the value in the column vector is indeed `null`. But we must check the `lastValue` because it could have been set by the previous batch. In the int test file, we have these three rows for id = 2: `(2, 20), (2, null), (2, null)`. Since we have `id` and `int_col` in the ORDER BY of `LAST_VALUE(int_col IGNORE NULLS) OVER(PARTITION BY id ORDER BY id, int_col) AS last_int` this would create two batches: ``` batch 1: [2, 20] batch 2: [[2, null], [2, null]] ``` as we are dealing with `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, i.e., both `(2, null)`s will be together. Also note that for batch 2, `longColVector.isRepeating = true`. While computing batch 2, we will go to the else block and we would have set the result to null before. Hope this clarifies the bug! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org