Copilot commented on code in PR #1738:
URL: https://github.com/apache/auron/pull/1738#discussion_r2612868348
##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
// evaluate expression and write to output root
val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
val outputWriter = ArrowWriter.create(outputRoot)
-
+ val lastRowIdx = if (currentRowIdx > 0) {
+ currentRowIdx - 1
+ } else {
+ currentRowIdx
+ }
Review Comment:
Consider adding a comment to explain that `lastRowIdx` represents the index
of the last input row processed in `evalLoop`, and why it's computed as
`currentRowIdx - 1`. This would make the code more maintainable and clarify the
edge case behavior when `currentRowIdx == 0`.
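For illustration, the explanatory comment the review asks for could read roughly as in the following standalone sketch (the object wrapper and method name are hypothetical; in the real diff this logic is inline in `SparkUDTFWrapperContext`):

```scala
// Hypothetical standalone sketch of the patched logic with the suggested comment.
object LastRowIdxSketch {
  // `currentRowIdx` is the index of the next input row evalLoop would consume,
  // so the last input row actually processed is `currentRowIdx - 1`. When
  // `currentRowIdx == 0` no input row has been consumed yet, and the patch
  // currently falls back to 0 in that edge case.
  def lastRowIdx(currentRowIdx: Int): Int =
    if (currentRowIdx > 0) currentRowIdx - 1 else currentRowIdx
}
```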
##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
// evaluate expression and write to output root
val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
val outputWriter = ArrowWriter.create(outputRoot)
-
+ val lastRowIdx = if (currentRowIdx > 0) {
+ currentRowIdx - 1
+ } else {
+ currentRowIdx
Review Comment:
When `currentRowIdx` is 0, no input rows have been processed in
`evalLoop`, yet this code assigns `lastRowIdx = 0`, implying that row 0 was
processed. This could produce semantically incorrect row IDs in the output.
Consider using -1 or another sentinel value to indicate that no input rows were
associated with these terminate output rows, or document why 0 is the correct
value in this edge case.
```suggestion
-1 // Use -1 as a sentinel to indicate no input rows were processed
```
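A minimal sketch of the sentinel approach suggested above (the object and method names are hypothetical; the real change would be inline in `SparkUDTFWrapperContext`):

```scala
// Hypothetical sketch: use -1 as a sentinel so terminate-phase output rows
// are not misattributed to input row 0 when no input rows were processed.
object LastRowIdxSentinelSketch {
  def lastRowIdx(currentRowIdx: Int): Int =
    if (currentRowIdx > 0) currentRowIdx - 1
    else -1 // sentinel: no input row is associated with this output row
}
```

Any downstream consumer of the row-ID column would then need to treat -1 as "no associated input row".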
##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
// evaluate expression and write to output root
val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
val outputWriter = ArrowWriter.create(outputRoot)
-
+ val lastRowIdx = if (currentRowIdx > 0) {
+ currentRowIdx - 1
+ } else {
+ currentRowIdx
+ }
while (allocator.getAllocatedMemory < maxBatchMemorySize
&& outputWriter.currentCount < batchSize
&& terminateIter.hasNext) {
val outputRow = terminateIter.next()
- reusedOutputRow.setInt(0, currentRowIdx)
+ reusedOutputRow.setInt(0, lastRowIdx)
Review Comment:
This fix addresses an index out of bounds error, but there's no test
coverage for the UDTF functionality. Consider adding tests that verify:
1) the correct row IDs are assigned in `terminateLoop`,
2) the edge case when `currentRowIdx` is 0, and
3) the original bug scenario with `GenericUDTFCount2` from the PR description,
to prevent regression.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]