Copilot commented on code in PR #1738:
URL: https://github.com/apache/auron/pull/1738#discussion_r2612868348


##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
         // evaluate expression and write to output root
         val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
         val outputWriter = ArrowWriter.create(outputRoot)
-
+        val lastRowIdx = if (currentRowIdx > 0) {
+          currentRowIdx - 1
+        } else {
+          currentRowIdx
+        }

Review Comment:
   Consider adding a comment to explain that `lastRowIdx` represents the index 
of the last input row processed in `evalLoop`, and why it's computed as 
`currentRowIdx - 1`. This would make the code more maintainable and clarify the 
edge case behavior when `currentRowIdx == 0`.
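
   A hedged sketch of how such a comment might read, with the computation pulled into a standalone helper purely for illustration (the PR inlines it in `SparkUDTFWrapperContext`; the function form here is an assumption):

   ```scala
   // Illustrative helper, not code from the PR.
   // `currentRowIdx` points one past the last input row processed in `evalLoop`,
   // so the index of the last processed row is `currentRowIdx - 1`. When
   // `currentRowIdx == 0`, no input rows have been processed yet and the
   // current code falls back to 0.
   def lastRowIdx(currentRowIdx: Int): Int =
     if (currentRowIdx > 0) currentRowIdx - 1 else currentRowIdx
   ```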



##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
         // evaluate expression and write to output root
         val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
         val outputWriter = ArrowWriter.create(outputRoot)
-
+        val lastRowIdx = if (currentRowIdx > 0) {
+          currentRowIdx - 1
+        } else {
+          currentRowIdx

Review Comment:
   When `currentRowIdx` is 0, it indicates no input rows have been processed in 
`evalLoop`, yet this code assigns `lastRowIdx = 0`, suggesting row 0 was 
processed. This could lead to semantically incorrect row IDs in the output. 
Consider using -1 or another sentinel value to indicate that no input rows were 
associated with these terminate output rows, or document why 0 is the correct 
value in this edge case.
   ```suggestion
             -1 // Use -1 as a sentinel to indicate no input rows were processed
   ```
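
   For comparison, a minimal sketch of the sentinel variant proposed above (illustrative only; whether `-1` is safe depends on how downstream consumers interpret the row-ID column):

   ```scala
   // Illustrative sentinel variant, not code from the PR: -1 signals that no
   // input rows were processed, so terminate output rows are not attributed
   // to any real input row index.
   def lastRowIdxWithSentinel(currentRowIdx: Int): Int =
     if (currentRowIdx > 0) currentRowIdx - 1 else -1
   ```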



##########
spark-extension/src/main/scala/org/apache/spark/sql/auron/SparkUDTFWrapperContext.scala:
##########
@@ -149,13 +149,17 @@ case class SparkUDTFWrapperContext(serialized: ByteBuffer) extends Logging {
         // evaluate expression and write to output root
         val reusedOutputRow = new GenericInternalRow(Array[Any](null, null))
         val outputWriter = ArrowWriter.create(outputRoot)
-
+        val lastRowIdx = if (currentRowIdx > 0) {
+          currentRowIdx - 1
+        } else {
+          currentRowIdx
+        }
         while (allocator.getAllocatedMemory < maxBatchMemorySize
           && outputWriter.currentCount < batchSize
           && terminateIter.hasNext) {
 
           val outputRow = terminateIter.next()
-          reusedOutputRow.setInt(0, currentRowIdx)
+          reusedOutputRow.setInt(0, lastRowIdx)

Review Comment:
   This fix addresses an index out of bounds error, but there's no test 
coverage for the UDTF functionality. Consider adding tests that verify: 1) the 
correct row IDs are assigned in `terminateLoop`, 2) the edge case when 
`currentRowIdx` is 0, and 3) the original bug scenario with `GenericUDTFCount2` 
from the PR description to prevent regression.
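
   A minimal sketch of the kind of check such a test could make. `terminateRowIds` is an illustrative stand-in for the real terminate loop, which writes rows through `ArrowWriter` rather than returning a sequence:

   ```scala
   // Stand-in for the terminate loop's row-ID assignment (names taken from
   // the diff; this is not the actual implementation).
   def terminateRowIds(currentRowIdx: Int, terminateOutputs: Seq[Any]): Seq[Int] = {
     val lastRowIdx = if (currentRowIdx > 0) currentRowIdx - 1 else currentRowIdx
     // every terminate output row gets the index of the last processed input row
     terminateOutputs.map(_ => lastRowIdx)
   }

   // 1) row IDs point at the last processed input row, not one past it
   assert(terminateRowIds(3, Seq("a", "b")) == Seq(2, 2))
   // 2) edge case: no input rows processed
   assert(terminateRowIds(0, Seq("a")) == Seq(0))
   ```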



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
