viirya opened a new issue, #1059: URL: https://github.com/apache/datafusion-comet/issues/1059
### Describe the bug

This was found while debugging the CI failures of #1050. One example of a failed test is `date_add with int scalars` in `CometExpressionSuite`. The query is `SELECT _20 + CAST(2 as $intType) from tbl`, whose plan is simply CometScan + CometProject + Spark `ColumnarToRowExec`. CometProject (i.e., DataFusion's `ProjectionExec`) does not store arrays internally, so the only way the safety check can fail is if the arrays are not released before we fill the next values into the CometBuffers.

In Spark's `ColumnarToRowExec`, once all rows have been pulled out of the current `ColumnarBatch`, the batch reference is simply set to null to release the JVM object, but `close` is never called on the batch to release the vector resources (for Comet, the Arrow arrays). Fixing this is more complicated than just adding a `close` call there, because Spark uses `WritableColumnVector` in some code paths (e.g., the Parquet reader), and once `close` is called on a `WritableColumnVector` it is no longer writable. To completely fix this, we need changes in Spark.

I did a quick experiment locally in Spark and verified that if `close` is properly called on the non-`WritableColumnVector` columns there, the failed tests pass without tripping the safety check.

### Steps to reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
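As a rough illustration of the local experiment mentioned under "Describe the bug", here is a minimal sketch. The helper object and its name are hypothetical, not the actual patch; the real change would live inside Spark's `ColumnarToRowExec`, which today just nulls out the batch reference.

```scala
import org.apache.spark.sql.execution.vectorized.WritableColumnVector
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical helper for illustration only.
object BatchRelease {
  // Release vector resources (for Comet, the underlying Arrow arrays) once a
  // batch has been fully consumed, but leave WritableColumnVector instances
  // alone: Spark reuses those (e.g. in the Parquet reader), and closing them
  // would make them non-writable for the next batch.
  def closeNonWritableVectors(batch: ColumnarBatch): Unit = {
    var i = 0
    while (i < batch.numCols()) {
      batch.column(i) match {
        case _: WritableColumnVector => // keep writable vectors open for reuse
        case other => other.close()     // e.g. Comet's Arrow-backed vectors
      }
      i += 1
    }
  }
}
```

Closing only the non-writable columns releases Comet's Arrow arrays before the CometBuffers are refilled, while leaving Spark's reusable writable vectors intact.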