rahil-c commented on PR #17904:
URL: https://github.com/apache/hudi/pull/17904#issuecomment-3769559405
Was able to dig further into this memory leak issue, which helped expose some bugs in the existing code as well.
> ```
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (64)
> Allocator(hudi-arrow-SparkLanceReaderBase-data-/var/folders/lm/0j1q1s_n09b4wgqkdqbzpbkm0000gn/T/junit-9822325871446255147/dataset/test_lance_schema_evolution_copy_on_write/19f34eb0-e71d-4b91-b08f-b2fd95e266dc-0_0-16-19_20260116111900287.lance) 0/64/128/125829120 (res/actual/peak/limit)
> 	at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:504)
> 	at org.apache.spark.sql.execution.datasources.lance.SparkLanceReaderBase.read(SparkLanceReaderBase.scala:141)
> 	at org.apache.hudi.SparkFileFormatInternalRowReaderContext.getFileRecordIterator(SparkFileFormatInternalRowReaderContext.scala:105)
> 	at org.apache.hudi.common.engine.HoodieReaderContext.getFileRecordIterator(HoodieReaderContext.java:273)
> 	at org.apache.hudi.common.table.read.HoodieFileGroupReader.makeBaseFileIterator(HoodieFileGroupReader.java:157)
> 	at org.apache.hudi.common.table.read.HoodieFileGroupReader.initRecordIterators(HoodieFileGroupReader.java:129)
> 	at org.apache.hudi.common.table.read.HoodieFileGroupReader.getBufferedRecordIterator(HoodieFileGroupReader.java:292)
> 	at org.apache.hudi.common.table.read.HoodieFileGroupReader.getClosableHoodieRecordIterator(HoodieFileGroupReader.java:308)
> 	at org.apache.hudi.io.FileGroupReaderBasedMergeHandle.doMerge(FileGroupReaderBasedMergeHandle.java:269)
> 	at org.apache.hudi.io.IOUtils.runMerge(IOUtils.java:120)
> 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:392)
> 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:358)
> 	... 35 more
> ```
>
> After I explicitly provided a `null` value for fields that did not exist in the row, the `testSchemaEvolutionAddColumn` test would pass. See `LanceFileFormatHelper#generateUnsafeProjection`.
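For context, a minimal sketch of what that null padding can look like (the names below are illustrative, not the actual `LanceFileFormatHelper` code): fields present in the file schema get bound to their ordinal, and fields missing from the row get an explicit null literal, so the match never falls through on `None`.

```scala
import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression, Literal}
import org.apache.spark.sql.types.StructType

// Sketch only: pad fields that are absent from the data file with null
// literals instead of letting the lookup fail with scala.MatchError: None.
def bindOrPadWithNull(requiredSchema: StructType, fileSchema: StructType): Seq[Expression] =
  requiredSchema.fields.toSeq.map { field =>
    fileSchema.getFieldIndex(field.name) match {
      case Some(ordinal) => BoundReference(ordinal, field.dataType, field.nullable)
      case None          => Literal(null, field.dataType) // field not in the row: pad explicitly
    }
  }
```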
Currently, if an exception occurs (such as when missing fields are not padded with null), the true underlying exception thrown from `LanceFileFormatHelper#generateUnsafeProjection` would be the following:
```
Caused by: scala.MatchError: None (of class scala.None$)
	at org.apache.spark.sql.execution.datasources.lance.LanceFileFormatHelper$.$anonfun$generateUnsafeProjection$2(LanceFileFormatHelper.scala:135)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
	at org.apache.spark.sql.execution.datasources.lance.LanceFileFormatHelper$.generateUnsafeProjection(LanceFileFormatHelper.scala:134)
	at org.apache.spark.sql.execution.datasources.lance.LanceBasicSchemaEvolution.generateUnsafeProjection(LanceBasicSchemaEvolution.scala:68)
	at org.apache.spark.sql.execution.datasources.lance.SparkLanceReaderBase.read(SparkLanceReaderBase.scala:131)
	... 45 more
```
However, this exception ends up getting suppressed by the try/catch logic we have (the sketch after these steps reproduces the pattern):
1. The `generateUnsafeProjection` call in the `try` fails:
https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R126
2. Control then enters this catch block, which currently catches all exceptions with `Exception`:
https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R141
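To make the suppression concrete, here is a self-contained reproduction of the pattern (illustrative stand-ins, not the actual `SparkLanceReaderBase` logic): the cleanup call inside the catch throws its own exception before the original one can be rethrown, so the `MatchError` never surfaces.

```scala
object SuppressionDemo {
  // Stand-in for allocator.close() reporting outstanding buffers.
  private def cleanupThatThrows(): Unit =
    throw new IllegalStateException("Memory was leaked by query.")

  def main(args: Array[String]): Unit =
    try {
      // Stand-in for generateUnsafeProjection hitting an unhandled None.
      throw new MatchError(None)
    } catch {
      case e: Exception =>
        cleanupThatThrows()           // throws first, masking the MatchError
        throw new RuntimeException(e) // never reached
    }
}
```

Running this surfaces the `IllegalStateException` from the cleanup, not the `MatchError` that actually caused the failure.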
Now, as to why we still get a memory leak exception instead of eventually surfacing the original exception's stack trace here
https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R142
this is because we prematurely call `allocator.close()` ourselves
https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13L135
even though the life cycle of the allocator should technically be handled by `LanceRecordIterator#close`
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/LanceRecordIterator.java#L173.
Since the `LanceRecordIterator` still holds a reference to the `allocator` when we close it prematurely at this line
https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R141
the close ends up throwing that memory leak exception.
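This matches Arrow's allocator semantics: `BaseAllocator.close` throws if any memory allocated from it is still outstanding. A standalone illustration of just that behavior (not Hudi code):

```scala
import org.apache.arrow.memory.RootAllocator

object AllocatorLeakDemo {
  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator(Long.MaxValue)
    val buf = allocator.buffer(64) // stand-in for buffers the iterator still references
    allocator.close()              // throws IllegalStateException: Memory was leaked by query. Memory leaked: (64)
    buf.close()                    // never reached; correct order is buffer first, then allocator
  }
}
```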
I have a fix for this flow that I will try to put out, which closes resources correctly when exceptions occur.
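One possible shape for that fix, sketched under the assumption that the error path should release buffers child-first and keep the original exception as the one that propagates (illustrative code, not the actual patch):

```scala
import org.apache.arrow.memory.RootAllocator

object SafeCleanupSketch {
  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator(Long.MaxValue)
    val buf = allocator.buffer(64)
    try {
      throw new MatchError(None) // stand-in for the projection failure
    } catch {
      case original: Exception =>
        try {
          buf.close()       // release outstanding buffers first...
          allocator.close() // ...then the allocator closes cleanly, no leak reported
        } catch {
          case closeFailure: Exception => original.addSuppressed(closeFailure)
        }
        throw original // the MatchError surfaces, not a masked memory leak error
    }
  }
}
```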