rahil-c commented on PR #17904:
URL: https://github.com/apache/hudi/pull/17904#issuecomment-3769559405

   Was able to dig further into this memory leak issue, which helped expose a bug in the existing code as well.
   
   > ```
   > Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (64)
   > Allocator(hudi-arrow-SparkLanceReaderBase-data-/var/folders/lm/0j1q1s_n09b4wgqkdqbzpbkm0000gn/T/junit-9822325871446255147/dataset/test_lance_schema_evolution_copy_on_write/19f34eb0-e71d-4b91-b08f-b2fd95e266dc-0_0-16-19_20260116111900287.lance) 0/64/128/125829120 (res/actual/peak/limit)
   > 
   >    at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:504)
   >    at org.apache.spark.sql.execution.datasources.lance.SparkLanceReaderBase.read(SparkLanceReaderBase.scala:141)
   >    at org.apache.hudi.SparkFileFormatInternalRowReaderContext.getFileRecordIterator(SparkFileFormatInternalRowReaderContext.scala:105)
   >    at org.apache.hudi.common.engine.HoodieReaderContext.getFileRecordIterator(HoodieReaderContext.java:273)
   >    at org.apache.hudi.common.table.read.HoodieFileGroupReader.makeBaseFileIterator(HoodieFileGroupReader.java:157)
   >    at org.apache.hudi.common.table.read.HoodieFileGroupReader.initRecordIterators(HoodieFileGroupReader.java:129)
   >    at org.apache.hudi.common.table.read.HoodieFileGroupReader.getBufferedRecordIterator(HoodieFileGroupReader.java:292)
   >    at org.apache.hudi.common.table.read.HoodieFileGroupReader.getClosableHoodieRecordIterator(HoodieFileGroupReader.java:308)
   >    at org.apache.hudi.io.FileGroupReaderBasedMergeHandle.doMerge(FileGroupReaderBasedMergeHandle.java:269)
   >    at org.apache.hudi.io.IOUtils.runMerge(IOUtils.java:120)
   >    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:392)
   >    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:358)
   >    ... 35 more
   > ```
   > 
   > After I explicitly provided a `null` value for fields that did not exist in the row, the `testSchemaEvolutionAddColumn` test would pass. See `LanceFileFormatHelper#generateUnsafeProjection`.
   
   Currently, if an exception occurs (such as not padding missing fields with `null`), the actual exception thrown in `LanceFileFormatHelper#generateUnsafeProjection` would be the following:
   ```
   Caused by: scala.MatchError: None (of class scala.None$)
        at org.apache.spark.sql.execution.datasources.lance.LanceFileFormatHelper$.$anonfun$generateUnsafeProjection$2(LanceFileFormatHelper.scala:135)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
        at org.apache.spark.sql.execution.datasources.lance.LanceFileFormatHelper$.generateUnsafeProjection(LanceFileFormatHelper.scala:134)
        at org.apache.spark.sql.execution.datasources.lance.LanceBasicSchemaEvolution.generateUnsafeProjection(LanceBasicSchemaEvolution.scala:68)
        at org.apache.spark.sql.execution.datasources.lance.SparkLanceReaderBase.read(SparkLanceReaderBase.scala:131)
        ... 45 more
   ```
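
   For illustration, a minimal sketch of what the `null` padding looks like conceptually (hypothetical method and variable names, not the actual `LanceFileFormatHelper` code): fields missing from the file schema are projected as typed null literals instead of letting the `None` case fall through to a `MatchError`.
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression, Literal, UnsafeProjection}
   import org.apache.spark.sql.types.StructType

   // Hypothetical sketch: map every required field to either the corresponding
   // column in the file, or a typed null literal when the file does not have it
   // (e.g. a column added after the file was written).
   def projectWithNullPadding(fileSchema: StructType, requiredSchema: StructType): UnsafeProjection = {
     val exprs: Seq[Expression] = requiredSchema.fields.toSeq.map { field =>
       fileSchema.getFieldIndex(field.name) match {
         case Some(idx) =>
           BoundReference(idx, fileSchema(idx).dataType, fileSchema(idx).nullable)
         case None =>
           // field is absent from the data file: pad with an explicit null of the
           // expected type instead of hitting a MatchError on None
           Literal(null, field.dataType)
       }
     }
     UnsafeProjection.create(exprs)
   }
   ```
   The exact shape in the PR may differ; the point is simply that every required field has to map to some expression, so newly added columns read as nulls from older files.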
   
   However, this `MatchError` ends up getting suppressed due to the try/catch logic we have.
   
   1. The `generateUnsafeProjection` call in the `try` block would fail: https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R126
   
   2. It would then enter this catch block, which currently catches all exceptions with `Exception`: https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R141
   
   Now, as for why we still get a memory leak exception instead of eventually surfacing the original exception stacktrace here: https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R142
   
   This is because we prematurely call `allocator.close()` here: https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13L135
   
   even though the lifecycle of the allocator should technically be handled by `LanceRecordIterator#close`: https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/LanceRecordIterator.java#L173.
    
   Since the `LanceRecordIterator` still holds a reference to the `allocator` when we prematurely close it ourselves on this line https://github.com/apache/hudi/pull/17904/files#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R141, closing it ends up throwing that memory leak exception instead of surfacing the original failure.
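
   To make the masking concrete, here is a small self-contained illustration (not the actual reader code) of how an exception thrown from cleanup inside a catch block replaces the original failure, so only the leak error ever reaches the caller:
   ```scala
   object ExceptionMaskingDemo {
     def main(args: Array[String]): Unit = {
       try {
         try {
           // stands in for the real failure inside generateUnsafeProjection
           throw new MatchError(None)
         } catch {
           case _: Exception =>
             // stands in for allocator.close() detecting un-released buffers
             throw new IllegalStateException("Memory was leaked by query. Memory leaked: (64)")
         }
       } catch {
         // only the IllegalStateException surfaces; the MatchError is lost
         case surfaced: Throwable =>
           println(s"${surfaced.getClass.getName}: ${surfaced.getMessage}")
       }
     }
   }
   ```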
   
   I have a fix for this flow that I will try to put out, so that resources are closed correctly when exceptions occur.
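
   For reference, one possible shape of that cleanup (illustrative only, with hypothetical names; the actual fix may differ): only close the allocator here if ownership was never handed off to the iterator, and preserve the original exception via `addSuppressed` instead of letting the close failure replace it.
   ```scala
   import org.apache.arrow.memory.{BufferAllocator, RootAllocator}

   // Illustrative sketch (hypothetical names, not the actual SparkLanceReaderBase code):
   // construct the iterator from a fresh allocator; on success the iterator owns the
   // allocator and its close() releases it, on failure we close it ourselves but keep
   // the original exception as the one that propagates.
   def openWithAllocator[T](buildIterator: BufferAllocator => T): T = {
     val allocator: BufferAllocator = new RootAllocator(Long.MaxValue)
     try {
       // after this returns, the iterator is responsible for closing the allocator
       buildIterator(allocator)
     } catch {
       case original: Throwable =>
         try {
           // no iterator took ownership, so clean up here; if buffers did leak,
           // the close failure is recorded rather than masking the real error
           allocator.close()
         } catch {
           case closeFailure: Throwable => original.addSuppressed(closeFailure)
         }
         throw original // surface the real failure, e.g. the MatchError above
     }
   }
   ```
   With something along these lines, the `MatchError` (or whatever the real failure is) is what surfaces, and any allocator close failure shows up as a suppressed exception instead of replacing it.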

