[PR] Core: Track & close FileIO used for remote scan planning [iceberg]

via GitHub Wed, 25 Feb 2026 02:18:43 -0800


nastra opened a new pull request, #15439:
URL: https://github.com/apache/iceberg/pull/15439


   This uses an approach similar to the one in `RESTSessionCatalog` with the 
`FileIOTracker`: the `RESTTableScan` is wrapped in a `WeakReference`, and the 
attached `FileIO` instance is closed when the `RESTTableScan` object is 
garbage-collected. 
   I have verified that this works in combination with 
https://github.com/apache/iceberg/pull/15368, where the `fileIOForPlanId` 
wasn't closed properly without this fix, as can be seen below:
   ```
   6/02/25 09:25:19 WARN ResolvingFileIO: Unclosed ResolvingFileIO instance created by:
        org.apache.iceberg.io.ResolvingFileIO.<init>(ResolvingFileIO.java:85)
        java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:51)
        org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:64)
        org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:401)
        org.apache.iceberg.rest.RESTTableScan.fileIOForPlanId(RESTTableScan.java:202)
        org.apache.iceberg.rest.RESTTableScan.planTableScan(RESTTableScan.java:180)
        org.apache.iceberg.rest.RESTTableScan.planFiles(RESTTableScan.java:163)
        org.apache.iceberg.BatchScanAdapter.planFiles(BatchScanAdapter.java:125)
        org.apache.iceberg.spark.source.SparkPartitioningAwareScan.tasks(SparkPartitioningAwareScan.java:185)
        org.apache.iceberg.spark.source.SparkPartitioningAwareScan.taskGroups(SparkPartitioningAwareScan.java:213)
        org.apache.iceberg.spark.source.SparkPartitioningAwareScan.outputPartitioning(SparkPartitioningAwareScan.java:115)
        org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:45)
        org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:43)
        org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:491)
   ```
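
   For readers unfamiliar with the pattern, here is a minimal sketch of weak-reference-based resource tracking using only `java.lang.ref`. It is not the code in this PR and not Iceberg's `FileIOTracker`; `ResourceTracker`, `track`, and `closeUnreachable` are hypothetical names chosen for illustration, and the sketch assumes the resource does not hold a strong reference back to its owner.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: closes a resource once its owner has been
// garbage-collected, or when the tracker itself is closed.
class ResourceTracker implements AutoCloseable {

  // Weak reference to the owner; keeps the resource strongly reachable so it
  // can still be closed after the owner is gone.
  private static final class Tracked extends WeakReference<Object> {
    private final AutoCloseable resource;

    Tracked(Object owner, AutoCloseable resource, ReferenceQueue<Object> queue) {
      super(owner, queue);
      this.resource = resource;
    }
  }

  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
  private final Set<Tracked> tracked = ConcurrentHashMap.newKeySet();

  // Registers a resource to be closed once the owner becomes unreachable.
  // The resource must not strongly reference the owner, or the owner would
  // never be collected.
  void track(Object owner, AutoCloseable resource) {
    closeUnreachable();
    tracked.add(new Tracked(owner, resource, queue));
  }

  // Closes the resources of owners the GC has already enqueued and returns
  // how many were closed. When this finds anything depends on GC timing.
  int closeUnreachable() {
    int closed = 0;
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      if (tracked.remove(ref)) {
        closeQuietly(((Tracked) ref).resource);
        closed++;
      }
    }
    return closed;
  }

  // Deterministic fallback: closes everything that is still tracked.
  @Override
  public void close() {
    for (Tracked ref : tracked) {
      if (tracked.remove(ref)) {
        closeQuietly(ref.resource);
      }
    }
  }

  private static void closeQuietly(AutoCloseable resource) {
    try {
      resource.close();
    } catch (Exception e) {
      // A real implementation would log the failure instead of ignoring it.
    }
  }
}
```

   Because garbage collection is not deterministic, a sketch like this still needs the explicit `close()` path so that resources are released at shutdown even if their owners were never collected.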
   
   I'm currently checking how to properly test this in 
`TestRESTScanPlanning`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


