dramaticlly commented on issue #7480:
URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1544936003

   I don't think there's a guarantee that the API stays consistent between 
the Iceberg SparkAction and the Spark procedure. The procedure is exposed for 
clients who are more familiar with the Spark SQL interface, while SparkAction provides 
more versatile capabilities for native integration in Java or Scala. 
   
   If you want multithreaded deletes with the Spark 3.1 actions, you can pass 
your own executor to the action, as shown below in Scala (the same calls work from Java):
   
   ```scala
   import org.apache.iceberg.Table
   import org.apache.iceberg.actions.DeleteOrphanFiles
   import org.apache.iceberg.spark.actions.SparkActions
   import org.apache.spark.sql.SparkSession
   
   import java.util.concurrent.Executors
   
   class RemoveOrphansAPI {
   
     def removeOrphansWithSparkAction(
         sparkSession: SparkSession,
         table: Table,
         threadsCount: Int,
         olderThanTS: Long // cutoff in epoch milliseconds; only orphan files older than this are deleted
     ): DeleteOrphanFiles.Result = {
   
       // Pool the action uses to delete orphan files in parallel.
       val executor = Executors.newFixedThreadPool(threadsCount)
       try {
         SparkActions
           .get(sparkSession)
           .deleteOrphanFiles(table)
           .olderThan(olderThanTS)
           .executeDeleteWith(executor)
           .execute()
       } finally {
         // Release the pool's threads even if the action fails.
         executor.shutdown()
       }
     }
   }
   ```
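   Because the snippet passes its own pool to `executeDeleteWith` and shuts it down itself, the caller appears to own that pool's lifecycle. If several maintenance jobs follow the same pattern, a small loan-pattern helper can centralize the try/finally. This is a minimal sketch using only the JDK; `ThreadPools` and `withFixedThreadPool` are hypothetical names, not part of Iceberg or Spark.
   
   ```scala
   import java.util.concurrent.{ExecutorService, Executors}
   
   // Hypothetical helper (not part of Iceberg or Spark): lends a fixed-size
   // thread pool to `body` and always shuts the pool down afterwards, even if
   // `body` throws.
   object ThreadPools {
     def withFixedThreadPool[T](threads: Int)(body: ExecutorService => T): T = {
       require(threads > 0, s"threads must be positive, got $threads")
       val pool = Executors.newFixedThreadPool(threads)
       try body(pool)
       finally pool.shutdown() // stop accepting new tasks; already-queued tasks still run
     }
   }
   ```
   
   With Spark and Iceberg on the classpath, the action call could then be written as `ThreadPools.withFixedThreadPool(threadsCount) { executor => SparkActions.get(sparkSession).deleteOrphanFiles(table).olderThan(olderThanTS).executeDeleteWith(executor).execute() }`.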


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
