dramaticlly commented on issue #7480:
URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1544936003
I don't think there is any guarantee that the API stays consistent between
the Iceberg SparkAction and the Spark procedure. The procedure is exposed for
clients who are more familiar with the SparkSQL interface, while the SparkAction
provides more versatile capabilities for native integration in Java or Scala.
If you want to run multithreaded deletes with the Spark 3.1 actions, this is how
it can be done in Scala/Java:
```scala
import org.apache.iceberg.Table
import org.apache.iceberg.actions.DeleteOrphanFiles
import org.apache.iceberg.spark.actions.SparkActions
import org.apache.spark.sql.SparkSession
import java.util.concurrent.Executors

class RemoveOrphansAPI {
  def removeOrphansWithSparkAction(
      sparkSession: SparkSession,
      table: Table,
      threadsCount: Int,
      olderThanTS: Long
  ): DeleteOrphanFiles.Result = {
    // Fixed-size pool that the action uses to delete files in parallel.
    val executor = Executors.newFixedThreadPool(threadsCount)
    try {
      SparkActions
        .get(sparkSession)
        .deleteOrphanFiles(table)
        .olderThan(olderThanTS) // only consider files older than this timestamp (ms)
        .executeDeleteWith(executor)
        .execute()
    } finally {
      // Release the pool's threads even if the action throws.
      executor.shutdown()
    }
  }
}
```
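The executor passed to `executeDeleteWith` is created and shut down by the caller. As a rough illustration of that lifecycle only (this is not Iceberg's implementation), the sketch below uses plain `java.util.concurrent` with no Spark or Iceberg dependency. The names `ExecutorLifecycleSketch` and `runOnPool` are hypothetical, and the per-path "delete" is a stand-in that only increments a counter.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object ExecutorLifecycleSketch {
  // Hypothetical helper: submits one stand-in "delete" task per path to a
  // fixed-size pool and returns how many tasks ran.
  def runOnPool(paths: Seq[String], threadsCount: Int): Int = {
    val processed = new AtomicInteger(0)
    val executor: ExecutorService = Executors.newFixedThreadPool(threadsCount)
    try {
      paths.foreach { _ =>
        executor.submit(new Runnable {
          // Stand-in for a real file delete: just count the invocation.
          override def run(): Unit = processed.incrementAndGet()
        })
      }
    } finally {
      // Stop accepting new tasks, then wait for the submitted ones to finish.
      executor.shutdown()
      if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        executor.shutdownNow() // give up on tasks still running after the timeout
      }
    }
    processed.get()
  }
}
```

Calling `ExecutorLifecycleSketch.runOnPool(Seq("a", "b", "c"), 2)` runs three tasks on two threads. Putting `shutdown()` in a `finally` block means the pool's threads are released whether or not the submitted work fails.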
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]