raghavendraD opened a new issue #3054:
URL: https://github.com/apache/iceberg/issues/3054


   Hi,
   
   RemoveOrphanFiles works for me only with Hadoop FS/IO, when run locally 
with a Hadoop catalog. When I run it from EMR against S3 files using the Glue 
catalog, it throws the error below. I have tried Iceberg 0.11 and 0.12 with 
Spark 3.0.1 and Spark 3.1.1 (all combinations), and I have tried both the 
Actions API and the Spark Actions API calls. The result does not change.
   
   
Actions.forTable(table).removeOrphanFiles().olderThan(removeOrphanFilesOlderThan).execute();
   or
   
SparkActions.get().deleteOrphanFiles(table).olderThan(removeOrphanFilesOlderThan).execute();
   
   The error is:
   
   21/08/31 05:40:36 ERROR RemoveOrphanFilesMaintenanceJob: Error in RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Illegal Arguments in table properties - Can't parse null value from table properties, tenant: tenantId1, table: lakehouse_database.mobiletest1, removeOrphanFilesOlderThan: 1630388136606, Status: Failed, Reason: {}.
   java.lang.IllegalArgumentException: Cannot find the metadata table for glue_catalog.lakehouse_database.mobiletest1 of type ALL_MANIFESTS
        at org.apache.iceberg.spark.SparkTableUtil.loadMetadataTable(SparkTableUtil.java:634)
        at org.apache.iceberg.spark.actions.BaseSparkAction.loadMetadataTable(BaseSparkAction.java:153)
        at org.apache.iceberg.spark.actions.BaseSparkAction.buildValidDataFileDF(BaseSparkAction.java:119)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:154)
        at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:99)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
        at org.apache.iceberg.actions.RemoveOrphanFilesAction.execute(RemoveOrphanFilesAction.java:87)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFilesOlderThanTimestamp(RemoveOrphanFilesMaintenanceJob.java:273)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFiles(RemoveOrphanFilesMaintenanceJob.java:133)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.maintain(RemoveOrphanFilesMaintenanceJob.java:58)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.run(LakeHouseTableMaintenanceJob.java:136)
        at com.salesforce.cdp.spark.core.job.SparkJob.submitAndRun(SparkJob.java:76)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.main(LakeHouseTableMaintenanceJob.java:236)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
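   
   A note for reading the log above: the removeOrphanFilesOlderThan value is 
an epoch-millisecond timestamp, which is what olderThan(...) takes. The sketch 
below only decodes that value and shows one way such a cutoff could be 
computed. The class name OrphanCutoff, the helper cutoffMillis, and the 
three-day age are hypothetical and are not taken from the job in this report.

```java
import java.time.Duration;
import java.time.Instant;

public class OrphanCutoff {
    // Hypothetical helper: epoch-millisecond cutoff lying `age` before `now`.
    static long cutoffMillis(Instant now, Duration age) {
        return now.minus(age).toEpochMilli();
    }

    public static void main(String[] args) {
        // Value copied from the error log in this report.
        long fromLog = 1630388136606L;
        // Prints the cutoff as an ISO-8601 UTC instant: 2021-08-31T05:35:36.606Z
        System.out.println(Instant.ofEpochMilli(fromLog));

        // Illustrative only: a cutoff three days before the current time.
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(3));
        System.out.println(cutoff);
    }
}
```

   A value computed this way would be the long passed to olderThan(...) in 
either of the two calls shown earlier.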
   
   I also tried the SQL version of remove orphan files and got the error below.
   
   sparkSession.sql("CALL glue_catalog.lakehouse_database.remove_orphan_files(table => 'db.mobiletest1')").show();
   
   The error is:
   
   Exception in thread "main" org.apache.iceberg.exceptions.RuntimeIOException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.listDirRecursively(BaseDeleteOrphanFilesSparkAction.java:236)
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.buildActualFileDF(BaseDeleteOrphanFilesSparkAction.java:184)
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:157)
   at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:99)
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
   at com.salesforce.cdp.lakehouse.spark.tablemaintenance.TestWriter.main(TestWriter.java:133)
   Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
   at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.listDirRecursively(BaseDeleteOrphanFilesSparkAction.java:214)
   
   Is this something to do with my implementation, or is it a bug in Iceberg? 
Or am I missing something here? Please help!
   
   Thanks,
   Raghu

