kbendick opened a new issue #3453: URL: https://github.com/apache/iceberg/issues/3453
While testing the 0.12.1 release candidate, I tried running the `add_files` procedure with an ORC table. I ran it once, and the files were imported. I then dropped the table, created a new table, and tried to reimport from the same path. I got the following `NumberFormatException` while the procedure was parsing the number of imported files.

```
scala> spark.sql("CALL hive.system.add_files(table => 'hive.default.test2', source_table => '`orc`.`hdfs://hdfs-box:8020/user/hive/warehouse/orc`')").show
java.lang.NumberFormatException: null
  at java.lang.Long.parseLong(Long.java:552)
  at java.lang.Long.parseLong(Long.java:631)
  at org.apache.iceberg.spark.procedures.AddFilesProcedure.lambda$importToIceberg$1(AddFilesProcedure.java:135)
  at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:85)
  at org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:74)
  at org.apache.iceberg.spark.procedures.AddFilesProcedure.importToIceberg(AddFilesProcedure.java:121)
  at org.apache.iceberg.spark.procedures.AddFilesProcedure.call(AddFilesProcedure.java:108)
  at org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:33)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
  ... 47 elided
```

The path in HDFS existed, but there were no files in it, so `null` seems to have been returned for the file count. This still exists in current Iceberg: https://github.com/apache/iceberg/blob/1c158df94bf004d43867939841e75e4bfb941c16/spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/procedures/AddFilesProcedure.java#L144

We should fail early if there are no files present in the path-based directory (or at least report that the number of files is zero).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
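For illustration, a minimal sketch of the kind of null-safe handling being suggested. This is not the actual Iceberg code; `parseFileCount` and its behavior are hypothetical, assuming only that the imported-file count arrives as a possibly-null `String` before being handed to `Long.parseLong` (which throws `NumberFormatException: null` on a `null` argument):

```java
public class FileCountParser {
  /**
   * Hypothetical helper: parse the imported-file count defensively.
   * When the source directory contains no files, the raw count may be null;
   * treat that as zero instead of letting Long.parseLong throw.
   */
  static long parseFileCount(String rawCount) {
    if (rawCount == null) {
      // Alternatively, throw a descriptive exception here to fail early
      // with a message like "No files found in the source directory".
      return 0L;
    }
    return Long.parseLong(rawCount);
  }

  public static void main(String[] args) {
    System.out.println(parseFileCount(null)); // prints 0
    System.out.println(parseFileCount("42")); // prints 42
  }
}
```

Either option (returning zero or failing with a clear message) would be an improvement over the opaque `NumberFormatException: null` the user sees today.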