[GitHub] [hudi] functicons opened a new issue, #6514: [SUPPORT] Creating table with SparkSQL fails with FileNotFoundException

GitBox Fri, 26 Aug 2022 17:20:43 -0700


functicons opened a new issue, #6514:
URL: https://github.com/apache/hudi/issues/6514


   **Describe the problem you faced**
   
   I'm trying to create a new table with SparkSQL in spark-shell:
   
   ```
   spark.sql("""create table test8(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test8'""")
   ```
   
   The error is really confusing to me: why does Hudi expect the path to exist in advance?
   
   ```
   java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test8
     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533)
     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1541)
     at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
     at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94)
     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
     at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
     at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3369)
     at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
     at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3368)
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
     ... 45 elided
   ```
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   Spark 2.4.8, Scala 2.12, Hudi 0.11.1 (`hudi-spark-bundle_2.12`)
   
   ```
   $ spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.11.1 --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
   
   scala> spark.sql("""create table test9(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test9'""")
   ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used
   java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test9
     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1528)
     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1521)
     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1536)
     at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
     at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94)
     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
     at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
     ... 49 elided
   ```
   
   **Expected behavior**
   
   The table creation should NOT expect the path to exist.
   
   **Environment Description**
   
   * Hudi version : org.apache.hudi:hudi-spark-bundle_2.12:0.11.1 
   
   * Spark version : 2.4.8 (Scala 2.12)
   
   * Hive version : 2.3.7
   
   * Hadoop version : 2.10.2
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

