functicons opened a new issue, #6514:
URL: https://github.com/apache/hudi/issues/6514
**Describe the problem you faced**
I'm trying to create a new table with SparkSQL in spark-shell:
```
spark.sql("""create table test8(id int,name string) using hudi options
(primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test8'""")
```
The error is really confusing to me, why does Hudi expects the path to exist
in advance?
```
java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test8
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533)
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1541)
at
org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3369)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3368)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
... 45 elided
```
**To Reproduce**
Steps to reproduce the behavior:
Spark 2.4.8, Scala 2.12, Hudi 2.12:0.11.1
```
$ spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.11.1
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
scala> spark.sql("""create table test9(id int,name string) using hudi
options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test9'""")
ivysettings.xml file not found in HIVE_HOME or
HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used
java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test9
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1528)
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1521)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1536)
at
org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
... 49 elided
```
**Expected behavior**
The table creation should NOT expect the path to exist.
**Environment Description**
* Hudi version : org.apache.hudi:hudi-spark-bundle_2.12:0.11.1
* Spark version : 2.4.8 (Scala 2.12)
* Hive version : 2.3.7
* Hadoop version : 2.10.2
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
**Additional context**
Add any other context about the problem here.
**Stacktrace**
```Add the stacktrace of the error.```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]