zztttt opened a new issue #4072:
URL: https://github.com/apache/hudi/issues/4072


   **_Tips before filing an issue_**
   
   - Have you gone through our 
[FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   I want to run a SQL command to create a table in Apache Hudi. I can create the table successfully with the `bin/spark-sql` tool, and I can also reproduce the process with `spark.sql("...")` in `bin/spark-shell`. However, when I run `spark.sql` from a program written in Scala 2.11, I get `Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6`.
   
   "table6" is exactly the table this SQL statement should create, and the same statement succeeds with the two other methods mentioned above. Only the Scala program fails, and the exception is puzzling. What configuration am I missing?
   
   **To Reproduce**
   These are all the dependencies I use to program with Scala 2.11:
   ```xml
   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_2.11</artifactId>
       <version>2.4.8</version>
   </dependency>
   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_2.11</artifactId>
       <version>2.4.8</version>
   </dependency>
   <dependency>
       <groupId>org.apache.hudi</groupId>
       <artifactId>hudi-spark-bundle_2.11</artifactId>
       <version>0.9.0</version>
   </dependency>
   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-avro_2.11</artifactId>
       <version>2.4.4</version>
   </dependency>
   ```
   
   Steps to reproduce the behavior:
   
   1. Create the SparkSession:
      ```scala
      val spark = SparkSession.builder
        .appName("spark sql")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .master("local[*]")
        .getOrCreate()
      ```
   2. Initialize the SQL text:
      ```scala
      val create = "create table if not exists table6 (id int, name string, price double) using hudi location 'hdfs://localhost:9000/scala/table6' options (type='mor', primaryKey='id')"
      ```
   3. Run `spark.sql(create)`.
   4. Hit the exception mentioned above.
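
   Steps 1–4 can be combined into one minimal program (a sketch of the reproduction; the object name is illustrative, and it assumes `hudi-spark-bundle_2.11` is on the classpath as in the POM above):

   ```scala
   import org.apache.spark.sql.SparkSession

   object CreateTableRepro {
     def main(args: Array[String]): Unit = {
       // Step 1: build the session exactly as in the report.
       // Note (worth verifying): the Hudi docs launch bin/spark-sql with
       // --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension';
       // this builder, like step 1 above, does not set that extension.
       val spark = SparkSession.builder
         .appName("spark sql")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .master("local[*]")
         .getOrCreate()

       // Step 2: the DDL statement.
       val create =
         "create table if not exists table6 (id int, name string, price double) " +
           "using hudi location 'hdfs://localhost:9000/scala/table6' " +
           "options (type='mor', primaryKey='id')"

       // Steps 3–4: this call throws the FileNotFoundException.
       spark.sql(create)

       spark.stop()
     }
   }
   ```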
   
   **Expected behavior**
   
   I want the table to be created, just as it is when I run the same statement successfully via `bin/spark-sql` and `bin/spark-shell` with Spark 2.4.8, Hadoop 2.7, and Hudi 0.9.0.
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : 2.4.8
   
   * Hive version : None
   
   * Hadoop version : 2.7
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```
   Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
        at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
        at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:74)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:101)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
        at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
        at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3369)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3368)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
        at SparkSql$.runCreateTable(SparkSql.scala:58)
        at SparkSql$.main(SparkSql.scala:22)
        at SparkSql.main(SparkSql.scala)
   ```
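
   The topmost Hudi frame, `TablePathUtils.getTablePath`, is a plain `getFileStatus` on the `location` path, so one quick sanity check is whether `hdfs://localhost:9000/scala/table6` exists at all before the DDL runs. A diagnostic sketch using the Hadoop client API (assumes the default HDFS client configuration; the object name is illustrative, and this is a check, not a confirmed fix):

   ```scala
   import java.net.URI
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}

   object CheckTablePath {
     def main(args: Array[String]): Unit = {
       // Connect to the same namenode used in the `location` clause.
       val fs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration())
       val table = new Path("/scala/table6")

       // getFileStatus (used by TablePathUtils) throws FileNotFoundException
       // for a missing path; exists() is the non-throwing variant.
       println(s"table path exists: ${fs.exists(table)}")

       // Pre-creating the directory would at least move the failure past
       // this frame, which helps narrow down the missing configuration.
       if (!fs.exists(table)) fs.mkdirs(table)
     }
   }
   ```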
   
   

