pengzhiwei2018 edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-828893245


   > @pengzhiwei2018 could we make the spark-shell experience better? I think we need the extensions added by default when the jar is pulled in?
   > 
   > ```scala
   > $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   > 
   > scala> spark.sql("create table t1 (id int, name string, price double, ts long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show
   > t, returning NoSuchObjectException
   > org.apache.hudi.exception.HoodieException: 'path' or 'hoodie.datasource.read.paths' or both must be specified.
   >   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
   >   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
   >   at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
   >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   >   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   >   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
   >   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   >   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   >   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   >   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
   >   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   >   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   > ```
   
   Hi @vinothchandar, you can test this with the following commands:
   
   - Using spark-sql
   
   ```shell
   spark-sql --jars $HUDI_SPARK_BUNDLE \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
     --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
   
   - Using spark-shell
   
   ```shell
   spark-shell --jars $HUDI_SPARK_BUNDLE \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
     --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
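   If you build the session programmatically instead of launching it from the CLI, the same two settings can be passed on the `SparkSession` builder. This is a minimal sketch, assuming the hudi-spark-bundle jar is already on the classpath; the table name and schema are just the example from the stack trace above:

   ```scala
   // Sketch: enabling Hudi SQL support when constructing the session yourself.
   // Assumes $HUDI_SPARK_BUNDLE is already on the classpath.
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
     .getOrCreate()

   // With the extension registered, the Hudi DDL resolves instead of failing
   // with "'path' or 'hoodie.datasource.read.paths' or both must be specified."
   spark.sql("create table t1 (id int, name string, price double, ts long) " +
     "using hudi options(primaryKey = 'id', preCombineField = 'ts')")
   ```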
   
   
   Just set `spark.sql.extensions` to `org.apache.spark.sql.hudi.HoodieSparkSessionExtension`.
   Thanks~
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

