[GitHub] [hudi] cbomgit opened a new issue, #9481: [SUPPORT] Potentially Incorrect or incomplete Documentation

via GitHub Fri, 18 Aug 2023 15:19:02 -0700


cbomgit opened a new issue, #9481:
URL: https://github.com/apache/hudi/issues/9481


   Hello,
   
   I am following the steps here to load data from an existing Hudi table using 
the spark-sql shell: 
https://hudi.apache.org/docs/0.11.0/quick-start-guide#create-table
   
   Specifically, I am looking at the section "Create Table for an existing Hudi 
Table", which includes the following tip:
   
   > You don't need to specify schema and any properties except the partitioned 
   > columns if existed. Hudi can automatically recognize the schema and 
   > configurations.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start the spark-sql shell with `spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'`
   2. Execute `set hoodie.schema.on.read.enable=true;`
   3. Run
   `create table if not exists myTable location 's3://uri';`
   4. See the error:
   ```
   2023-08-18 22:13:20,417 [WARN] (main) 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog: A Hive serde 
table will be created as there is no table provider specified. You can set 
spark.sql.legacy.createHiveTableByDefault to false so that native data source 
table will be created instead.
   2023-08-18 22:13:20,445 [WARN] (main) 
org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be 
ignored, since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
   Error in query: Unable to infer the schema. The schema specification is 
required to create the table `default`.`myTableThatDoesntExist`.
   ```
   
   Based on the documentation, I would expect the above command to successfully 
create my table.
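   
   One thing I notice in the first warning is that Spark reports that no table 
provider was specified, so it falls back to creating a Hive serde table. A 
minimal sketch of the same statement with an explicit provider, assuming the 
`using hudi` clause that the guide's examples use also applies here (untested 
on my table; `s3://uri` is the same placeholder as above):
   
   ```sql
   -- Assumption: naming the provider lets Hudi resolve the schema
   -- from the existing table at this location.
   create table if not exists myTable using hudi
   location 's3://uri';
   ```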
   
   Further, if I specify my schema and properties like so:
   
   ```
   create table table (
     col1 string,
     col2 string,
     col3 string,
     col4 double,
     col5 string
   ) using hudi
   tblproperties (
     type = 'cow',
     primaryKey = 'col1,col2,col3',
     preCombineField = 'col4'
    )
   partitioned by (col4, col5)
   location 's3://uri/*/*/*';
   ```
   
   I receive this error:
   
   ```
   Error in query: Specified schema in create table statement is not equal to 
the table schema.You should not specify the schema for an exist table:
   ```
   
   I'm wondering what the exact steps are to load an existing table in the 
spark-sql shell with Hudi 0.11.0 on Spark 3.2.1. Thank you.
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : 3.2.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : no 
   
   
   


