cbomgit opened a new issue, #9481: URL: https://github.com/apache/hudi/issues/9481
Hello, I am following the steps here to load data from an existing HUDI table using the spark-sql shell: https://hudi.apache.org/docs/0.11.0/quick-start-guide#create-table

Specifically, the section "Create Table for an existing Hudi Table", with the following tip:

"You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations."

**To Reproduce**

Steps to reproduce the behavior:

1. Begin the spark-sql shell with `spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'`
2. Execute `set hoodie.schema.on.read.enable=true;`
3. Next, run `create table if not exists myTable location 's3://uri';`
4. See the error:

```
2023-08-18 22:13:20,417 [WARN] (main) org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
2023-08-18 22:13:20,445 [WARN] (main) org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Error in query: Unable to infer the schema. The schema specification is required to create the table `default`.`myTableThatDoesntExist`.
```

Based on the documentation, I would expect the above command to successfully create my table.
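
For comparison, the statement shown in that quick-start section includes a `using hudi` clause, which step 3 omits. The first warning above (a Hive serde table being created because no table provider was specified) may be related to that omission. A minimal sketch of the documented form, assuming `s3://uri` (the placeholder from step 3) is the base path of the existing table, without any partition globs:

```sql
-- Sketch only: mirrors the "Create Table for an existing Hudi Table" form
-- from the 0.11.0 quick-start guide. `myTable` and 's3://uri' are the
-- placeholders used in the steps above; the location is assumed to be
-- the table's base path.
create table if not exists myTable using hudi
location 's3://uri';
```

Whether this resolves the schema inference error in this environment is untested here.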

Further, if I specify my schema and properties like so:

```
create table table (
  col1 string,
  col2 string,
  col3 string,
  col4 double,
  col5 string
)
using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'col1,col2,col3',
  preCombineField = 'col4'
)
partitioned by (col4, col5)
location 's3://uri/*/*/*';
```

I receive this error:

```
Error in query: Specified schema in create table statement is not equal to the table schema.You should not specify the schema for an exist table:
```

I'm wondering what the exact steps are to load a table in the spark-sql shell with Hudi 0.11.0 on Spark 3.2.1. Thank you.

**Environment Description**

* Hudi version : 0.11.0
* Spark version : 3.2.1
* Hive version : 3.1.3
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no
