Akshay2Agarwal opened a new issue #2913:
URL: https://github.com/apache/hudi/issues/2913


   Is HiveServer2 strictly required for running hive sync? I tried putting the metastore's MySQL JDBC URL in `hoodie.datasource.hive_sync.jdbcurl`, and the sync failed with a SQL syntax error when writing the Hudi table.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. ```spark-shell \
     --packages org.apache.hudi:hudi-spark-bundle_2.11:0.8.0,org.apache.spark:spark-avro_2.11:2.4.4 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'```
   2. ```val df = Seq(
     (1, 213213213, "2014/01/01"),
     (2, 343432434, "2014/11/30"),
     (3, 343242323, "2016/12/29"),
     (4, 344234242, "2016/05/09")
   ).toDF("typeId","eventTime","partition")```
   3. ```df.write.format("hudi").
     options(getQuickstartWriteConfigs).
     option(PRECOMBINE_FIELD_OPT_KEY, "eventTime").
     option(RECORDKEY_FIELD_OPT_KEY, "typeId").
     option(PARTITIONPATH_FIELD_OPT_KEY, "partition").
     option(HIVE_PARTITION_FIELDS_OPT_KEY, "partition").
     option(HIVE_STYLE_PARTITIONING_OPT_KEY, false).
     option(HIVE_SYNC_ENABLED_OPT_KEY, true).
     option(HIVE_TABLE_OPT_KEY, "hive_test_data").
     option(HIVE_USER_OPT_KEY, "hive").
     option(HIVE_PASS_OPT_KEY, "XXXXXXX").
     option(HIVE_URL_OPT_KEY, "jdbc:mysql://XXXXXXXX.compute.internal:3306").
     option(TABLE_NAME, "hudi_events_test").
     mode(Overwrite).
     save("s3a://XXXXXXXX/test-lake-data/hudi_events_test/")```
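   
   My guess at the cause: hive sync sends HiveQL DDL (`CREATE EXTERNAL TABLE ...`) over the configured JDBC connection, so `HIVE_URL_OPT_KEY` may need to point at a HiveServer2 endpoint rather than the metastore's MySQL database. A sketch of the option I suspect is intended (hostname and port are placeholders, not from my environment):
   
   ```option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver-host:10000")```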
   
   **Expected behavior**
   
   I expected the Hudi table to be synced to the metastore.
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : 2.4.4
   
   * Hive version : 2
   
   * Hadoop version : 2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```Caused by: java.sql.SQLSyntaxErrorException: (conn=47) You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` ' at line 1
     at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:243)
     at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:164)
     at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:258)
     at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:349)
     at org.mariadb.jdbc.MariaDbStatement.execute(MariaDbStatement.java:484)
     at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:367)
     ... 110 more
   Caused by: java.sql.SQLException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` ' at line 1
   Query is: CREATE EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `typeId` int, `eventTime` int) PARTITIONED BY (`partition` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://XXXXXXXXX/test-lake-data/hudi_events_test'
   java thread: main
     at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:134)
     at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:184)
     at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:343)
     ... 112 more```
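   
   For what it's worth, the failing statement in the stacktrace is HiveQL (`CREATE EXTERNAL TABLE` is not valid MySQL DDL), which suggests the DDL was sent to the metastore's backing MySQL/MariaDB database instead of HiveServer2. A minimal sketch of a scheme check (the helper name is hypothetical, not part of Hudi):
   
   ```scala
   // Hypothetical helper, not part of Hudi: the hive-sync JDBC URL must target
   // HiveServer2 (scheme "jdbc:hive2://"), because the sync issues HiveQL DDL
   // that a MySQL/MariaDB metastore backend cannot parse.
   def isHiveServer2Url(url: String): Boolean =
     url.startsWith("jdbc:hive2://")
   
   // The URL from this report fails the check; a HiveServer2 URL passes.
   assert(!isHiveServer2Url("jdbc:mysql://host.compute.internal:3306"))
   assert(isHiveServer2Url("jdbc:hive2://hiveserver-host:10000"))
   ```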
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]