mingujotemp opened a new issue #1909:
URL: https://github.com/apache/hudi/issues/1909
**Describe the problem you faced**
Hudi 0.5.0 (running on EMR)
I encounter `org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144` when I try to write a non-partitioned table to Glue (S3).
**To Reproduce**
Steps to reproduce the behavior:
1. Create a PySpark DataFrame
2. Write the DataFrame with the following options:
```python
hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.index.type': 'BLOOM',
    'hoodie.datasource.write.partitionpath.field': '',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.NonpartitionedKeyGenerator',
    'hoodie.datasource.write.table.name': tableName,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.precombine.field': 'updated_at',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'hoodie.bulkinsert.shuffle.parallelism': 10,
    'hoodie.datasource.hive_sync.database': databaseName,
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.assume_date_partitioning': 'false',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.hive_sync.partition_fields': '',
}

df.write.format("org.apache.hudi") \
    .options(**hudi_options) \
    .mode("overwrite") \
    .save(basePath)
```
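(`tableName`, `databaseName`, `basePath`, and `df` are defined earlier in the session.) For a non-partitioned table, the partition-related options on the write side and the hive-sync side all have to agree. A quick plain-Python sanity check of the dict above (illustrative only; `is_nonpartitioned_config` is a hypothetical helper, not part of the Hudi API):

```python
# Illustrative check: for a non-partitioned Hudi table, both the write-side
# and hive-sync-side partition settings must say "no partitions".
hudi_options = {
    'hoodie.datasource.write.partitionpath.field': '',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.NonpartitionedKeyGenerator',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.hive_sync.partition_fields': '',
}

def is_nonpartitioned_config(opts):
    """Return True when all partition-related options describe a non-partitioned table."""
    return (
        opts.get('hoodie.datasource.write.partitionpath.field', '') == ''
        and 'Nonpartitioned' in opts.get('hoodie.datasource.write.keygenerator.class', '')
        and 'NonPartitioned' in opts.get('hoodie.datasource.hive_sync.partition_extractor_class', '')
        and opts.get('hoodie.datasource.hive_sync.partition_fields', '') == ''
    )

print(is_nonpartitioned_config(hudi_options))  # True for the config in this issue
```

The config in this report passes that check, so the mismatch is not between these options; the failure happens later, during hive sync.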
**Expected behavior**
The write succeeds and the non-partitioned table is synced to the Glue catalog without errors.
**Environment Description**
* Hudi version : 0.5.0
* Spark version : 2.4.4
* Hive version : 3.1.2 (Using Glue)
* Hadoop version : 3.2.1-amzn-0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
Using the following jars installed on EMR 6.0.0:
`/usr/lib/hudi/hudi-spark-bundle.jar`
`/usr/lib/spark/external/lib/spark-avro.jar`
**Stacktrace**
```
20/08/04 07:11:50 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 738, in save
    self._jwrite.save(path)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o273.save.
: org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144
    at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:667)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:109)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172)
    at org.apache.hadoop.fs.Path.<init>(Path.java:184)
    at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
    at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
    at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
    at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:75)
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:538)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:374)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:359)
    at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:665)
    ... 35 more
```
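For anyone triaging: the `Caused by` chain shows `Warehouse.getDatabasePath` throwing "Can not create a Path from an empty string", which suggests the Glue database may have been created without a `LocationUri`. One hedged way to check, assuming boto3 access to the Glue API (`has_location_uri` is an illustrative helper, not a Hudi or Glue API):

```python
def has_location_uri(get_database_response):
    """Return True if a Glue database has a non-empty LocationUri.

    Takes the dict returned by glue_client.get_database(Name=...);
    LocationUri is optional in Glue, and an empty or missing value is
    consistent with the empty-Path failure in the stacktrace above.
    """
    return bool(get_database_response.get('Database', {}).get('LocationUri'))

# Usage sketch (needs boto3 and AWS credentials, so not runnable offline):
# import boto3
# glue = boto3.client('glue')
# print(has_location_uri(glue.get_database(Name=databaseName)))
```

If this returns False for the database used in `hoodie.datasource.hive_sync.database`, setting a location on the database would be the first thing to try; this is only a hypothesis based on the stacktrace, not a confirmed diagnosis.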