vicuna96 opened a new issue, #5582: URL: https://github.com/apache/hudi/issues/5582
**Describe the problem you faced**

Hi team, we are getting a `NullPointerException` when trying to use a merge statement to update columns of a table that is registered in Hive. We perform the initial load of the table using the Hive sync options, but we do not use these options on subsequent runs, because doing so fails with:

```
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.IMetaStoreClient.alter_table_with_environmentContext(Ljava/lang/String;Ljava/lang/String;Lorg/apache/hadoop/hive/metastore/api/Table;Lorg/apache/hadoop/hive/metastore/api/EnvironmentContext;)V
```

**To Reproduce**

Steps to reproduce the behavior:

1. Create the table with HMS Hive sync using the following syntax:

```
sfoSubDF.write.format("hudi").
  options(hudiOptions).
  option(TABLE_TYPE.key(), "COPY_ON_WRITE").
  option(OPERATION.key(), "bulk_insert").
  option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.ComplexKeyGenerator").
  option(PRECOMBINE_FIELD.key(), "PROCESSING_TS").
  option(RECORDKEY_FIELD.key(), "KEY1,KEY2").
  option(PARTITIONPATH_FIELD.key(), "PARTITION_DT").
  option(HIVE_STYLE_PARTITIONING.key(), "true").
  option(HIVE_SYNC_MODE.key(), "hms").
  option(HIVE_DATABASE.key(), database).
  option(HIVE_TABLE.key(), tableName).
  option(HIVE_SYNC_ENABLED.key(), "true").
  option(TBL_NAME.key(), tableName).
  mode(Overwrite).
  save(toPath)
```

2. Attempt a partial update on top of the table, using the Spark SQL merge syntax:

```
merge into $HIVE_DB.$datasetName as target
using $sourceAliasOrder as source
on ${getDefaultMergeCondition()}
when matched and ${PTC.RECORD_TS} <> source.${PTC.RECORD_TS}
then update set ${PTC.RECORD_TS} = source.${PTC.RECORD_TS}
```

This immediately raises the `NullPointerException` from a call to the `parametersWithWriteDefaults` function, as detailed in the stack trace included.

**Expected behavior**

We expect the partial-update `merge into` statement to update the corresponding columns in the base table. Note that since we are not able to use Hive sync via HMS on Hudi (as described in https://github.com/apache/hudi/issues/4700), we would then run `msck repair` to update any necessary table metadata.

**Environment Description**

* Hudi version : 0.10.0
* Spark version : 2.4.7
* Hive version : 2.3.7
* Hadoop version : 2.10.1
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : No
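As additional context, the top frames of the stack trace (`java.util.Hashtable.put(Hashtable.java:460)` reached via `putAll` from `parametersWithWriteDefaults`) suggest that one of the merged write-option maps carries a null value: unlike `HashMap`, `java.util.Hashtable` (and its subclass `java.util.Properties`) rejects null keys and values. A minimal Java sketch of that mechanism follows; the option names are hypothetical, not the actual values Hudi passes internally:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class HashtableNullDemo {

    // Returns true if copying the map into a Hashtable throws an NPE.
    // Hashtable.put rejects null values, which matches the failure at
    // Hashtable.java:460 in the stack trace.
    static boolean triggersNpe(Map<String, String> params) {
        try {
            new Hashtable<String, String>().putAll(params);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("hoodie.table.name", "orderTableTesting");
        // Hypothetical: one merged option resolved to null.
        params.put("hoodie.datasource.write.precombine.field", null);
        System.out.println("NPE triggered: " + triggersNpe(params));
    }
}
```

If this is the mechanism, the fix would be to find which option resolves to null when the `merge into` path builds the write parameters.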
**Stacktrace**

```
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableMetaClient: Loading HoodieTableMetaClient from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting/.hoodie/hoodie.properties
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting
22/05/14 00:50:12 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20220513213440105__commit__COMPLETED]}
22/05/14 00:50:12 WARN org.apache.hudi.common.config.DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
22/05/14 00:50:12 WARN org.apache.hudi.common.config.DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableMetaClient: Loading HoodieTableMetaClient from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting/.hoodie/hoodie.properties
22/05/14 00:50:12 INFO org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from gs://my_hudi_bucket/staging_zone/WorkflowPublish/orderTableTesting
22/05/14 00:50:12 WARN com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://opddev-dev-dpaas-phs-logs/history-server/spark-events/ghs-gif-streaming/application_1652454321869_0247_1.lz4.inprogress
22/05/14 00:50:12 ERROR com.walmart.archetype.core.WorkFlowManager: Exception while running Some(WorkflowPublish) Exception = null
22/05/14 00:50:12 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
	at java.util.Hashtable.put(Hashtable.java:460)
	at java.util.Hashtable.putAll(Hashtable.java:524)
	at org.apache.hudi.HoodieWriterUtils$.parametersWithWriteDefaults(HoodieWriterUtils.scala:52)
	at org.apache.hudi.HoodieSparkSqlWriter$.mergeParamsAndGetHoodieConfig(HoodieSparkSqlWriter.scala:722)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:91)
	at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.executeUpsert(MergeIntoHoodieTableCommand.scala:285)
	at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:155)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
