lewyh opened a new issue #4904:
URL: https://github.com/apache/hudi/issues/4904


   **Describe the problem you faced**
   
   Using DynamoDB as the lock provider for concurrent writes results in an 
error if `hoodie.write.lock.dynamodb.endpoint_url` is not provided when using 
Hudi 0.10.1. 
   
   The documentation says this option is present from 0.11.0, and should be 
optional. Providing my region's DynamoDB endpoint as the option value works, 
but this behaviour is unexpected.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  Build Hudi from 0.10.1 source files
   2. Provide the following Hudi write options as part of a PySpark script: 
`'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 
'hoodie.cleaner.policy.failed.writes': 'LAZY', 'hoodie.write.lock.provider': 
'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider', 
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>', 
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'`
   
   **Expected behavior**
   
   Table created in DynamoDB to provide locking functionality for concurrent 
writes.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   PySpark application running as a Glue ETL job. Once the appropriate endpoint 
URL is added to the options, the lock table is created as expected.
   
   **Stacktrace**
   
   ```
   org.apache.hudi.exception.HoodieException: Unable to instantiate class 
org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
        at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
        at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
        at 
org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
        at 
org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
        at 
org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
        at 
org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
        at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
        at 
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
        ... 47 more
   Caused by: java.lang.IllegalArgumentException: Property 
hoodie.write.lock.dynamodb.endpoint_url not found
        at 
org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
        at 
org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
        at 
org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.getDynamoDBClient(DynamoDBBasedLockProvider.java:159)
        at 
org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:87)
        at 
org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:77)
        ... 52 more
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to