jdattani opened a new issue, #5451:
URL: https://github.com/apache/hudi/issues/5451

   **Describe the problem you faced**
   
   Using DynamoDB as the lock provider for concurrent writes results in an 
error stating java.lang.NoClassDefFoundError: 
com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   - Build Hudi from 0.10.1 source files
   
   - Provide the following Hudi write options as part of a PySpark script: 
'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 
'hoodie.cleaner.policy.failed.writes': 'LAZY', 'hoodie.write.lock.provider': 
'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider', 
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>', 
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'
   
   
   **Expected behavior**
   
   Job is able to acquire lock.
   
   **Environment Description**
   
   * Hudi version : 0.10.1 
   
   * Spark version : 3.1.2
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Additional context**
   
   Running on AWS Glue 3.0. The DynamoDB table was created manually beforehand, and the role assigned to the job has all the permissions needed to operate on the table.
   
   ```
   'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
   'hoodie.cleaner.policy.failed.writes': 'LAZY',
   'hoodie.write.lock.dynamodb.endpoint_url': 
'dynamodb.us-east-1.amazonaws.com',
   'hoodie.write.lock.provider': 
'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
   'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
   'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
   'hoodie.write.lock.dynamodb.region': 'us-east-1',
   
   ```
   
   I tried both with and without providing "hoodie.write.lock.dynamodb.endpoint_url".
   
   Jars included:
   
   extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar
   extra-jars/spark-avro_2.12-3.1.2.jar
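   These jars are supplied to the Glue job via the `--extra-jars` special job parameter, which takes a comma-separated list of S3 paths. A sketch of the value used (the bucket name is a placeholder):
   
   ```shell
   # Hypothetical S3 bucket; only the jar file names match the job's setup.
   # Glue passes this comma-separated list through the --extra-jars parameter.
   EXTRA_JARS="s3://<BUCKET>/extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar,s3://<BUCKET>/extra-jars/spark-avro_2.12-3.1.2.jar"
   echo "$EXTRA_JARS"
   ```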
   
   The job runs fine without the concurrency mode configurations.
   
   **Stacktrace**
   
   ```
   2022-04-27 14:13:05,812 ERROR [main] glue.ProcessLauncher 
(Logging.scala:logError(73)): Error from Python:Traceback (most recent call 
last):
     File "/tmp/glue_process_bundle.py", line 17, in <module>
       start_process(glue_ctx, config, glue_catalog_svc)
     File "/tmp/glue_process_bundle.zip/jobs/process.py", line 180, in 
start_signal_process
       load(final_df, config)
     File "/tmp/glue_process_bundle.zip/jobs/process.py", line 99, in load
       
df.write.format("hudi").options(**hudi_options).mode("append").save(config.params.processed_bucket)
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
line 1109, in save
       self._jwrite.save(path)
     File 
"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 
111, in deco
       return f(*a, **kw)
     File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", 
line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o255.save.
   : java.lang.NoClassDefFoundError: 
com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at 
org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
        at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
        at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
        at 
org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
        at 
org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
        at 
org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
        at 
org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
        at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
        at 
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
        at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
        at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.ClassNotFoundException: 
com.amazonaws.services.dynamodbv2.model.LockNotGrantedException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 51 more
   ```
   
   Since this is a NoClassDefFoundError, I was wondering whether there are additional AWS SDK jars I need to include to use this functionality.
   
   Thanks.
   
   

