jdattani opened a new issue, #5451:
URL: https://github.com/apache/hudi/issues/5451
**Describe the problem you faced**
Using DynamoDB as the lock provider for concurrent writes fails with a `java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException`.
**To Reproduce**
Steps to reproduce the behavior:
- Build Hudi from the 0.10.1 source files
- Provide the following Hudi write options as part of a PySpark script (a minimal sketch of the write call follows this list):
```
'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': 'LAZY',
'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'
```
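For clarity, a minimal sketch of how these options are passed to the write (the shape of the call matches the `df.write.format("hudi")...` line in the stacktrace below); `source_df`, the S3 path, and the other per-table write options are placeholders I've omitted for brevity:
```
# Minimal sketch; source_df, <TABLE_NAME>, <KEY_NAME> and the S3 path are placeholders.
hudi_options = {
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.cleaner.policy.failed.writes': 'LAZY',
    'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
    'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
    'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
    # ... plus the usual per-table write options (record key, precombine field, etc.)
}

source_df.write.format("hudi").options(**hudi_options).mode("append").save("s3://<BUCKET>/<PREFIX>/")
```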
**Expected behavior**
Job is able to acquire lock.
**Environment Description**
* Hudi version : 0.10.1
* Spark version : 3.1.2
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
Running on AWS Glue 3.0. The DynamoDB table was already created manually (a sketch of an equivalent setup follows below), and the role assigned to the job has all the permissions needed to operate on the table.
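For reference, the manual table setup was roughly equivalent to the following boto3 sketch. The String hash key named `key` is an assumption based on the schema Hudi's DynamoDB lock provider uses when it creates the table itself; the billing mode is a placeholder choice:
```
# Hedged sketch of the manual lock-table setup; table name and billing mode
# are placeholders. The String hash key named "key" is an assumption taken
# from the schema the DynamoDB lock provider creates for itself.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
dynamodb.create_table(
    TableName="<TABLE_NAME>",
    AttributeDefinitions=[{"AttributeName": "key", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "key", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```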
```
'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': 'LAZY',
'hoodie.write.lock.dynamodb.endpoint_url':
'dynamodb.us-east-1.amazonaws.com',
'hoodie.write.lock.provider':
'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
'hoodie.write.lock.dynamodb.region': 'us-east-1',
```
Tried both with and without providing `hoodie.write.lock.dynamodb.endpoint_url`.
Jars included (attached to the job as shown in the sketch below):
- extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar
- extra-jars/spark-avro_2.12-3.1.2.jar
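For completeness, the jars are passed through Glue's `--extra-jars` special parameter. A hedged boto3 sketch of the job definition; the job name, role ARN, and S3 bucket are placeholders:
```
# Hedged sketch of attaching the jars to the Glue 3.0 job via the
# --extra-jars special parameter; job name, role, and S3 paths are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")
glue.update_job(
    JobName="<JOB_NAME>",
    JobUpdate={
        "Role": "<JOB_ROLE_ARN>",
        "GlueVersion": "3.0",
        "Command": {"Name": "glueetl", "ScriptLocation": "s3://<BUCKET>/glue_process_bundle.py"},
        "DefaultArguments": {
            "--extra-jars": (
                "s3://<BUCKET>/extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar,"
                "s3://<BUCKET>/extra-jars/spark-avro_2.12-3.1.2.jar"
            ),
        },
    },
)
```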
The job runs fine without the concurrency-mode configurations.
**Stacktrace**
```
2022-04-27 14:13:05,812 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/glue_process_bundle.py", line 17, in <module>
    start_process(glue_ctx, config, glue_catalog_svc)
  File "/tmp/glue_process_bundle.zip/jobs/process.py", line 180, in start_signal_process
    load(final_df, config)
  File "/tmp/glue_process_bundle.zip/jobs/process.py", line 99, in load
    df.write.format("hudi").options(**hudi_options).mode("append").save(config.params.processed_bucket)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1109, in save
    self._jwrite.save(path)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o255.save.
: java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
    at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
    at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
    at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
    at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.dynamodbv2.model.LockNotGrantedException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 51 more
```
Since this is a `NoClassDefFoundError`, I was wondering whether there are additional AWS SDK jars that I need to include to use this functionality?
Thanks.