nochimow commented on issue #4456: URL: https://github.com/apache/hudi/issues/4456#issuecomment-1008846694
Hi there, my code basically reads some Avro files into a DataFrame, then writes that DataFrame into a Hudi table. I'm using the following Hudi configs during the write (it's Python on AWS Glue 3.0):

```
"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
"hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.table.name": table_name,
"hoodie.datasource.write.recordkey.field": IDX_COL,
"hoodie.datasource.write.partitionpath.field": pks,
"hoodie.datasource.write.hive_style_partitioning": "true",
"hoodie.datasource.write.precombine.field": tiebreaker,
"hoodie.datasource.write.operation": operation,
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
hoodie.write.lock.dynamodb.table
hoodie.write.lock.dynamodb.partition_key
hoodie.write.lock.dynamodb.region
hoodie.write.lock.dynamodb.billing_mode=PAY_PER_REQUEST
```

My DynamoDB table is a simple table with just the partition_key field as a string. Is there any recommendation on what the DynamoDB table structure should be?
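As a sketch, the options listed in the comment could be assembled into a single PySpark write like the one below. The variable names `table_name`, `IDX_COL`, `pks`, `tiebreaker`, and `operation` come from the original comment but their values here are hypothetical, as are the DynamoDB table name, partition key, and region (the original left them blank); the `df.write` call at the end is illustrative, not the commenter's exact code.

```python
# Hypothetical values standing in for the commenter's variables.
table_name = "my_hudi_table"
IDX_COL = "id"
pks = "year,month"
tiebreaker = "updated_at"
operation = "upsert"

# Hudi write options as described in the comment, with the DynamoDB-based
# lock provider used for optimistic concurrency control.
hudi_options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.recordkey.field": IDX_COL,
    "hoodie.datasource.write.partitionpath.field": pks,
    "hoodie.datasource.write.precombine.field": tiebreaker,
    "hoodie.datasource.write.operation": operation,
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    # DynamoDB lock provider settings; table/key/region values are hypothetical.
    "hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi-locks",
    "hoodie.write.lock.dynamodb.partition_key": "tablename",
    "hoodie.write.lock.dynamodb.region": "us-east-1",
    "hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
}

# Illustrative write (requires a SparkSession and a DataFrame `df`):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```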
