pushpavanthar commented on issue #8614: URL: https://github.com/apache/hudi/issues/8614#issuecomment-1772585088
@danny0405 we are facing the same issue on Hudi 0.13.1 with Spark 3.2.1 and 3.3.2. Below is the command we run; the same command worked fine with Hudi 0.11.1.

```
spark-submit --master yarn \
  --packages org.apache.spark:spark-avro_2.12:3.2.1,org.apache.hudi:hudi-utilities-bundle_2.12:0.13.1,org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.1,org.apache.hudi:hudi-aws-bundle:0.13.1 \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --conf spark.executor.cores=5 \
  --conf spark.driver.memory=3200m \
  --conf spark.driver.memoryOverhead=800m \
  --conf spark.executor.memoryOverhead=1400m \
  --conf spark.executor.memory=14600m \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=21 \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.task.maxFailures=5 \
  --conf spark.rdd.compress=true \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.yarn.max.executor.failures=5 \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --conf spark.sql.catalogImplementation=hive \
  --deploy-mode client \
  s3://bucket_name/custom_jar-2.0.jar \
  --hoodie-conf hoodie.parquet.compression.codec=snappy \
  --hoodie-conf hoodie.deltastreamer.source.hoodieincr.num_instants=100 \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.HoodieIncrSource \
  --hoodie-conf hoodie.deltastreamer.source.hoodieincr.path=s3://bucket_name/ml_attributes/features \
  --hoodie-conf hoodie.metrics.on=true \
  --hoodie-conf hoodie.metrics.reporter.type=PROMETHEUS_PUSHGATEWAY \
  --hoodie-conf hoodie.metrics.pushgateway.host=pushgateway.in \
  --hoodie-conf hoodie.metrics.pushgateway.port=443 \
  --hoodie-conf hoodie.metrics.pushgateway.delete.on.shutdown=false \
  --hoodie-conf hoodie.metrics.pushgateway.job.name=hudi_transformed_features_accounts_hudi \
  --hoodie-conf hoodie.metrics.pushgateway.random.job.name.suffix=false \
  --hoodie-conf hoodie.metadata.enable=true \
  --hoodie-conf hoodie.metrics.reporter.metricsname.prefix=hudi \
  --target-base-path s3://bucket_name_transformed/features_accounts \
  --target-table features_accounts \
  --enable-sync \
  --hoodie-conf hoodie.datasource.hive_sync.database=hudi_transformed \
  --hoodie-conf hoodie.datasource.hive_sync.table=features_accounts \
  --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id,pos \
  --hoodie-conf hoodie.datasource.write.precombine.field=id \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=created_at_dt \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=created_at_dt \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \
  --hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z', yyyy-MM-dd' 'HH:mm:ss.SSSSSS,yyyy-MM-dd' 'HH:mm:ss,yyyy-MM-dd'T'HH:mm:ss'Z'" \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd \
  --source-ordering-field id \
  --hoodie-conf secret.key.name=some-secret \
  --hoodie-conf transformer.decrypt.cols=features_json \
  --hoodie-conf transformer.uncompress.cols=false \
  --hoodie-conf transformer.jsonToStruct.column=features_json \
  --hoodie-conf transformer.normalize.column=features_json.accounts \
  --hoodie-conf transformer.copy.fields=created_at,created_at_dt \
  --transformer-class com.custom.transform.DecryptTransformer,com.custom.transform.JsonToStructTypeTransformer,com.custom.transform.NormalizeArrayTransformer,com.custom.transform.FlatteningTransformer,com.custom.transform.CopyFieldTransformer
```
