maabkhan commented on issue #11971:
URL: https://github.com/apache/hudi/issues/11971#issuecomment-2367793940
@ad1happy2go
Spark configs passed (the remaining configs take their default values):
"sparkConf": {
"spark.local.dir": "/tmp/spark-local-dir-shuffle-f2086f4d",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.sql.extensions":
"org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
"spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
"spark.sql.caseSensitive": "false",
"spark.decommission.enabled": "true",
"spark.sql.adaptive.enabled": "true",
"spark.eventLog.rolling.enabled": "true",
"spark.dynamicAllocation.enabled": "true",
"spark.sql.catalog.spark_catalog":
"org.apache.spark.sql.hudi.catalog.HoodieCatalog",
"spark.sql.catalogImplementation": "hive",
"spark.cleaner.periodicGC.interval": "1min",
"spark.storage.decommission.enabled": "true",
"spark.dynamicAllocation.maxExecutors": "200",
"spark.dynamicAllocation.minExecutors": "1",
"spark.kubernetes.allocation.batch.size": "10",
"spark.kubernetes.driver.requestTimeout": "30000",
"spark.sql.avro.datetimeRebaseModeInRead": "CORRECTED",
"spark.dynamicAllocation.initialExecutors": "1",
"spark.sql.avro.datetimeRebaseModeInWrite": "CORRECTED",
"spark.sql.execution.arrow.sparkr.enabled": "true",
"spark.kubernetes.driver.connectionTimeout": "30000",
"spark.sql.execution.arrow.pyspark.enabled": "true",
"spark.sql.parquet.datetimeRebaseModeInRead": "CORRECTED",
"spark.sql.legacy.pathOptionBehavior.enabled": "true",
"spark.sql.parquet.datetimeRebaseModeInWrite": "CORRECTED",
"spark.storage.decommission.rddBlocks.enabled": "true",
"spark.dynamicAllocation.executorAllocationRatio": "0.33",
"spark.dynamicAllocation.shuffleTracking.enabled": "True",
"spark.storage.decommission.shuffleBlocks.enabled": "true",
"spark.kubernetes.allocation.driver.readinessTimeout": "120s",
"spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "60",
"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2"
}
"deps": {
"jars": [
"https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.0/hudi-spark3.4-bundle_2.12-0.14.0.jar",
"https://repo1.maven.org/maven2/org/apache/hive/hcatalog/hive-hcatalog-core/3.1.3/hive-hcatalog-core-3.1.3.jar"
]
}
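For reference, config maps like the `sparkConf` block above are typically flattened into `--conf key=value` pairs (plus `--jars`) when a job is launched via `spark-submit`. A minimal, illustrative sketch of that flattening (the dict below is a trimmed subset of the configs above; the `to_submit_args` helper is hypothetical, not part of the job):

```python
# Illustrative only: flatten a sparkConf map into spark-submit arguments.
# The keys/values mirror the config dump above; to_submit_args is a
# hypothetical helper, not code from the actual job.
spark_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.dynamicAllocation.enabled": "true",
}
jars = [
    "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.0/hudi-spark3.4-bundle_2.12-0.14.0.jar",
]

def to_submit_args(conf, jars):
    """Build the spark-submit argument list for a conf map and jar list."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    if jars:
        args += ["--jars", ",".join(jars)]
    return args

print(" ".join(to_submit_args(spark_conf, jars)))
```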
Hudi configs passed (the rest take their default values):
```json
{
  "className": "org.apache.hudi",
  "hoodie.datasource.hive_sync.use_jdbc": "false",
  "hoodie.datasource.write.precombine.field": "dms_timestamp",
  "hoodie.datasource.write.recordkey.field": "uuid",
  "hoodie.table.name": "users",
  "hoodie.consistency.check.enabled": "false",
  "hoodie.datasource.hive_sync.table": "users",
  "hoodie.datasource.hive_sync.database": "luna_lazypay",
  "hoodie.datasource.hive_sync.enable": "true",
  "hoodie.datasource.hive_sync.mode": "hms",
  "hoodie.datasource.hive_sync.support_timestamp": "true",
  "hoodie.datasource.write.reconcile.schema": "true",
  "path": "s3a://refined-luna-prod/luna_lazypay/users/",
  "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
  "hoodie.datasource.write.partitionpath.field": "year,month,day",
  "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.hive_sync.partition_fields": "year,month,day",
  "hoodie.datasource.write.hive_style_partitioning": "true",
  "hoodie.upsert.shuffle.parallelism": 40,
  "hoodie.datasource.write.operation": "upsert",
  "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
  "hoodie.cleaner.commits.retained": 1
}
```
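To make the write path concrete: with the Spark datasource, options like these are usually passed straight to `DataFrame.write`. A minimal sketch, assuming a `SparkSession` and input DataFrame exist (the option map below is a trimmed subset of the dump above; the Spark call is shown as a comment since it needs a live session):

```python
# Trimmed subset of the Hudi write options above; keys and values are
# taken from this issue's config dump.
hudi_options = {
    "hoodie.table.name": "users",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.precombine.field": "dms_timestamp",
    "hoodie.datasource.write.partitionpath.field": "year,month,day",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": "1",
}

# With a SparkSession and a DataFrame `df` in scope, the upsert would
# typically look like:
#   (df.write.format("hudi")
#      .options(**hudi_options)
#      .mode("append")
#      .save("s3a://refined-luna-prod/luna_lazypay/users/"))
```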
These are the configs for the job detailed above, which reads from a table while that table is being updated. The source table my job reads is itself a Hudi table, updated by a similar Spark-Hudi job but with different configs.