KSubramanyaH opened a new issue, #11335:
URL: https://github.com/apache/hudi/issues/11335
Hello All,
We have an issue while doing clustering for one of our jobs:
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13
    at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:387)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:369)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
    at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:147)
    ... 31 more
Caused by: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13
    at org.apache.avro.Conversions$DecimalConversion.validate(Conversions.java:140)
    at org.apache.avro.Conversions$DecimalConversion.toFixed(Conversions.java:104)
    at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryTypeWithDiffSchemaType(HoodieAvroUtils.java:1077)
    at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryType(HoodieAvroUtils.java:1001)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:946)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:944)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:894)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
    at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:843)
    at org.apache.hudi.common.model.HoodieAvroIndexedRecord.rewriteRecordWithNewSchema(HoodieAvroIndexedRecord.java:123)
    at org.apache.hudi.common.model.HoodieRecord.rewriteRecordWithNewSchema(HoodieRecord.java:382)
    at org.apache.hudi.table.action.commit.HoodieMergeHelper.lambda$runMerge$0(HoodieMergeHelper.java:136)
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:68)
    ... 32 more
Some fields have the data type decimal(13,4), and the incoming data does not exceed the prescribed precision; the largest values we receive fit decimal(10,4). Still we get the above error. Is there any solution to this? The issue started after we enabled clustering for our tables.
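For reference, below is a minimal diagnostic sketch of how we compare what Spark reports for the table column against what the incoming batch actually needs. The column name `amount`, the table path, and the `incoming_df` DataFrame are placeholders for illustration only, not the real job. Since decimal(13,4) leaves 13 - 4 = 9 integer digits, any absolute value of 10^9 or more would genuinely require a wider type.

from pyspark.sql import functions as F

# Placeholder names; substitute the real decimal(13,4) column and table path.
DECIMAL_COL = "amount"
TABLE_PATH = "s3://bucket/path/to/table"

# 1) What precision/scale does Spark see for the column in the existing table?
existing = spark.read.format("hudi").load(TABLE_PATH)
print(existing.schema[DECIMAL_COL].dataType)  # expected: DecimalType(13,4)

# 2) What does the incoming batch actually need? decimal(13,4) allows at most
#    9 integer digits, i.e. abs(value) < 10**9.
max_abs = incoming_df.select(F.max(F.abs(F.col(DECIMAL_COL)))).first()[0]
print(max_abs, max_abs is None or abs(max_abs) < 10**9)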
Hudi configs:
{
    'hoodie.clustering.plan.strategy.target.file.max.bytes': '525829120',
    'hoodie.datasource.hive_sync.table': '*******',
    'hoodie.datasource.write.reconcile.schema': 'true',
    'hoodie.index.type': 'SIMPLE',
    'hoodie.clean.automatic': 'true',
    'hoodie.write.markers.type': 'direct',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.schema.on.read.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.write.drop.partition.columns': 'true',
    'hoodie.datasource.write.recordkey.field': '_partition_year,_partition_month,_partition_day,id',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.metadata.enable': 'false',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.clustering.plan.strategy.max.bytes.per.group': '525829120',
    'hoodie.parquet.small.file.limit': '0',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.clustering.inline.max.commits': '4',
    'hoodie.clustering.inline': 'true',
    'hoodie.clustering.plan.strategy.max.num.groups': '200',
    'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': 'false',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.write.partitionpath.field': '_partition_year,_partition_month,_partition_day',
    'hoodie.cleaner.fileversions.retained': '1',
    'hoodie.table.name': '*******',
    'hoodie.clustering.plan.strategy.small.file.limit': '512000',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
    'hoodie.datasource.write.precombine.field': '_acq_change_seq',
    'hoodie.datasource.hive_sync.database': '*******',
    'hoodie.datasource.write.operation': 'upsert'
}
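For context, this is roughly how the config dict above is applied on each batch through the Hudi Spark datasource; the variable names and target path here are placeholders, not the actual job code. With 'hoodie.clustering.inline': 'true' and 'hoodie.clustering.inline.max.commits': '4', inline clustering is triggered as part of these writes.

# Sketch only: `hudi_configs` holds the dictionary above, `incoming_df` is the batch.
(
    incoming_df.write.format("hudi")
    .options(**hudi_configs)
    .mode("append")
    .save("s3://bucket/path/to/table")  # placeholder target path
)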
Hudi version: 0.14.0
I also tried setting 'spark.sql.storeAssignmentPolicy'='legacy'.
Below is the Spark session:
spark = (
    SparkSession.builder.appName("test")
    .config('spark.hadoop.hive.metastore.client.factory.class', 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory')
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension')
    .config('spark.hadoop.spark.sql.legacy.parquet.nanosAsLong', 'false')
    .config('spark.hadoop.spark.sql.parquet.binaryAsString', 'false')
    .config('spark.hadoop.spark.sql.parquet.int96AsTimestamp', 'true')
    .config('spark.hadoop.spark.sql.caseSensitive', 'false')
    .config('spark.sql.parquet.datetimeRebaseModeInWrite', 'CORRECTED')
    .config('spark.sql.parquet.datetimeRebaseModeInRead', 'CORRECTED')
    .config('spark.sql.parquet.int96RebaseModeInWrite', 'CORRECTED')
    .config('spark.sql.storeAssignmentPolicy', 'legacy')
    .config('spark.executor.extraJavaOptions', "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'")
    .config('spark.driver.extraJavaOptions', "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'")
    .config('spark.sql.hive.convertMetastoreParquet', 'false')
    .config('spark.kryoserializer.buffer.max', '2040M')
    .config('fs.s3.maxRetries', '5')
    .enableHiveSupport()
    .getOrCreate()
)
Could anybody help here? It is a production error.