[ 
https://issues.apache.org/jira/browse/HUDI-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230386#comment-17230386
 ] 

Balaji Varadarajan commented on HUDI-1205:
------------------------------------------

[~leehuynh] [~zuyanton] [~garyli1019] Please see the above comment and try 
master. 

> Serialization fail when log file is larger than 2GB
> ---------------------------------------------------
>
>                 Key: HUDI-1205
>                 URL: https://issues.apache.org/jira/browse/HUDI-1205
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Yanjia Gary Li
>            Priority: Major
>
> When scanning a log file, if the log file (or log file group) is larger than 
> 2GB, serialization fails because Hudi uses an Integer to store the log file's 
> size in bytes. The largest value an Integer can hold is 2^31 - 1, just under 2GB.
> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.OverwriteWithLatestAvroPayload)
> data (org.apache.hudi.common.model.HoodieRecord)
> at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
> at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
> at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
> at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
> at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
> at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:107)
> at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:81)
> at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:217)
> at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:211)
> at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:207)
> at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:168)
> at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:55)
> at org.apache.hudi.HoodieMergeOnReadRDD$$anon$1.hasNext(HoodieMergeOnReadRDD.scala:128)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:624)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> at org.apache.spark.scheduler.Task.run(Task.scala:121)
> at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
> ... 31 more
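
To illustrate the 2GB limit mentioned in the issue description, here is a minimal Java sketch of what happens when a byte count at or above 2GB is stored in an int. The class and method names (LogSizeOverflow, sizeAsInt, sizeAsIntChecked) are hypothetical and chosen only for this example; this is not Hudi's actual code, and it does not attempt to reproduce the Kryo error in the trace.

```java
public class LogSizeOverflow {

    // Hypothetical helper: narrows a byte count to int, the way an
    // int-typed size field would. Values above Integer.MAX_VALUE wrap.
    static int sizeAsInt(long sizeInBytes) {
        return (int) sizeInBytes;
    }

    // Hypothetical helper: same narrowing, but throws ArithmeticException
    // instead of silently wrapping when the value does not fit in an int.
    static int sizeAsIntChecked(long sizeInBytes) {
        return Math.toIntExact(sizeInBytes);
    }

    public static void main(String[] args) {
        long twoGiB = 2L * 1024 * 1024 * 1024; // 2147483648 bytes

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("2 GiB as long     = " + twoGiB);
        System.out.println("2 GiB as int      = " + sizeAsInt(twoGiB));

        try {
            sizeAsIntChecked(twoGiB);
        } catch (ArithmeticException e) {
            System.out.println("checked narrowing failed: " + e.getMessage());
        }
    }
}
```

Keeping sizes and offsets in a long end to end avoids the wrap. Where an int is unavoidable, a checked conversion such as Math.toIntExact at least surfaces the problem at the point of narrowing.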



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
