Rong Ma created HUDI-2031:
-----------------------------
Summary: JVM occasionally crashes during compaction when Spark speculative execution is enabled
Key: HUDI-2031
URL: https://issues.apache.org/jira/browse/HUDI-2031
Project: Apache Hudi
Issue Type: Bug
Reporter: Rong Ma
This can happen when speculative execution is triggered. The duplicated task attempts are expected to terminate cleanly, but sometimes they do not, and the JVM crashes.
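Until the root cause is fixed, a possible mitigation is to disable speculative execution for the compaction job so duplicate task attempts are never launched. `spark.speculation` is the standard Spark configuration key; whether this fully avoids the crash here is an assumption, and the jar name below is a placeholder:

```shell
# Workaround sketch: run the compaction job with speculative execution
# disabled, so no duplicate task attempts are started at all.
# This avoids the reported trigger; it does not fix the underlying crash.
# "your-compaction-job.jar" is a placeholder, not a real artifact name.
spark-submit \
  --conf spark.speculation=false \
  your-compaction-job.jar
```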
From executor logs:
ERROR [Executor task launch worker for task 6828] HoodieMergeHandle: Error writing record HoodieRecord{key=HoodieKey{recordKey=45246275517 partitionPath=2021-06-13}, currentLocation='null', newLocation='null'}
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
    at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
    at org.apache.parquet.column.impl.ColumnWriterV1.accountForValueWritten(ColumnWriterV1.java:106)
    at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:200)
    at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:469)
    at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:346)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvroWithMetadata(HoodieParquetWriter.java:83)
    at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:252)
    at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:336)
    at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:107)
    at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:199)
    at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:190)
    at org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.compact(HoodieSparkMergeOnReadTableCompactor.java:154)
    at org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.lambda$compact$9ec9d4c7$1(HoodieSparkMergeOnReadTableCompactor.java:105)
    at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1041)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$runWithUgi$3(Executor.scala:462)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.runWithUgi(Executor.scala:465)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:394)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
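For context on the IllegalArgumentException above: the message comes from a one-shot contract in Parquet's RunLengthBitPackingHybridEncoder, where toBytes() may not be called a second time without an intervening reset(). The sketch below is illustrative only (the class name and fields are made up, not Parquet's internals) and shows how such a guard trips when a writer is driven past its flush, e.g. by a duplicated task attempt reusing the same writer state:

```java
// Simplified illustration of a one-shot toBytes()/reset() contract.
// NOT Parquet's actual code; the real guard lives in
// RunLengthBitPackingHybridEncoder via Preconditions.checkArgument.
public class OneShotEncoder {
    private boolean toBytesCalled = false;

    public byte[] toBytes() {
        // Second call without reset() violates the contract and throws,
        // matching the message seen in the executor log above.
        if (toBytesCalled) {
            throw new IllegalArgumentException(
                "You cannot call toBytes() more than once without calling reset()");
        }
        toBytesCalled = true;
        return new byte[0]; // payload elided in this sketch
    }

    public void reset() {
        toBytesCalled = false;
    }

    public static void main(String[] args) {
        OneShotEncoder enc = new OneShotEncoder();
        enc.toBytes();   // first call: fine
        enc.reset();
        enc.toBytes();   // fine again after reset()
        try {
            enc.toBytes(); // second call without reset(): throws
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

That this precondition trips inside a duplicated task attempt may indicate the attempt is re-driving writer state that was already flushed, which would fit the report that the duplicated tasks do not always terminate cleanly.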
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f2b0b37042a, pid=10120, tid=0x00007f2b0b16c700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libz.so.1+0x342a]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/vipshop/ssd_disk/0/yarn/local14/usercache/hdfs/appcache/application_1620320166879_33384625/container_e104_1620320166879_33384625_01_000008/hs_err_pid10120.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--
This message was sent by Atlassian Jira
(v8.3.4#803005)