gowriGH opened a new issue, #13072:
URL: https://github.com/apache/hudi/issues/13072
We are trying to create a Hudi table in HDFS using spark-submit, but we
encounter the following error during compaction:
[2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - Caused by:
org.apache.hudi.exception.HoodieCompactionException: Could not compact
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
**To Reproduce**
Steps to reproduce the behavior:
1. Create a Hudi table in HDFS using spark-submit with the Copy-On-Write (COW)
storage type (a minimal session setup is sketched below).
2. Run the job multiple times until the compaction error occurs.
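For reference, a minimal sketch of the session setup, assuming the Hudi 0.14.1
Spark bundle (hudi-spark3.4-bundle) is passed to spark-submit via --jars; the
app name is illustrative, not our exact Airflow-driven job:

# Minimal session setup (sketch; assumes the Hudi Spark bundle is on the classpath)
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-cow-ingest")  # illustrative app name
    # Settings the Hudi docs recommend for Spark:
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)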
**Expected behavior**
According to the Hudi documentation, compaction should not run for
Copy-On-Write (COW) tables, yet in our case it is being triggered. Note that
the failing path ends in .hoodie/metadata, which suggests the compaction being
triggered belongs to Hudi's internal metadata table (itself a merge-on-read
table that compacts regardless of the data table type), not the data table.
**Environment Description**
* Hudi version : 0.14.1
* Spark version : 3.4.4
* Hive version : 4.0.1
* Hadoop version : 3.4.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
**Additional context**
1. Rebuilt Hudi with the Hadoop 3 and HBase 2.4.9 JARs to ensure the
dependencies are compatible.
2. Checked the hadoop-hdfs-client JARs for every Hadoop version in use, but
did not find the known java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics() issue.
Instead, we encountered:
java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
(This descriptor is the Hadoop 2 signature: Hadoop 3 changed the return type
of getReadStatistics(), so code compiled against Hadoop 2 fails at runtime on
Hadoop 3. A quick way to check which signature a JAR ships is sketched below.)
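A hypothetical helper (not part of our job) for checking which
getReadStatistics() signature a given hadoop-hdfs-client JAR actually ships,
using the JDK's javap tool; the JAR path at the bottom is only an example:

import subprocess

def dump_read_statistics_signature(jar_path: str) -> None:
    """Print the getReadStatistics() signatures javap reports for HdfsDataInputStream."""
    cls = "org.apache.hadoop.hdfs.client.HdfsDataInputStream"
    out = subprocess.run(
        ["javap", "-cp", jar_path, cls],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if "getReadStatistics" in line:
            # Hadoop 2 declares the return type as DFSInputStream$ReadStatistics;
            # Hadoop 3 returns org.apache.hadoop.hdfs.ReadStatistics instead,
            # which is why code compiled against one fails on the other.
            print(line.strip())

dump_read_statistics_signature("hadoop-hdfs-client-3.4.1.jar")  # example path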
3. Tested with Hudi 1.0.1
Per the documentation this was fixed in Hudi 1.0.1, so we tried it; we can
even create the tables successfully, but queries then fail in Hive on Tez.
Hudi properties passed from the Python file:
hudi_write_params = {
    'hoodie.datasource.write.storage.type': hudi_params["table_type"],  # COPY_ON_WRITE, per the repro above
    'hoodie.datasource.write.table.type': hudi_params["table_type"],
    'hoodie.datasource.write.operation': hudi_params["write_operation"],
    'hoodie.datasource.write.compression.type': hudi_params["compression_type"],
    # Inline compaction disabled; async compaction enabled:
    'hoodie.compact.inline': 'false',
    'hoodie.compact.schedule.inline': 'false',
    'hoodie.datasource.compaction.async.enable': 'true',
    'hoodie.datasource.table.name': hudi_params["table_name"],
    'hoodie.table.name': hudi_params["table_name"],
    'hoodie.datasource.write.recordkey.field': hudi_params["primary_key_fields"],
    'hoodie.datasource.write.precombine.field': hudi_params["precombine_field"],
    'hoodie.datasource.write.schema': updated_schema.json()
}
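These params are applied to the DataFrame writer in the usual way; a sketch
follows (df and table_path are placeholders). Note that the hoodie.compact.*
flags above govern the data table only, while the failing compaction in the
stacktrace targets the internal metadata table, so as a hedged experiment one
could disable the metadata table to see whether the error disappears:

# Sketch: applying the options above (df and table_path are placeholders).
(
    df.write.format("hudi")
    .options(**hudi_write_params)
    # Hypothetical experiment: the metadata table compacts on its own schedule
    # even for COW tables; disabling it should show whether the
    # NoSuchMethodError during its compaction goes away.
    .option("hoodie.metadata.enable", "false")
    .mode("append")
    .save(table_path)
)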
**Stacktrace**
[2025-03-11T01:01:23.581+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 WARN
TaskSetManager: Lost task 0.0 in stage 3.0 (TID 4) (sdpproddn01.techsophy.com
executor 1): java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
[2025-03-11T01:01:23.581+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
[2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
[2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
[2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
[2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
[2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING - at
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
[2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING - at
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
[2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING - at
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
[2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
[2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
[2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
[2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
[2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
[2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
[2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING - at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
[2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
[2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
[2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
[2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
[2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
[2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING - at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING - at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING - at
org.apache.spark.scheduler.Task.run(Task.scala:139)
[2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING - at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING - at
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING - at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING - at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING - at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING - at
java.lang.Thread.run(Thread.java:750)
[2025-03-11T01:01:23.590+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 INFO
TaskSetManager: Starting task 0.1 in stage 3.0 (TID 5)
(sdpproddn01.techsophy.com, executor 2, partition 0, PROCESS_LOCAL, 9070 bytes)
[2025-03-11T01:01:23.621+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 INFO
BlockManagerInfo: Added broadcast_4_piece0 in memory on
sdpproddn01.techsophy.com:7090 (size: 133.3 KiB, free: 366.0 MiB)
[2025-03-11T01:01:25.959+0000] {ssh.py:483} WARNING - 25/03/11 01:01:25 INFO
TaskSetManager: Lost task 0.1 in stage 3.0 (TID 5) on
sdpproddn01.techsophy.com, executor 2: java.lang.NoSuchMethodError
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
[duplicate 1]
[2025-03-11T01:01:25.961+0000] {ssh.py:483} WARNING - 25/03/11 01:01:25 INFO
TaskSetManager: Starting task 0.2 in stage 3.0 (TID 6)
(sdpproddn01.techsophy.com, executor 2, partition 0, PROCESS_LOCAL, 9070 bytes)
[2025-03-11T01:01:26.262+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
TaskSetManager: Lost task 0.2 in stage 3.0 (TID 6) on
sdpproddn01.techsophy.com, executor 2: java.lang.NoSuchMethodError
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
[duplicate 2]
[2025-03-11T01:01:26.266+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
TaskSetManager: Starting task 0.3 in stage 3.0 (TID 7)
(sdpproddn01.techsophy.com, executor 1, partition 0, PROCESS_LOCAL, 9070 bytes)
[2025-03-11T01:01:26.547+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
TaskSetManager: Lost task 0.3 in stage 3.0 (TID 7) on
sdpproddn01.techsophy.com, executor 1: java.lang.NoSuchMethodError
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
[duplicate 3]
[2025-03-11T01:01:26.548+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26
ERROR TaskSetManager: Task 0 in stage 3.0 failed 4 times; aborting job
[2025-03-11T01:01:26.550+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
[2025-03-11T01:01:26.553+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
YarnScheduler: Cancelling stage 3
[2025-03-11T01:01:26.553+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
YarnScheduler: Killing all running tasks in stage 3: Stage cancelled
[2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
DAGScheduler: ResultStage 3 (collect at HoodieJavaRDD.java:177) failed in 5.763
s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times,
most recent failure: Lost task 0.3 in stage 3.0 (TID 7)
(sdpproddn01.techsophy.com executor 1): java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
[2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
[2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
[2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
[2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.scheduler.Task.run(Task.scala:139)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING - at
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - at
java.lang.Thread.run(Thread.java:750)
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING -
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - Driver stacktrace:
[2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO
DAGScheduler: Job 3 failed: collect at HoodieJavaRDD.java:177, took 5.779449 s
[2025-03-11T01:01:26.580+0000] {ssh.py:478} INFO - 01:01:26.551 [Thread-6]
ERROR org.apache.hudi.metadata.HoodieBackedTableMetadataWriter - Exception in
running table services on metadata table
[2025-03-11T01:01:26.580+0000] {ssh.py:478} INFO -
org.apache.hudi.exception.HoodieCompactionException: Could not compact
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
[2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - Caused by:
org.apache.hudi.exception.HoodieCompactionException: Could not compact
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
[2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
[2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:298)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1145)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1065)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.compactIfNecessary(HoodieBackedTableMetadataWriter.java:1254)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.performTableServices(HoodieBackedTableMetadataWriter.java:1205)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - at
org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:290)
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - ... 51 more
[2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - Caused by:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID
7) (sdpproddn01.techsophy.com executor 1): java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
[2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.Task.run(Task.scala:139)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO - at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
java.lang.Thread.run(Thread.java:750)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - Driver stacktrace:
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2790)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2726)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2725)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2725)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1211)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1211)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
scala.Option.foreach(Option.scala:407)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1211)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2989)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2928)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2917)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:976)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.SparkContext.runJob(SparkContext.scala:2258)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.SparkContext.runJob(SparkContext.scala:2298)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.SparkContext.runJob(SparkContext.scala:2323)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1022)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.withScope(RDD.scala:408)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.collect(RDD.scala:1021)
[2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO - at
org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - ... 58 more
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - Caused by:
java.lang.NoSuchMethodError:
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
[2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
[2025-03-11T01:01:27.129+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
[2025-03-11T01:01:27.130+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2025-03-11T01:01:27.131+0000] {ssh.py:478} INFO - at
org.apache.spark.scheduler.Task.run(Task.scala:139)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - at
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2025-03-11T01:01:27.132+0000] {ssh.py:478} INFO - ... 1 more
[2025-03-11T01:01:27.159+0000] {ssh.py:483} WARNING - 25/03/11 01:01:27 INFO
SparkContext: SparkContext is stopping with exitCode 0.