gowriGH opened a new issue, #13072:
URL: https://github.com/apache/hudi/issues/13072

   
   We are trying to create a Hudi table in HDFS using spark-submit, but we 
encounter the following error during compaction:
   
   [2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - Caused by: 
org.apache.hudi.exception.HoodieCompactionException: Could not compact 
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
   
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a Hudi table in HDFS using spark-submit with Copy-On-Write (COW) 
storage type.
   2. Run the job multiple times until the compaction error occurs.
   
   **Expected behavior**
   
   According to the Hudi documentation, compaction should not occur for 
Copy-On-Write (COW) tables, yet in our case a compaction is being triggered. 
(The failing path points at the table's `.hoodie/metadata` directory.)
   
   **Environment Description**
   
   * Hudi version : 0.14.1
   
   * Spark version : 3.4.4
   
   * Hive version : 4.0.1
   
   * Hadoop version : 3.4.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   1. Rebuilt Hudi with Hadoop 3 and HBase 2.4.9 JARs
   
       Rebuilt Hudi against these dependencies to rule out a compatibility 
mismatch.
   
   2. Checked the hadoop-hdfs-client JARs of each Hadoop version
   
       We did not find the previously reported java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics() issue.
   
       Instead, we encountered:
   
       java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
   
   3. Tested with Hudi 1.0.1
   
       According to the documentation this was fixed in Hudi 1.0.1, so we tried 
it. Table creation succeeded, but queries against the tables do not work in 
Hive on Tez.
   
   Hudi properties passed from the Python file:
   
       hudi_write_params = {
           'hoodie.datasource.write.storage.type': hudi_params["table_type"],
           'hoodie.datasource.write.table.type': hudi_params["table_type"],
           'hoodie.datasource.write.operation': hudi_params["write_operation"],
           'hoodie.datasource.write.compression.type': hudi_params["compression_type"],
           'hoodie.compact.inline': 'false',
           'hoodie.compact.schedule.inline': 'false',
           'hoodie.datasource.compaction.async.enable': 'true',
           'hoodie.datasource.table.name': hudi_params["table_name"],
           'hoodie.table.name': hudi_params["table_name"],
           'hoodie.datasource.write.recordkey.field': hudi_params["primary_key_fields"],
           'hoodie.datasource.write.precombine.field': hudi_params["precombine_field"],
           'hoodie.datasource.write.schema': updated_schema.json()
       }
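   For context, a minimal sketch of how this options dict is assembled and 
handed to a Spark DataFrame writer. The `hudi_params` values and the target 
path below are illustrative placeholders, not taken from our job:

   ```python
   # Illustrative sketch: assembling hudi_write_params and issuing the write.
   # All concrete values here are placeholders.
   hudi_params = {
       "table_type": "COPY_ON_WRITE",
       "write_operation": "upsert",
       "table_name": "biometric_table",
       "primary_key_fields": "id",
       "precombine_field": "updated_at",
   }

   hudi_write_params = {
       "hoodie.datasource.write.table.type": hudi_params["table_type"],
       "hoodie.datasource.write.operation": hudi_params["write_operation"],
       "hoodie.table.name": hudi_params["table_name"],
       "hoodie.datasource.write.recordkey.field": hudi_params["primary_key_fields"],
       "hoodie.datasource.write.precombine.field": hudi_params["precombine_field"],
       # Inline compaction is disabled for the data table, but the error above
       # is raised while compacting the .hoodie/metadata table.
       "hoodie.compact.inline": "false",
       "hoodie.compact.schedule.inline": "false",
   }

   # With a live SparkSession, the write would be issued roughly as:
   # (df.write.format("hudi")
   #    .options(**hudi_write_params)
   #    .mode("append")
   #    .save("hdfs://<namenode>:9820/<base_path>/biometric_table"))
   ```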
   
   
   
   **Stacktrace**
   
   
   [2025-03-11T01:01:23.581+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 WARN 
TaskSetManager: Lost task 0.0 in stage 3.0 (TID 4) (sdpproddn01.techsophy.com 
executor 1): java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
   [2025-03-11T01:01:23.581+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
   [2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
   [2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
   [2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
   [2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
   [2025-03-11T01:01:23.582+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
   [2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
   [2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
   [2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
   [2025-03-11T01:01:23.583+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
   [2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
   [2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
   [2025-03-11T01:01:23.584+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
   [2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
   [2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
   [2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
   [2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
   [2025-03-11T01:01:23.585+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
   [2025-03-11T01:01:23.586+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
   [2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
   [2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
   [2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   [2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
   [2025-03-11T01:01:23.587+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
   [2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
   [2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
   [2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.scheduler.Task.run(Task.scala:139)
   [2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
   [2025-03-11T01:01:23.588+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
   [2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
   [2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING -        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING -        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [2025-03-11T01:01:23.589+0000] {ssh.py:483} WARNING -        at 
java.lang.Thread.run(Thread.java:750)
   [2025-03-11T01:01:23.590+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 INFO 
TaskSetManager: Starting task 0.1 in stage 3.0 (TID 5) 
(sdpproddn01.techsophy.com, executor 2, partition 0, PROCESS_LOCAL, 9070 bytes) 
   [2025-03-11T01:01:23.621+0000] {ssh.py:483} WARNING - 25/03/11 01:01:23 INFO 
BlockManagerInfo: Added broadcast_4_piece0 in memory on 
sdpproddn01.techsophy.com:7090 (size: 133.3 KiB, free: 366.0 MiB)
   [2025-03-11T01:01:25.959+0000] {ssh.py:483} WARNING - 25/03/11 01:01:25 INFO 
TaskSetManager: Lost task 0.1 in stage 3.0 (TID 5) on 
sdpproddn01.techsophy.com, executor 2: java.lang.NoSuchMethodError 
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
 [duplicate 1]
   [2025-03-11T01:01:25.961+0000] {ssh.py:483} WARNING - 25/03/11 01:01:25 INFO 
TaskSetManager: Starting task 0.2 in stage 3.0 (TID 6) 
(sdpproddn01.techsophy.com, executor 2, partition 0, PROCESS_LOCAL, 9070 bytes) 
   [2025-03-11T01:01:26.262+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
TaskSetManager: Lost task 0.2 in stage 3.0 (TID 6) on 
sdpproddn01.techsophy.com, executor 2: java.lang.NoSuchMethodError 
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
 [duplicate 2]
   [2025-03-11T01:01:26.266+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
TaskSetManager: Starting task 0.3 in stage 3.0 (TID 7) 
(sdpproddn01.techsophy.com, executor 1, partition 0, PROCESS_LOCAL, 9070 bytes) 
   [2025-03-11T01:01:26.547+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
TaskSetManager: Lost task 0.3 in stage 3.0 (TID 7) on 
sdpproddn01.techsophy.com, executor 1: java.lang.NoSuchMethodError 
(org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;)
 [duplicate 3]
   [2025-03-11T01:01:26.548+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 
ERROR TaskSetManager: Task 0 in stage 3.0 failed 4 times; aborting job
   [2025-03-11T01:01:26.550+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool 
   [2025-03-11T01:01:26.553+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
YarnScheduler: Cancelling stage 3
   [2025-03-11T01:01:26.553+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
YarnScheduler: Killing all running tasks in stage 3: Stage cancelled
   [2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
DAGScheduler: ResultStage 3 (collect at HoodieJavaRDD.java:177) failed in 5.763 
s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, 
most recent failure: Lost task 0.3 in stage 3.0 (TID 7) 
(sdpproddn01.techsophy.com executor 1): java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
   [2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
   [2025-03-11T01:01:26.554+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
   [2025-03-11T01:01:26.555+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
   [2025-03-11T01:01:26.556+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
   [2025-03-11T01:01:26.557+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.scheduler.Task.run(Task.scala:139)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
   [2025-03-11T01:01:26.558+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING -        at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING -        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING -        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING -        at 
java.lang.Thread.run(Thread.java:750)
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - 
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - Driver stacktrace:
   [2025-03-11T01:01:26.559+0000] {ssh.py:483} WARNING - 25/03/11 01:01:26 INFO 
DAGScheduler: Job 3 failed: collect at HoodieJavaRDD.java:177, took 5.779449 s
   [2025-03-11T01:01:26.580+0000] {ssh.py:478} INFO - 01:01:26.551 [Thread-6] 
ERROR org.apache.hudi.metadata.HoodieBackedTableMetadataWriter - Exception in 
running table services on metadata table
   [2025-03-11T01:01:26.580+0000] {ssh.py:478} INFO - 
org.apache.hudi.exception.HoodieCompactionException: Could not compact 
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
   [2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO - Caused by: 
org.apache.hudi.exception.HoodieCompactionException: Could not compact 
hdfs://sdpprodnn01.techsophy.com:9820/techsophy/raw/biometric/np/cow/hive/biometric_table/.hoodie/metadata
   [2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
   [2025-03-11T01:01:27.119+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:298)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1145)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1065)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.compactIfNecessary(HoodieBackedTableMetadataWriter.java:1254)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.performTableServices(HoodieBackedTableMetadataWriter.java:1205)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:290)
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO -   ... 51 more
   [2025-03-11T01:01:27.120+0000] {ssh.py:478} INFO - Caused by: 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 
7) (sdpproddn01.techsophy.com executor 1): java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
   [2025-03-11T01:01:27.121+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
   [2025-03-11T01:01:27.122+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   [2025-03-11T01:01:27.123+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.Task.run(Task.scala:139)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [2025-03-11T01:01:27.124+0000] {ssh.py:478} INFO -   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
java.lang.Thread.run(Thread.java:750)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - 
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO - Driver stacktrace:
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2790)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2726)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2725)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   [2025-03-11T01:01:27.125+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2725)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1211)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1211)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
scala.Option.foreach(Option.scala:407)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1211)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2989)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2928)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2917)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   [2025-03-11T01:01:27.126+0000] {ssh.py:478} INFO -   at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:976)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2258)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2298)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2323)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1022)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:408)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.rdd.RDD.collect(RDD.scala:1021)
   [2025-03-11T01:01:27.127+0000] {ssh.py:478} INFO -   at 
org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
   [2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO -   at 
org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
   [2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO -   at 
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
   [2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177)
   [2025-03-11T01:01:27.128+0000] {ssh.py:478} INFO -   at 
org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
	... 58 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
	at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
	at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
	at org.apache.hudi.io.storage.HoodieAvroHFileReader$RecordIterator.close(HoodieAvroHFileReader.java:725)
	at org.apache.hudi.common.util.collection.CloseableMappingIterator.close(CloseableMappingIterator.java:35)
	at org.apache.hudi.common.util.queue.SimpleExecutor.shutdownNow(SimpleExecutor.java:83)
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:154)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
	at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
	at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
25/03/11 01:01:27 INFO SparkContext: SparkContext is stopping with exitCode 0.
   
**Stacktrace**

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 7) (sdpproddn01.techsophy.com executor 1): java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
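The `NoSuchMethodError` above suggests the shaded HBase client bundled in Hudi was compiled against a `getReadStatistics()` signature that differs from the one on the runtime classpath (the descriptor in the trace expects `DFSInputStream$ReadStatistics`, the Hadoop 2.x return type). A quick way to confirm which variant the executors actually see is a small reflection check, run with the same classpath as the Spark job. This is a hypothetical diagnostic sketch, not part of Hudi; the class and method names are taken straight from the trace:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class SignatureCheck {

    // Returns the return-type names of all public methods with the given name,
    // or an empty list if the class is not on this classpath at all.
    static List<String> returnTypesOf(String className, String methodName) {
        List<String> out = new ArrayList<>();
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    out.add(m.getReturnType().getName());
                }
            }
        } catch (ClassNotFoundException e) {
            // Class absent from the classpath: return empty.
        }
        return out;
    }

    public static void main(String[] args) {
        // On the cluster, this shows which getReadStatistics() variant is
        // loaded; the shaded HBase expects the Hadoop 2.x return type
        // (DFSInputStream$ReadStatistics), so a mismatch here would explain
        // the NoSuchMethodError in the trace.
        System.out.println(returnTypesOf(
                "org.apache.hadoop.hdfs.client.HdfsDataInputStream",
                "getReadStatistics"));
        // Prints [] when the Hadoop HDFS client is not on the classpath.
    }
}
```

If the printed return type does not match the descriptor in the stack trace, the rebuilt Hudi bundle is still carrying HBase classes compiled against a different Hadoop line than the cluster runs.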
   

