[ https://issues.apache.org/jira/browse/HUDI-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-2424:
--------------------------------------
Labels: core-flow-ds sev:high user-support-issues (was: user-support-issues)
> Error checking bloom filter index (NPE)
> ---------------------------------------
>
> Key: HUDI-2424
> URL: https://issues.apache.org/jira/browse/HUDI-2424
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Jakub Kubala
> Priority: Major
> Labels: core-flow-ds, sev:high, user-support-issues
>
> Hi,
> Recently we encountered an issue with Hudi where a NullPointerException (NPE)
> is thrown, seemingly at random, while processing our content.
> Because we have over 100k items of content to process, I cannot easily
> narrow down which item is causing the problem.
> We are using the configurations that ship with AWS EMR v5.30 (Hudi 0.5.2) and
> v5.33 (Hudi 0.7.0).
>
> {code:java}
> 21/09/10 18:31:14 WARN TaskSetManager: Lost task 1.0 in stage 38.0 (TID 23804, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:14 INFO TaskSetManager: Starting task 1.1 in stage 38.0 (TID 23805, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:18 INFO TaskSetManager: Lost task 1.1 in stage 38.0 (TID 23805) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 1]
> 21/09/10 18:31:18 INFO TaskSetManager: Starting task 1.2 in stage 38.0 (TID 23806, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:21 INFO TaskSetManager: Lost task 1.2 in stage 38.0 (TID 23806) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 2]
> 21/09/10 18:31:21 INFO TaskSetManager: Starting task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:25 WARN TaskSetManager: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:25 ERROR TaskSetManager: Task 1 in stage 38.0 failed 4 times; aborting job
> 21/09/10 18:31:25 INFO YarnScheduler: Cancelling stage 38
> 21/09/10 18:31:25 INFO YarnScheduler: Killing all running tasks in stage 38: Stage cancelled
> 21/09/10 18:31:25 INFO YarnScheduler: Stage 38 was cancelled
> 21/09/10 18:31:25 INFO DAGScheduler: ShuffleMapStage 38 (flatMapToPair at HoodieBloomIndex.java:308) failed in 15.973 s due to Job aborted due to stage failure: Task 1 in stage 38.0 failed 4 times, most recent failure: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> {code}
> Can you help me with this?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)