Jakub Kubala created HUDI-2424:
----------------------------------

             Summary: Error checking bloom filter index (NPE)
                 Key: HUDI-2424
                 URL: https://issues.apache.org/jira/browse/HUDI-2424
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Jakub Kubala


Hi,

We recently ran into an issue with Hudi where a NullPointerException (NPE) is 
thrown, seemingly at random, while processing our content.

As we have over 100k pieces of content to process, I cannot easily narrow down 
which piece is causing the problem.

We are using the configurations that ship with AWS EMR v5.30 (Hudi 0.5.2) and 
v5.33 (Hudi 0.7.0).

 
{code:java}
21/09/10 18:31:14 WARN TaskSetManager: Lost task 1.0 in stage 38.0 (TID 23804, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 15 more
Caused by: java.lang.NullPointerException
    at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
    ... 17 more
21/09/10 18:31:14 INFO TaskSetManager: Starting task 1.1 in stage 38.0 (TID 23805, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
21/09/10 18:31:18 INFO TaskSetManager: Lost task 1.1 in stage 38.0 (TID 23805) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 1]
21/09/10 18:31:18 INFO TaskSetManager: Starting task 1.2 in stage 38.0 (TID 23806, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
21/09/10 18:31:21 INFO TaskSetManager: Lost task 1.2 in stage 38.0 (TID 23806) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 2]
21/09/10 18:31:21 INFO TaskSetManager: Starting task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2, partition 1, NODE_LOCAL, 7662 bytes)
21/09/10 18:31:25 WARN TaskSetManager: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 15 more
Caused by: java.lang.NullPointerException
    at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
    ... 17 more
21/09/10 18:31:25 ERROR TaskSetManager: Task 1 in stage 38.0 failed 4 times; aborting job
21/09/10 18:31:25 INFO YarnScheduler: Cancelling stage 38
21/09/10 18:31:25 INFO YarnScheduler: Killing all running tasks in stage 38: Stage cancelled
21/09/10 18:31:25 INFO YarnScheduler: Stage 38 was cancelled
21/09/10 18:31:25 INFO DAGScheduler: ShuffleMapStage 38 (flatMapToPair at HoodieBloomIndex.java:308) failed in 15.973 s due to Job aborted due to stage failure: Task 1 in stage 38.0 failed 4 times, most recent failure: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 15 more
Caused by: java.lang.NullPointerException
    at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
    at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
    ... 17 more
{code}
Can you help me with this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
