[ https://issues.apache.org/jira/browse/HUDI-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2424:
--------------------------------------
    Labels: core-flow-ds sev:high user-support-issues  (was: user-support-issues)

> Error checking bloom filter index (NPE)
> ---------------------------------------
>
>                 Key: HUDI-2424
>                 URL: https://issues.apache.org/jira/browse/HUDI-2424
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Jakub Kubala
>            Priority: Major
>              Labels: core-flow-ds, sev:high, user-support-issues
>
> Hi,
> We recently encountered an issue with Hudi where an NPE is thrown seemingly 
> out of nowhere while processing our content.
> As we have over 100k content items to process, I cannot easily narrow down 
> which piece is the troublesome one.
> We are using the configurations that ship with AWS EMR v5.30 (Hudi 0.5.2) and 
> v5.33 (Hudi 0.7.0).
>  
> {code:java}
> 21/09/10 18:31:14 WARN TaskSetManager: Lost task 1.0 in stage 38.0 (TID 23804, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:14 INFO TaskSetManager: Starting task 1.1 in stage 38.0 (TID 23805, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:18 INFO TaskSetManager: Lost task 1.1 in stage 38.0 (TID 23805) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 1]
> 21/09/10 18:31:18 INFO TaskSetManager: Starting task 1.2 in stage 38.0 (TID 23806, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:21 INFO TaskSetManager: Lost task 1.2 in stage 38.0 (TID 23806) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 2]
> 21/09/10 18:31:21 INFO TaskSetManager: Starting task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:25 WARN TaskSetManager: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:25 ERROR TaskSetManager: Task 1 in stage 38.0 failed 4 times; aborting job
> 21/09/10 18:31:25 INFO YarnScheduler: Cancelling stage 38
> 21/09/10 18:31:25 INFO YarnScheduler: Killing all running tasks in stage 38: Stage cancelled
> 21/09/10 18:31:25 INFO YarnScheduler: Stage 38 was cancelled
> 21/09/10 18:31:25 INFO DAGScheduler: ShuffleMapStage 38 (flatMapToPair at HoodieBloomIndex.java:308) failed in 15.973 s due to Job aborted due to stage failure: Task 1 in stage 38.0 failed 4 times, most recent failure: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> {code}
> Can you help me with this?
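
For anyone blocked by this while the root cause is investigated: since the NPE originates in the bloom-index lookup path (HoodieKeyLookupHandle.addKey), one possible, unconfirmed workaround is to move the write path off the bloom index via {{hoodie.index.type}}. This is a minimal sketch, assuming Hudi 0.7.0+ (the SIMPLE index type does not exist in 0.5.2, so it would not apply to the EMR 5.30 cluster); the table name, key fields, and write call are hypothetical placeholders, and the index-type change can affect upsert performance.

{code:python}
# Sketch of Hudi writer options that avoid the bloom-index lookup path.
# Only the option keys are real Hudi configs; all values here are
# hypothetical placeholders for illustration.
hudi_options = {
    "hoodie.table.name": "my_table",                    # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",    # hypothetical
    "hoodie.datasource.write.precombine.field": "ts",   # hypothetical
    "hoodie.index.type": "SIMPLE",                      # skip BLOOM index lookups
}

# Hypothetical Spark usage:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
{code}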



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
