[ https://issues.apache.org/jira/browse/HUDI-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494459#comment-17494459 ]

Jakub Kubala commented on HUDI-2424:
------------------------------------

[~shivnarayan] I don't think the key can be null or empty; we filter for that. 
However, we noticed the same problem after manually removing the column from 
the parquet files, so that may be the cause here. Either way, the error could 
be thrown in a more descriptive manner.
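For illustration, here is a minimal sketch of what a more descriptive failure could look like. This is hypothetical code, not the actual `HoodieKeyLookupHandle` implementation; the class and method names are made up for the example:

```java
// Hypothetical sketch: validate the record key before probing the bloom
// filter, so the caller gets an actionable message instead of a bare NPE.
public class KeyValidation {

    /** Returns the key unchanged, or fails with a message pointing at the likely cause. */
    public static String requireValidKey(String recordKey, String filePath) {
        if (recordKey == null || recordKey.isEmpty()) {
            // Surface the suspected root cause (e.g. a dropped key column)
            // instead of letting a NullPointerException propagate later.
            throw new IllegalStateException(
                "Record key is null or empty while checking the bloom filter for "
                    + filePath + "; was the key column removed from the parquet files?");
        }
        return recordKey;
    }

    public static void main(String[] args) {
        System.out.println(requireValidKey("user-123", "part-0001.parquet"));
    }
}
```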

> Error checking bloom filter index (NPE)
> ---------------------------------------
>
>                 Key: HUDI-2424
>                 URL: https://issues.apache.org/jira/browse/HUDI-2424
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Jakub Kubala
>            Priority: Major
>              Labels: core-flow-ds, sev:high, user-support-issues
>
> Hi,
> Recently we encountered an issue with Hudi where an NPE is thrown seemingly 
> out of nowhere while processing the content.
> As we have over 100k pieces of content to process, I cannot easily narrow it 
> down to the troublesome piece.
> We are using the configurations that come with AWS EMR v5.30 (Hudi 0.5.2) 
> and v5.33 (Hudi 0.7.0).
>  
> {code:java}
> 21/09/10 18:31:14 WARN TaskSetManager: Lost task 1.0 in stage 38.0 (TID 23804, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:14 INFO TaskSetManager: Starting task 1.1 in stage 38.0 (TID 23805, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:18 INFO TaskSetManager: Lost task 1.1 in stage 38.0 (TID 23805) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 1]
> 21/09/10 18:31:18 INFO TaskSetManager: Starting task 1.2 in stage 38.0 (TID 23806, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:21 INFO TaskSetManager: Lost task 1.2 in stage 38.0 (TID 23806) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 2]
> 21/09/10 18:31:21 INFO TaskSetManager: Starting task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:25 WARN TaskSetManager: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> 21/09/10 18:31:25 ERROR TaskSetManager: Task 1 in stage 38.0 failed 4 times; aborting job
> 21/09/10 18:31:25 INFO YarnScheduler: Cancelling stage 38
> 21/09/10 18:31:25 INFO YarnScheduler: Killing all running tasks in stage 38: Stage cancelled
> 21/09/10 18:31:25 INFO YarnScheduler: Stage 38 was cancelled
> 21/09/10 18:31:25 INFO DAGScheduler: ShuffleMapStage 38 (flatMapToPair at HoodieBloomIndex.java:308) failed in 15.973 s due to Job aborted due to stage failure: Task 1 in stage 38.0 failed 4 times, most recent failure: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
>     at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>     ... 15 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
>     at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
>     ... 17 more
> {code}
> Can you help me with this?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
