hudi-bot opened a new issue, #15258:
URL: https://github.com/apache/hudi/issues/15258

   When trying to upsert into a dataset with meta fields disabled, you will 
encounter an obscure NPE like the one below:
   {code:java}
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 20.0 failed 4 times, most recent failure: Lost task 25.3 in stage 20.0 (TID 4110) (ip-172-31-20-53.us-west-2.compute.internal executor 7): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
           at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
           at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
           at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
           at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
           at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
           at org.apache.spark.scheduler.Task.run(Task.scala:131)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
           at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
           at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
           at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
           ... 16 more
   Caused by: java.lang.NullPointerException
           at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:88)
           at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:92)
           ... 18 more {code}
   Instead, we could be explicit about why this happened (meta fields 
disabled -> no bloom filter written -> upserts with the bloom index are 
impossible).
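   A hedged sketch of the kind of guard that could replace the NPE. The class and method names below are illustrative stand-ins, not the actual Hudi code: the idea is simply to fail fast with a descriptive message when the bloom filter is absent from the parquet footer.
   {code:java}
import java.util.Optional;

// Illustrative sketch (not the real HoodieKeyLookupHandle): fail fast with a
// descriptive message instead of an NPE when the bloom filter is missing
// because the file was written with hoodie.populate.meta.fields=false.
public class BloomFilterGuard {

  // Stand-in for org.apache.hudi.exception.HoodieIndexException
  static class HoodieIndexException extends RuntimeException {
    HoodieIndexException(String msg) {
      super(msg);
    }
  }

  static void checkCandidate(Optional<Object> bloomFilter, String recordKey) {
    if (!bloomFilter.isPresent()) {
      throw new HoodieIndexException(
          "No bloom filter found in the file footer; the table was likely written with "
              + "hoodie.populate.meta.fields=false, so the BLOOM index cannot be used for "
              + "upserts. Consider a different index type.");
    }
    // ... the actual key-membership check would go here ...
  }

  public static void main(String[] args) {
    try {
      checkCandidate(Optional.empty(), "key1");
    } catch (HoodieIndexException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
   {code}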
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-4330
   - Type: Bug
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   28/Jun/22 10:18;xichaomin;Currently, the bloom filter depends on 
"hoodie.populate.meta.fields": if "hoodie.populate.meta.fields" is false, we 
don't write the bloom filter to the parquet footer.
   
   Some code:
   
   
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetWriter.java
   {code:java}
  @Override
  public void writeAvroWithMetadata(HoodieKey key, R avroRecord) throws IOException {
    if (populateMetaFields) {
      prepRecordWithMetadata(key, avroRecord, instantTime,
          taskContextSupplier.getPartitionIdSupplier().get(), getWrittenRecordCount(), fileName);
      super.write(avroRecord);
      writeSupport.add(key.getRecordKey());
    } else {
      super.write(avroRecord);
    }
  }

  @Override
  public void writeAvro(String key, IndexedRecord object) throws IOException {
    super.write(object);
    if (populateMetaFields) {
      writeSupport.add(key);
    }
  } {code}
    
   
   
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java
   {code:java}
  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
      String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
      TaskContextSupplier taskContextSupplier, boolean populateMetaFields) throws IOException {
    return newParquetFileWriter(instantTime, path, config, schema, hoodieTable.getHadoopConf(),
        taskContextSupplier, populateMetaFields, populateMetaFields);
  }

  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
      String instantTime, Path path, HoodieWriteConfig config, Schema schema, Configuration conf,
      TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
    Option<BloomFilter> filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty();
    HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(new AvroSchemaConverter(conf).convert(schema), schema, filter);

    HoodieParquetConfig<HoodieAvroWriteSupport> parquetConfig = new HoodieParquetConfig<>(writeSupport, config.getParquetCompressionCodec(),
        config.getParquetBlockSize(), config.getParquetPageSize(), config.getParquetMaxFileSize(),
        conf, config.getParquetCompressionRatio(), config.parquetDictionaryEnabled());

    return new HoodieAvroParquetWriter<>(path, parquetConfig, instantTime, taskContextSupplier, populateMetaFields);
  }{code}
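   The dependency above can be isolated in a tiny sketch (simplified stand-in types, not the actual Hudi classes): the same populateMetaFields flag is forwarded as enableBloomFilter, so with meta fields disabled the writer receives an empty filter Option and nothing lands in the parquet footer for the read path to find.
   {code:java}
import java.util.Optional;

// Simplified model of the factory logic quoted above: the bloom filter
// option is populated only when meta fields are enabled.
public class WriterFactorySketch {

  static Optional<String> newFilter(boolean populateMetaFields) {
    boolean enableBloomFilter = populateMetaFields; // same flag drives both
    return enableBloomFilter ? Optional.of("bloom-filter") : Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(newFilter(true).isPresent());  // prints true
    System.out.println(newFilter(false).isPresent()); // prints false
  }
}
   {code}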


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
