hudi-bot opened a new issue, #15258:
URL: https://github.com/apache/hudi/issues/15258
When trying to upsert into a dataset with meta fields disabled, you
will encounter an obscure NPE like the one below:
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 20.0 failed 4 times, most recent failure: Lost task 25.3 in stage 20.0 (TID 4110) (ip-172-31-20-53.us-west-2.compute.internal executor 7): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
	... 16 more
Caused by: java.lang.NullPointerException
	at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:88)
	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:92)
	... 18 more
{code}
Instead, we could be more explicit about why this happened
(meta fields disabled -> no bloom filter created -> upserts are not possible).
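To illustrate the kind of explicit failure being proposed, here is a minimal, self-contained sketch: instead of letting a missing bloom filter surface as an NPE deep in the index lookup, fail fast with a descriptive `HoodieIndexException`. The class and method names (`BloomFilterGuard`, `readBloomFilterFromFooter`, `requireBloomFilter`) are hypothetical stand-ins, not actual Hudi APIs.

```java
import java.util.Optional;

// Hypothetical sketch: surface a clear error instead of an NPE when the
// bloom filter is absent from a base file footer. All names are stand-ins.
public class BloomFilterGuard {
    static class HoodieIndexException extends RuntimeException {
        HoodieIndexException(String msg) { super(msg); }
    }

    // Stand-in for reading the bloom filter from the parquet footer; empty when
    // the file was written with hoodie.populate.meta.fields=false.
    static Optional<String> readBloomFilterFromFooter(boolean metaFieldsPopulated) {
        return metaFieldsPopulated ? Optional.of("bloom-filter-bytes") : Optional.empty();
    }

    static String requireBloomFilter(boolean metaFieldsPopulated) {
        return readBloomFilterFromFooter(metaFieldsPopulated).orElseThrow(() ->
            new HoodieIndexException(
                "Bloom filter is missing from the base file footer. This usually means the "
              + "table was written with hoodie.populate.meta.fields=false, which skips bloom "
              + "filter creation, so upserts via the bloom index cannot work."));
    }

    public static void main(String[] args) {
        // With meta fields populated, the filter is present and lookup proceeds.
        System.out.println(requireBloomFilter(true));
        // With meta fields disabled, the user sees the root cause, not an NPE.
        try {
            requireBloomFilter(false);
        } catch (HoodieIndexException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the error message names the offending config and the causal chain, which the current NPE does not.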
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-4330
- Type: Bug
- Fix version(s):
- 1.1.0
---
## Comments
28/Jun/22 10:18;xichaomin;Currently, the bloom filter depends on
"hoodie.populate.meta.fields": if "hoodie.populate.meta.fields" is false, we
don't write the bloom filter to the footer.
Some code:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetWriter.java
{code:java}
@Override
public void writeAvroWithMetadata(HoodieKey key, R avroRecord) throws IOException {
  if (populateMetaFields) {
    prepRecordWithMetadata(key, avroRecord, instantTime,
        taskContextSupplier.getPartitionIdSupplier().get(), getWrittenRecordCount(), fileName);
    super.write(avroRecord);
    writeSupport.add(key.getRecordKey());
  } else {
    super.write(avroRecord);
  }
}

@Override
public void writeAvro(String key, IndexedRecord object) throws IOException {
  super.write(object);
  if (populateMetaFields) {
    writeSupport.add(key);
  }
}
{code}
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java
{code:java}
private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
    String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
    TaskContextSupplier taskContextSupplier, boolean populateMetaFields) throws IOException {
  return newParquetFileWriter(instantTime, path, config, schema, hoodieTable.getHadoopConf(),
      taskContextSupplier, populateMetaFields, populateMetaFields);
}

private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
    String instantTime, Path path, HoodieWriteConfig config, Schema schema, Configuration conf,
    TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
  Option<BloomFilter> filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty();
  HoodieAvroWriteSupport writeSupport =
      new HoodieAvroWriteSupport(new AvroSchemaConverter(conf).convert(schema), schema, filter);
  HoodieParquetConfig<HoodieAvroWriteSupport> parquetConfig = new HoodieParquetConfig<>(writeSupport,
      config.getParquetCompressionCodec(), config.getParquetBlockSize(), config.getParquetPageSize(),
      config.getParquetMaxFileSize(), conf, config.getParquetCompressionRatio(),
      config.parquetDictionaryEnabled());
  return new HoodieAvroParquetWriter<>(path, parquetConfig, instantTime, taskContextSupplier, populateMetaFields);
}
{code}
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]