[ 
https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5323:
----------------------------
    Description: 
When the virtual key feature is enabled by setting hoodie.populate.meta.fields 
to false, bloom filters are not written to the parquet base files during write 
transactions, because the same populateMetaFields flag is reused to decide 
whether to create the bloom filter.  The relevant logic is in the 
HoodieFileWriterFactory class:
{code:java}
private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
    String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
    TaskContextSupplier taskContextSupplier, boolean populateMetaFields) throws IOException {
  // populateMetaFields is passed twice: the second occurrence becomes
  // enableBloomFilter, coupling bloom filter writing to the virtual key setting.
  return newParquetFileWriter(instantTime, path, config, schema, hoodieTable.getHadoopConf(),
      taskContextSupplier, populateMetaFields, populateMetaFields);
}

private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
    String instantTime, Path path, HoodieWriteConfig config, Schema schema, Configuration conf,
    TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
  Option<BloomFilter> filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty();
  HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(
      new AvroSchemaConverter(conf).convert(schema), schema, filter);

  HoodieParquetConfig<HoodieAvroWriteSupport> parquetConfig = new HoodieParquetConfig<>(
      writeSupport, config.getParquetCompressionCodec(),
      config.getParquetBlockSize(), config.getParquetPageSize(), config.getParquetMaxFileSize(),
      conf, config.getParquetCompressionRatio(), config.parquetDictionaryEnabled());

  return new HoodieAvroParquetWriter<>(path, parquetConfig, instantTime, taskContextSupplier, populateMetaFields);
} {code}
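One way to decouple the two flags is to derive enableBloomFilter from the index configuration rather than reusing populateMetaFields. The sketch below illustrates only the decision logic; the class name, method name, and index-type strings are hypothetical and not actual Hudi APIs:

{code:java}
// Sketch only: derive the bloom-filter decision from the index type instead of
// reusing populateMetaFields. All names here are illustrative.
public class BloomFilterDecision {

  // With virtual keys (populateMetaFields = false), a bloom-based index still
  // needs the filters, so the decision must not follow populateMetaFields alone.
  static boolean shouldWriteBloomFilter(boolean populateMetaFields, String indexType) {
    return populateMetaFields || indexType.contains("BLOOM");
  }

  public static void main(String[] args) {
    // Virtual keys + bloom index: filters should still be written.
    System.out.println(shouldWriteBloomFilter(false, "BLOOM"));  // prints true
    // Virtual keys + a non-bloom index: filters are not needed.
    System.out.println(shouldWriteBloomFilter(false, "SIMPLE")); // prints false
  }
} {code}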
 

> Decouple virtual key with writing bloom filters to parquet files
> ----------------------------------------------------------------
>
>                 Key: HUDI-5323
>                 URL: https://issues.apache.org/jira/browse/HUDI-5323
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index, writer-core
>            Reporter: Ethan Guo
>            Priority: Critical
>             Fix For: 0.13.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)