I agree with Jim that we might discover more when implementing the reader/writer, but there should be no major changes to parquet-format, because:

1. What type of Bloom filter should we use? We use a block-based Bloom filter now, and supporting other types later would not require major changes: we would just add them to the union of defined algorithms.
2. Where should the filters go in the file? At the beginning of the row group. The location is given by an offset specified in the column chunk metadata, so if we later want to place filters somewhere else, at least there is no change to parquet-format.
3. What should the thrift object contain? The current thrift definition contains enough information to read a block-based Bloom filter. We might need to add more information if we support other types of Bloom filters in the future.

I can submit the reader/writer PR on the Java side to make this clear once we finish the Bloom filter utility PR on the Java side.

On Sat, Sep 1, 2018 at 12:26 AM Jim Apple <[email protected]> wrote:
> On 2018/08/30 19:41:59, Ryan Blue <[email protected]> wrote:
> > Jim, do you think that the implementation is going to make major changes to
> > the design of how bloom filters are stored in files?
>
> I don't foresee any problems with the current layout.
> --
> Thanks & Best Regards
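To make point 1 concrete, here is a minimal sketch of how a block-based Bloom filter works. It is not the parquet-format definition or the parquet-mr API, and it rests on several assumptions:

* Each block is 256 bits, stored as eight 32-bit words. An insert sets one bit in every word of a single block.
* The block is chosen with a multiply-shift reduction of the upper 32 hash bits.
* The eight salt constants are the ones used by Apache Impala's block Bloom filter. Whether parquet-format adopts exactly these is an assumption.
* `hash64` is a stand-in (BLAKE2b truncated to 64 bits), not the hash the spec mandates.

The names `BlockBloomFilter`, `hash64`, `to_bytes` and `from_bytes` are hypothetical. The serialized bitset stands in for the bytes a writer would place at the offset recorded in the column chunk metadata.

```python
import hashlib
import struct

# Salt constants as used by Apache Impala's block Bloom filter (assumed here;
# any eight odd 32-bit constants would work for this sketch).
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
WORDS_PER_BLOCK = 8                     # 8 x 32-bit words = one 256-bit block
BYTES_PER_BLOCK = WORDS_PER_BLOCK * 4


def hash64(value: bytes) -> int:
    """Hypothetical stand-in 64-bit hash; NOT the hash parquet-format specifies."""
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


class BlockBloomFilter:
    """Hypothetical block-based Bloom filter: every insert touches one block."""

    def __init__(self, num_blocks: int):
        if num_blocks <= 0:
            raise ValueError("num_blocks must be positive")
        self.num_blocks = num_blocks
        self.blocks = [[0] * WORDS_PER_BLOCK for _ in range(num_blocks)]

    def _locate(self, h: int):
        # Upper 32 bits pick the block (multiply-shift keeps it in range).
        block_index = ((h >> 32) * self.num_blocks) >> 32
        # Lower 32 bits, multiplied by each salt, pick one bit per word
        # (the top 5 bits of the 32-bit product give a bit index 0..31).
        key = h & 0xFFFFFFFF
        masks = [1 << (((key * s) & 0xFFFFFFFF) >> 27) for s in SALT]
        return block_index, masks

    def insert(self, value: bytes) -> None:
        idx, masks = self._locate(hash64(value))
        block = self.blocks[idx]
        for i, m in enumerate(masks):
            block[i] |= m

    def might_contain(self, value: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        idx, masks = self._locate(hash64(value))
        block = self.blocks[idx]
        return all(block[i] & m for i, m in enumerate(masks))

    def to_bytes(self) -> bytes:
        # Little-endian words, block after block (layout assumed for illustration).
        return b"".join(struct.pack("<8I", *block) for block in self.blocks)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BlockBloomFilter":
        # A reader would obtain `data` by seeking to the offset in the
        # column chunk metadata; here it is simply passed in.
        if len(data) % BYTES_PER_BLOCK:
            raise ValueError("bitset length must be a multiple of 32 bytes")
        f = cls(len(data) // BYTES_PER_BLOCK)
        for b in range(f.num_blocks):
            f.blocks[b] = list(struct.unpack_from("<8I", data, b * BYTES_PER_BLOCK))
        return f


if __name__ == "__main__":
    bf = BlockBloomFilter(num_blocks=4)
    bf.insert(b"apple")
    restored = BlockBloomFilter.from_bytes(bf.to_bytes())
    print(restored.might_contain(b"apple"))
```

This design confines all eight bits for a value to one 32-byte block, so a lookup reads a single small, contiguous region. The bitset is self-describing apart from its length. That fits the point above: the thrift metadata mainly needs to say which algorithm was used and where the bytes live.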
