Thanks for the additional context, but I don't quite get why a utility class like this would need to make a call on what the maximum size of a bloom filter should be in the format. That's really a write-side concern. Can we just remove that code from the current PR and discuss it when we are working on how to produce appropriately-configured bloom filters?
On Tue, Jun 26, 2018 at 4:09 PM 俊杰陈 <cjjnj...@gmail.com> wrote:

> Hi Ryan,
>
> The last comment on the doc asked for a benchmark of dictionary vs. Bloom
> filter. I provided the benchmark results here
> <https://docs.google.com/spreadsheets/d/1yV3u-P_yY4DtfSty3LPrbhwuJx4cqm_YeK61s2v0OLU/edit?usp=sharing>.
> Jim has reviewed them and also updated the comments on JIRA. You can check
> JIRA <https://issues.apache.org/jira/browse/PARQUET-41> for the latest
> status.
>
> We created some subtasks for PARQUET-41, and the first step [JIRA-1332
> <https://issues.apache.org/jira/browse/PARQUET-1332>] is to implement the
> Bloom filter utility class itself in parquet-mr and parquet-cpp. The
> question above is related to it.
>
> On Wed, Jun 27, 2018 at 12:35 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> I thought the plan was to finish the bloom filter spec and then decide how
>> to create appropriately sized filters. This sounds like a write-side
>> implementation detail to me. What is the current plan for getting this
>> work in?
>>
>> On Mon, Jun 25, 2018 at 8:43 PM 俊杰陈 <cjjnj...@gmail.com> wrote:
>>
>> > Hi devs
>> >
>> > I'm now implementing the bloom filter feature and need to set a default
>> > maximum value for the bloom filter size for a block. According to the
>> > calculation here
>> > <https://docs.google.com/spreadsheets/d/1LQqGZ1EQSkPBXtdi9nyANiQOhwNFwqiiFe8Sazclf5Y/edit#gid=0>,
>> > I plan to set the maximum size to 1/8 of parquet.block.size, which can
>> > achieve about 0.25 FPP in the case of only one column of long type in a
>> > block where all values are different. What do you think about this? Any
>> > feedback is welcome.
>> >
>> > --
>> > Thanks & Best Regards
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
> --
> Thanks & Best Regards

--
Ryan Blue
Software Engineer
Netflix