[
https://issues.apache.org/jira/browse/HUDI-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338652#comment-17338652
]
Vinoth Chandar commented on HUDI-1295:
--------------------------------------
[~309637554] the bloom filters are separate. We can use the current metadata
table implementation to add a new partition `bloom_filters_<columnName>` that
stores bloom filters for a given column. You will also need to write some sync
code that populates the bloom filters when commits are synced from the data
table to the metadata table.
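
As a rough illustration of the partition-plus-sync idea above, here is a minimal in-memory sketch. It is not Hudi code: `SimpleBloomFilter`, `ToyMetadataTable`, `sync_commit`, and `candidate_files` are hypothetical names, and the filter sizing (1024 bits, 3 hashes) is an arbitrary assumption. Only the partition naming `bloom_filters_<columnName>` comes from the comment.

```python
import hashlib
from collections import defaultdict


class SimpleBloomFilter:
    """Toy bloom filter (hypothetical); stands in for the real filter type."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set packed into a Python int

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyMetadataTable:
    """Hypothetical stand-in for the metadata table: partition -> file -> filter."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    @staticmethod
    def bloom_partition(column_name):
        # Partition naming taken from the comment: bloom_filters_<columnName>
        return f"bloom_filters_{column_name}"

    def sync_commit(self, column_name, written_files):
        """Populate filters for a commit.

        written_files maps data file name -> column values written to it.
        """
        partition = self.partitions[self.bloom_partition(column_name)]
        for file_name, values in written_files.items():
            bloom = SimpleBloomFilter()
            for value in values:
                bloom.add(value)
            partition[file_name] = bloom

    def candidate_files(self, column_name, key):
        # Files whose filter cannot rule the key out (false positives possible).
        partition = self.partitions.get(self.bloom_partition(column_name), {})
        return [f for f, bloom in partition.items() if bloom.might_contain(key)]
```

With this shape, a lookup consults only the metadata partition and never has to open the parquet footers of the data files.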
IMO we can make parallel progress on this, as RFC-27 adds the column ranges,
including for `_hoodie_record_key`. We could also special-case
`_hoodie_record_key` and store its ranges along with bloom filters in a single
record, given the index will most likely access them together.
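
To make the single-record idea concrete, here is a minimal sketch of a per-file record that carries both the key range and the bloom filter bits. `KeyIndexRecord` and all of its fields are hypothetical, not an actual Hudi or RFC-27 schema, and the filter parameters are arbitrary assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyIndexRecord:
    """Hypothetical combined record for one data file: key range + bloom bits."""

    file_name: str
    min_key: Optional[str] = None
    max_key: Optional[str] = None
    bloom_bits: int = 0      # bit set packed into an int
    num_bits: int = 1024     # assumed filter size
    num_hashes: int = 3      # assumed number of hash functions

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Maintain the range and the filter together on every write.
        self.min_key = key if self.min_key is None else min(self.min_key, key)
        self.max_key = key if self.max_key is None else max(self.max_key, key)
        for pos in self._positions(key):
            self.bloom_bits |= 1 << pos

    def might_contain(self, key):
        # Cheap range check first, then the bloom filter, from the same record.
        if self.min_key is None or not (self.min_key <= key <= self.max_key):
            return False
        return all((self.bloom_bits >> pos) & 1 for pos in self._positions(key))
```

Because both pieces live in one record, an index lookup needs a single read per file to apply the range pruning and the bloom check.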
[~pwason] cc-ing as well, as FYI
> RFC-15: Track bloom filters as a part of metadata table
> -------------------------------------------------------
>
> Key: HUDI-1295
> URL: https://issues.apache.org/jira/browse/HUDI-1295
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Priority: Major
> Fix For: 0.9.0
>
>
> The idea here is to maintain our bloom filters outside of parquet for
> speedier access from the bloom index.