[
https://issues.apache.org/jira/browse/HUDI-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu reassigned HUDI-499:
-------------------------------
Assignee: Raymond Xu
> Allow partition path to be updated with GLOBAL_BLOOM index
> ----------------------------------------------------------
>
> Key: HUDI-499
> URL: https://issues.apache.org/jira/browse/HUDI-499
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Index
> Reporter: Raymond Xu
> Assignee: Raymond Xu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Context
> When a record is updated with a new partition path and the index type is
> GLOBAL_BLOOM, the current logic implemented in
> [https://github.com/apache/incubator-hudi/pull/1091/] ignores the new
> partition path and updates the record in its original partition path.
> h3. Proposed change
> Allow such records to be inserted into their new partition paths and deleted
> from their old partition paths. A configuration (e.g.
> {{hoodie.index.bloom.should.update.partition.path=true}}) can be added to
> enable this feature.
> h4. An example use case
> A Hudi dataset manages people's information and is partitioned by birthday. In
> most cases where a person's information is updated, the birthday does not change
> (that is why we chose it as the partition field). In some edge cases, however, a
> birthday was entered incorrectly and we want to fix it manually, or allow users
> to update it occasionally. In these cases, the proposed option would help keep
> records in the expected partition, so that a query like "show me people who
> were born after 2000" would work.
>
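The proposed change above could be sketched roughly as follows. This is only an illustration in plain Python, not Hudi code: the names `Record`, `tag_records`, `global_index`, and `update_partition_path` are hypothetical, and the global index is modeled as a simple mapping from record key to the partition path where that key currently lives.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for an incoming record."""
    key: str
    partition_path: str
    payload: dict


def tag_records(
    incoming: List[Record],
    global_index: Dict[str, str],
    update_partition_path: bool,
) -> List[Tuple[str, str, Record]]:
    """Return (operation, target_partition, record) tuples.

    global_index maps record key -> current partition path (an assumption
    standing in for a lookup against a global index).
    """
    ops: List[Tuple[str, str, Record]] = []
    for rec in incoming:
        old_partition = global_index.get(rec.key)
        if old_partition is None:
            # Key not seen before: plain insert into the incoming partition.
            ops.append(("insert", rec.partition_path, rec))
        elif old_partition == rec.partition_path:
            # Same partition: ordinary update.
            ops.append(("update", old_partition, rec))
        elif update_partition_path:
            # Proposed behavior: delete from the old partition,
            # then insert into the new one.
            ops.append(("delete", old_partition, rec))
            ops.append(("insert", rec.partition_path, rec))
        else:
            # Current behavior: ignore the new partition path and
            # update the record where it already lives.
            ops.append(("update", old_partition, rec))
    return ops
```

With the flag disabled, a record whose partition path changed is still routed to its original partition; with it enabled, the same record yields a delete in the old partition followed by an insert in the new one.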
--
This message was sent by Atlassian Jira
(v8.3.4#803005)