Raymond Xu created HUDI-499:
-------------------------------
Summary: Allow partition path to be updated with GLOBAL_BLOOM index
Key: HUDI-499
URL: https://issues.apache.org/jira/browse/HUDI-499
Project: Apache Hudi (incubating)
Issue Type: Improvement
Components: Index
Reporter: Raymond Xu
h3. Context
When a record is updated with a new partition path and the index type is
GLOBAL_BLOOM, the current logic implemented in
[https://github.com/apache/incubator-hudi/pull/1091/] ignores the new partition
path and updates the record in its original partition path.
h3. Proposed change
Insert such records into their new partition paths and delete the
corresponding records from the old partition paths. A configuration (e.g.
{{hoodie.index.bloom.update.partitionpath=true}}) can be added to enable this
feature.
h4. An example use case
A Hudi dataset manages people's info and is partitioned by birthday. In most
cases where a person's info is updated, the birthday does not change (which is
why we chose it as the partition field). In some edge cases, however, a birthday
is entered incorrectly, and we want to fix it manually or allow users to update
it occasionally. In these cases, the proposed change would keep records in the
expected partition, so that a query like "show me people who were born after
2000" would work.
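
To make the difference between the current and the proposed behavior concrete, here is a minimal Python sketch of the tagging decision applied to the birthday example. It is only an illustration and not Hudi code: {{Record}}, {{tag_records}}, the {{update_partition_path}} flag and the dict standing in for the global index lookup are all hypothetical names, and the flag merely plays the role of the configuration suggested above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Hudi record (not the real API)."""
    key: str             # record key, unique across partitions under a global index
    partition_path: str  # partition derived from the incoming data, e.g. the birthday
    payload: str


def tag_records(
    index: Dict[str, str],
    incoming: List[Record],
    update_partition_path: bool,
) -> List[Tuple[str, str, Record]]:
    """Return (operation, target_partition, record) for each incoming record.

    `index` maps record key -> partition path that currently holds the key;
    it is an assumption standing in for the global index lookup.
    """
    ops: List[Tuple[str, str, Record]] = []
    for rec in incoming:
        old_path = index.get(rec.key)
        if old_path is None:
            # Key not seen before: plain insert into the incoming partition.
            ops.append(("insert", rec.partition_path, rec))
        elif old_path == rec.partition_path or not update_partition_path:
            # Same partition, or feature disabled: update in the original
            # partition and ignore the new partition path (current behavior).
            ops.append(("update", old_path, rec))
        else:
            # Proposed behavior: delete from the old partition and insert
            # into the new one.
            ops.append(("delete", old_path, rec))
            ops.append(("insert", rec.partition_path, rec))
    return ops


# A person whose birthday was entered wrongly as 1999/12/31 and is now corrected.
index = {"person-1": "1999/12/31"}
corrected = Record("person-1", "2000/01/01", "Alice")

current = tag_records(index, [corrected], update_partition_path=False)
proposed = tag_records(index, [corrected], update_partition_path=True)
```

With the flag off, the corrected record is still written to {{1999/12/31}}, so a query filtering on partitions after 2000 would miss it. With the flag on, the sketch emits a delete for the old partition followed by an insert into {{2000/01/01}}.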