[ 
https://issues.apache.org/jira/browse/HUDI-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-499:
----------------------------
    Description: 
h3. Context

When a record is updated with a new partition path and the index type is set to 
GLOBAL_BLOOM, the current logic implemented in 
[https://github.com/apache/incubator-hudi/pull/1091/] ignores the new partition 
path and updates the record in its original partition path.
h3. Proposed change

Allow records to be inserted into their new partition paths and deleted from 
their old partition paths. A configuration (e.g. 
{{hoodie.index.bloom.should.update.partition.path=true}}) can be added to 
enable this feature.
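
The two behaviours can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not Hudi's implementation: {{Record}}, {{tag_global_index}} and the {{update_partition_path}} flag are hypothetical names, the flag stands in for the configuration suggested above, and the global index is modelled as a plain dict from record key to partition path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    key: str                      # record key, unique across all partitions
    partition_path: str           # partition the incoming record claims
    payload: dict = field(default_factory=dict)


def tag_global_index(
    incoming: List[Record],
    key_to_partition: Dict[str, str],   # hypothetical stand-in for the global index
    update_partition_path: bool,        # hypothetical stand-in for the proposed config
) -> List[Tuple[str, Record]]:
    """Return (operation, record) pairs for a batch of incoming records."""
    ops: List[Tuple[str, Record]] = []
    for rec in incoming:
        existing = key_to_partition.get(rec.key)
        if existing is None:
            # Key not seen before: plain insert.
            ops.append(("insert", rec))
        elif existing == rec.partition_path or not update_partition_path:
            # Current behaviour: keep the record in its original partition,
            # ignoring any new partition path on the incoming record.
            ops.append(("update", Record(rec.key, existing, rec.payload)))
        else:
            # Proposed behaviour: delete from the old partition,
            # then insert into the new one.
            ops.append(("delete", Record(rec.key, existing)))
            ops.append(("insert", rec))
    return ops
```

With an index entry {{"alice" -> "1999/12/31"}} and an incoming record for {{alice}} in partition {{2000/01/01}}, the flag set to false yields a single update in {{1999/12/31}}, while the flag set to true yields a delete in {{1999/12/31}} followed by an insert in {{2000/01/01}}.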
h4. An example use case

A Hudi dataset manages people's information and is partitioned by birthday. In 
most cases where a person's information is updated, the birthday does not change 
(that is why we chose it as the partition field). But in some edge cases a 
birthday was entered incorrectly, and we want to fix it manually or allow the 
user to update it occasionally. In this case, option 2 would be helpful in 
keeping records in the expected partition, so that a query like "show me people 
who were born after 2000" would work.

 

  was:
h3. Context

When a record is updated with a new partition path and the index type is set to 
GLOBAL_BLOOM, the current logic implemented in 
[https://github.com/apache/incubator-hudi/pull/1091/] ignores the new partition 
path and updates the record in its original partition path.
h3. Proposed change

Allow records to be inserted into their new partition paths and deleted from 
their old partition paths. A configuration (e.g. 
{{hoodie.index.bloom.update.partitionpath=true}}) can be added to enable this 
feature.
h4. An example use case

A Hudi dataset manages people's information and is partitioned by birthday. In 
most cases where a person's information is updated, the birthday does not change 
(that is why we chose it as the partition field). But in some edge cases a 
birthday was entered incorrectly, and we want to fix it manually or allow the 
user to update it occasionally. In this case, option 2 would be helpful in 
keeping records in the expected partition, so that a query like "show me people 
who were born after 2000" would work.

 


> Allow partition path to be updated with GLOBAL_BLOOM index
> ----------------------------------------------------------
>
>                 Key: HUDI-499
>                 URL: https://issues.apache.org/jira/browse/HUDI-499
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Raymond Xu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Context
> When a record is updated with a new partition path and the index type is set 
> to GLOBAL_BLOOM, the current logic implemented in 
> [https://github.com/apache/incubator-hudi/pull/1091/] ignores the new 
> partition path and updates the record in its original partition path.
> h3. Proposed change
> Allow records to be inserted into their new partition paths and deleted from 
> their old partition paths. A configuration (e.g. 
> {{hoodie.index.bloom.should.update.partition.path=true}}) can be added to 
> enable this feature.
> h4. An example use case
> A Hudi dataset manages people's information and is partitioned by birthday. 
> In most cases where a person's information is updated, the birthday does not 
> change (that is why we chose it as the partition field). But in some edge 
> cases a birthday was entered incorrectly, and we want to fix it manually or 
> allow the user to update it occasionally. In this case, option 2 would be 
> helpful in keeping records in the expected partition, so that a query like 
> "show me people who were born after 2000" would work.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
