[
https://issues.apache.org/jira/browse/HUDI-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-3018:
--------------------------------------
Sprint: Cont' improve - 2022/02/14, Cont' improve - 2022/02/21 (was:
Cont' improve - 2022/02/14)
> Flag if user DataFrame has a "_hoodie_is_deleted" field with a data type other
> than boolean.
> ---------------------------------------------------------------------------------------
>
> Key: HUDI-3018
> URL: https://issues.apache.org/jira/browse/HUDI-3018
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Usability
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Critical
> Labels: pull-request-available, sev:normal
> Fix For: 0.11.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> As of now, Hudi interprets a special column named "_hoodie_is_deleted": if it
> is set to true, the record is treated as a delete; otherwise it is treated as
> an update or an insert. This is not a reserved column as such. For example, a
> user DataFrame can have a column named "_hoodie_is_deleted" whose data type is
> not boolean, such as string.
>
> Add validation to Hudi to ensure that this column's data type is boolean if
> it is present in the DataFrame.
>
> Excerpt from the user:
>
> I'd suggest:
> * Possibly dropping the column (as you say if it has little benefits sure).
> If not, documenting the behaviour somewhere. Alternatively, always include
> the column, along with the other Hudi metadata fields which are prepended to
> written schema already.
> * If the column is not a boolean:
> ** Failing hard, as this column is essentially "reserved" for Hudi
> ** Taking {{IS NOT NULL}} as truthy
>
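
For illustration only, the "fail hard" option and the delete-versus-upsert rule described above could be sketched roughly as follows. This is not Hudi's actual implementation. The schema is modelled as a plain column-name to type-name mapping rather than a real DataFrame schema, and the names validate_delete_marker, is_delete and InvalidDeleteMarkerError are hypothetical.

```python
from typing import Any, Mapping

# Column name taken from the issue description.
DELETE_MARKER = "_hoodie_is_deleted"


class InvalidDeleteMarkerError(ValueError):
    """Raised when the delete marker column is present but not boolean."""


def validate_delete_marker(schema: Mapping[str, str]) -> None:
    """Fail hard if the marker column exists with a non-boolean type.

    `schema` is a hypothetical column-name -> type-name mapping that stands
    in for a real DataFrame schema.
    """
    if DELETE_MARKER not in schema:
        return  # the column is optional, so its absence is fine
    dtype = schema[DELETE_MARKER]
    if dtype.lower() != "boolean":
        raise InvalidDeleteMarkerError(
            f'Column "{DELETE_MARKER}" must be boolean, found {dtype}'
        )


def is_delete(record: Mapping[str, Any]) -> bool:
    """Treat a record as a delete only when the marker is literally True.

    A missing, null, or False marker means the record is an insert or update.
    """
    return record.get(DELETE_MARKER) is True
```

Running the check once against the incoming schema, before any records are written, would surface a string-typed "_hoodie_is_deleted" column as an error up front instead of letting it be misread per record.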
--
This message was sent by Atlassian Jira
(v8.20.1#820001)