[ 
https://issues.apache.org/jira/browse/HUDI-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3018:
--------------------------------------
    Sprint: Cont' improve -  2022/02/14, Cont' improve -  2022/02/21  (was: 
Cont' improve -  2022/02/14)

> Flag if user df has "_hoodie_is_deleted" field with diff data type other than 
> boolean. 
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-3018
>                 URL: https://issues.apache.org/jira/browse/HUDI-3018
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Usability
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available, sev:normal
>             Fix For: 0.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> As of now, Hudi interprets a special column named "_hoodie_is_deleted": if it 
> is set to true, the record is treated as a delete; otherwise it is treated as 
> an update or an insert. This is not a reserved column as such. For example, a 
> user DataFrame can have a column named "_hoodie_is_deleted" whose data type 
> is something other than boolean, such as string. 
>  
> Add validation to Hudi to ensure that this column's data type is boolean if 
> the column is present in the DataFrame. 
>  
> Excerpt from the user:
>  
> I'd suggest:
>  * Possibly dropping the column (if, as you say, it has little benefit). 
> If not, documenting the behaviour somewhere. Alternatively, always including 
> the column, along with the other Hudi metadata fields that are already 
> prepended to the written schema.
>  * If the column is not a boolean:
>  ** Failing hard, as this column is essentially "reserved" for Hudi
>  ** Taking {{IS NOT NULL}} as truthy
>  
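The validation requested above, together with the two options the user lists for a non-boolean column, could be sketched roughly as follows. This is a minimal illustration only, not Hudi's implementation: the schema is modelled as a plain mapping of column name to type name, and `validate_delete_marker`, `is_delete`, and `InvalidDeleteMarkerError` are hypothetical names that do not come from the Hudi codebase.

```python
# Hypothetical sketch: none of these names are Hudi APIs.
DELETE_MARKER = "_hoodie_is_deleted"


class InvalidDeleteMarkerError(ValueError):
    """Raised when the delete marker column is present but not boolean."""


def validate_delete_marker(schema):
    """Fail hard if the marker column exists with a non-boolean type.

    `schema` is a plain mapping of column name -> type name, standing in
    for a real DataFrame schema (an assumption made for this sketch).
    """
    col_type = schema.get(DELETE_MARKER)
    if col_type is not None and col_type != "boolean":
        raise InvalidDeleteMarkerError(
            f'Column "{DELETE_MARKER}" must be boolean, found {col_type}'
        )


def is_delete(record, non_null_is_truthy=False):
    """Classify a record (a dict) as a delete or not.

    Default: only a literal True marks a delete; anything else is an
    update or an insert. With non_null_is_truthy=True, any non-null
    value marks a delete (the "IS NOT NULL as truthy" alternative).
    """
    value = record.get(DELETE_MARKER)
    if non_null_is_truthy:
        return value is not None
    return value is True
```

Under the fail-hard option, `validate_delete_marker` would run once against the incoming schema before any record is written, so a string-typed marker column is rejected up front instead of being silently treated as "not a delete".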



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
