nsivabalan commented on issue #1073: [HUDI-377] Adding Delete() support to 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#issuecomment-568824832
 
 
   > I have another question: if Hudi also syncs to a Hive table, should the 
`_hoodie_delete_marker` column be added or not?
   > 
   > Let's assume a scenario: a message arrives in Hudi from Oracle OGG via 
Kafka. It has a column named `op_type`, and `op_type` has three values: `I`: 
insert, `U`: update, `D`: delete.
   > In this case, we need to add a column named `_hoodie_delete_marker` and set 
it to true for `D` and to null for `I` and `U`. But the user may not want Hive 
to show `_hoodie_delete_marker`, because this field has no meaning for them; 
they already have another field that represents it.
   > 
   > So I wonder whether we can add two attributes to `DataSourceWriteOptions` 
that tell Hudi which field may contain the delete value and which value means 
delete?
   
   Thanks for bringing it up. Yes, we did think about this, but it had some 
complications and we didn't want to delay the first version. So, as of now, the 
field name is fixed as "_hoodie_delete_marker", and it will be present in the 
serialized version of non-deleted records on disk. We can create a ticket to 
discuss usability further and what a good approach would be. 
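
   Until then, the marker would have to be derived upstream of the write. A 
minimal sketch for the OGG scenario above, in plain Python with no Hudi API 
involved: the helper name `add_delete_marker`, the sample records, and the 
choice to reject unknown `op_type` values are assumptions for illustration. 
Only the field name `_hoodie_delete_marker` and the `I`/`U`/`D` values come 
from this thread.

```python
from typing import Any, Dict

# Fixed field name used by this PR (from the discussion above).
DELETE_MARKER_FIELD = "_hoodie_delete_marker"


def add_delete_marker(record: Dict[str, Any], op_field: str = "op_type") -> Dict[str, Any]:
    """Return a copy of `record` with the delete marker derived from `op_field`.

    Hypothetical helper: `D` maps to True, `I` and `U` map to None (null).
    """
    op = record.get(op_field)
    if op not in ("I", "U", "D"):
        # Assumption: unknown operation types are treated as an error.
        raise ValueError(f"unexpected {op_field} value: {op!r}")
    out = dict(record)  # leave the incoming record untouched
    out[DELETE_MARKER_FIELD] = True if op == "D" else None
    return out


# Made-up sample messages, shaped like the OGG records described above.
rows = [
    {"id": 1, "name": "a", "op_type": "I"},
    {"id": 1, "name": "b", "op_type": "U"},
    {"id": 1, "name": "b", "op_type": "D"},
]
marked = [add_delete_marker(r) for r in rows]
```

   The original `op_type` column is kept as-is, so the extra marker column 
would still show up in Hive, which is the usability concern raised above.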
