bithw1 opened a new issue #2290: URL: https://github.com/apache/hudi/issues/2290
Hi, I have created a Spark DataFrame from upstream source data. It contains records that should be deleted from, inserted into, or updated in the Hudi table (each record carries a flag: D, U, or I).

In Hudi, delete and upsert are two different operation types. Do I have to split the data into one DataFrame of deletes and another of upserts, and write them to Hudi separately as two commits? If so, there would be a big performance loss. Is it possible to perform delete/update/insert in a single commit?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
