FeiZou commented on issue #3418:
URL: https://github.com/apache/hudi/issues/3418#issuecomment-911008841


   Hey, from my understanding, a MOR table has a higher read cost than a COW 
table. In our case especially, we have duplicated rows in the delta commits 
that have to be handled at read time, and read time is the bigger concern 
for us. Please correct me if my understanding is wrong, and please let me 
know if you have any ideas for improving the performance of the `upsert` 
Spark job, from either the Spark side or the Hudi side.
   
   I also have another concern: duplicates still exist after I did the 
`bulk_insert` table migration using `SimpleKeyGen` with the CopyOnWrite 
table type. The number of duplicates is the same as in the table that uses 
`NonPartitionedKeyGen`. Any thoughts on that? @nsivabalan 
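
For reference, here is a minimal sketch of how a duplicate count like the one above could be computed. It assumes the rows have already been read from the table into plain dictionaries and that each row carries Hudi's `_hoodie_record_key` meta column. The helper name `count_duplicate_keys` and the sample rows are hypothetical, for illustration only, and this is not necessarily how the counts in this comment were produced.

```python
from collections import Counter


def count_duplicate_keys(rows, key_field="_hoodie_record_key"):
    """Return {key: occurrences} for every key that appears more than once."""
    counts = Counter(row[key_field] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample rows standing in for a snapshot read of the table.
sample_rows = [
    {"_hoodie_record_key": "id:1", "value": "a"},
    {"_hoodie_record_key": "id:2", "value": "b"},
    {"_hoodie_record_key": "id:1", "value": "c"},
]

duplicates = count_duplicate_keys(sample_rows)
```

Running the same check on both migrated tables would show whether the same keys are duplicated in each, which might help tell duplicates already present in the source data apart from duplicates introduced by the key generator choice.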

