FeiZou commented on issue #3418: URL: https://github.com/apache/hudi/issues/3418#issuecomment-911008841
Hey, from my understanding, a MOR table has a higher read cost than COW. Especially in our case, we have duplicated rows in the delta commits that need to be merged at read time, and read latency is our main concern. Please correct me if my understanding is wrong, and let me know if you have any ideas for improving the `upsert` Spark job performance, either on the Spark side or the Hudi side.

I have another concern as well: I found that duplicates still exist after I did the `bulk_insert` table migration using `SimpleKeyGen` with the CopyOnWrite table type. The number of duplicates is the same as for the table using `NonPartitionedKeyGen`. Any thoughts on that? @nsivabalan
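For context on how I'm counting the duplicates: I group the rows read back from the table by Hudi's `_hoodie_record_key` metadata column and look for keys that appear more than once (in Spark SQL this would be a `GROUP BY _hoodie_record_key HAVING COUNT(*) > 1`). Below is a minimal plain-Python sketch of that check; the sample rows and key values are hypothetical, used only to illustrate the logic without a Spark session. Note that if the same record key can land under different partition paths, it may legitimately show up more than once, since (to my understanding, absent a global index) key uniqueness is enforced per partition:

```python
from collections import Counter

# Hypothetical sample of rows read back from the migrated table;
# each tuple is (_hoodie_record_key, _hoodie_partition_path).
rows = [
    ("id-1", "2021/08/01"),
    ("id-2", "2021/08/01"),
    ("id-1", "2021/08/02"),  # same record key under a different partition path
    ("id-3", "2021/08/02"),
]

# Count occurrences of each record key across the whole table.
key_counts = Counter(key for key, _partition in rows)

# Keys appearing more than once are the duplicates to investigate.
duplicates = {key: n for key, n in key_counts.items() if n > 1}
print(duplicates)  # {'id-1': 2}
```

If the duplicate keys all sit in different partition paths, that would point at the partition-path field (not the key generator itself) as the source of the duplicates.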
