I have several ideas for optimizing update performance:

1. Reduce the storage size of tupleId. The tupleId is too long, which causes heavy shuffle IO overhead when joining the change table with the target table.
2. Avoid converting String to UTF8String during row processing. Before rows are written into the delta files, the conversion from String to UTF8String hurts performance. Code: "UTF8String.fromString(row.getString(tupleId))"
3. For DELETE operations in MergeDataCommand, we shouldn't let all columns of the change table take part in the JOIN. Only the "key" column is needed.
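
To illustrate point 3, here is a rough sketch in plain Java. It is not the actual MergeDataCommand code and does not use Spark. The class and method names (DeleteJoinSketch, ChangeRow, TargetRow, projectKeys, tupleIdsToDelete) are hypothetical and exist only for this example. The idea is to project the change table down to its key column first, so the wide payload columns never take part in the join that finds the rows to delete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteJoinSketch {

    // Hypothetical change-table row: the join key plus wide payload columns.
    static final class ChangeRow {
        final String key;
        final String[] payload; // columns that a DELETE never reads

        ChangeRow(String key, String... payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Hypothetical target-table row: the join key plus its tupleId.
    static final class TargetRow {
        final String key;
        final String tupleId;

        TargetRow(String key, String tupleId) {
            this.key = key;
            this.tupleId = tupleId;
        }
    }

    // Project the change table down to the key column before joining.
    static Set<String> projectKeys(List<ChangeRow> changes) {
        Set<String> keys = new HashSet<>();
        for (ChangeRow row : changes) {
            keys.add(row.key);
        }
        return keys;
    }

    // Semi-join: collect the tupleIds of target rows whose key is in the change keys.
    static List<String> tupleIdsToDelete(List<TargetRow> target, Set<String> keys) {
        List<String> ids = new ArrayList<>();
        for (TargetRow row : target) {
            if (keys.contains(row.key)) {
                ids.add(row.tupleId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<ChangeRow> changes = Arrays.asList(
                new ChangeRow("k2", "colA", "colB"),
                new ChangeRow("k9", "colC", "colD"));
        List<TargetRow> target = Arrays.asList(
                new TargetRow("k1", "t-1"),
                new TargetRow("k2", "t-2"),
                new TargetRow("k3", "t-3"));

        // Only the keys are carried into the join; the payload is dropped first.
        System.out.println(tupleIdsToDelete(target, projectKeys(changes)));
    }
}
```

In Spark terms this would roughly correspond to selecting only the key column of the change table before the join, so that less data is shuffled. How that maps onto the existing MergeDataCommand plan is something I have not verified.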
