updatePartitionsToTable() is time consuming and redundant.

Purushotham Pushpavanthar Fri, 17 Jan 2020 03:44:23 -0800

Hi,

I noticed that
*org.apache.hudi.hive.HoodieHiveClient#updatePartitionsToTable()* is
time-consuming when running Hudi on a set of records that contains data for
a large number of partitions. All it does is set the location for each
updated partition path. However,
*org.apache.hudi.hive.HoodieHiveClient#addPartitionsToTable()*
already takes care of adding new partitions to the table.


   1. For a given table whose base path doesn't change (it usually doesn't
   in production), why is *updatePartitionsToTable()* needed? Can you
   please shed some light on any case where it is required?
   2. If it is required, can we do something to optimise the time consumed
   by this operation? Currently, the *ALTER statements* are executed one by
   one, one per (partition, path) pair, for every updated partition.
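
To make point 2 concrete, here is a minimal sketch of one possible optimisation: issue an ALTER statement only for partitions whose location actually differs from what the metastore already holds. This is not Hudi's actual code. The class name, the method name, the two input maps, and the table, partition and path values are all hypothetical; only the Hive `ALTER TABLE ... PARTITION (...) SET LOCATION` syntax is taken as given.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of HoodieHiveClient.
public class PartitionLocationSketch {

    /**
     * Builds ALTER statements only for partitions that already exist in the
     * metastore AND whose stored location differs from the target location.
     * Keys of both maps are partition specs such as "datestr='2020-01-17'"
     * (an assumed format); values are storage paths.
     */
    static List<String> alterStatements(String table,
                                        Map<String, String> metastoreLocations,
                                        Map<String, String> writtenLocations) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> entry : writtenLocations.entrySet()) {
            String current = metastoreLocations.get(entry.getKey());
            String target = entry.getValue();
            // Unknown partitions are left to the "add partitions" path;
            // unchanged locations need no statement at all.
            if (current == null || current.equals(target)) {
                continue;
            }
            statements.add("ALTER TABLE " + table
                    + " PARTITION (" + entry.getKey() + ")"
                    + " SET LOCATION '" + target + "'");
        }
        return statements;
    }

    public static void main(String[] args) {
        Map<String, String> metastore = new LinkedHashMap<>();
        metastore.put("datestr='2020-01-16'", "/data/tbl/2020/01/16");
        metastore.put("datestr='2020-01-17'", "/data/old/2020/01/17");

        Map<String, String> written = new LinkedHashMap<>();
        written.put("datestr='2020-01-16'", "/data/tbl/2020/01/16"); // unchanged
        written.put("datestr='2020-01-17'", "/data/tbl/2020/01/17"); // moved
        written.put("datestr='2020-01-18'", "/data/tbl/2020/01/18"); // new

        for (String statement : alterStatements("tbl", metastore, written)) {
            System.out.println(statement);
        }
    }
}
```

With a stable base path, the filter would leave nothing to execute, which is the case question 1 asks about. The trade-off is that the current locations have to be fetched from the metastore first.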



Regards,
Purushotham Pushpavanthar
