Unfortunately, it looks like the mailing list does not support attachments :( Could you paste the logs inline?
On Sat, Feb 1, 2020 at 6:20 AM Purushotham Pushpavanthar <[email protected]> wrote:

> Hi Balaji,
>
> The attachment contains the logs you asked for.
> However, the only difference between storageValue and
> fullStoragePartitionPath is *target-base-path*.
> So if I'm not wrong, the code marks every partition that received
> UPDATE data for a partition update. Hence it is time-consuming.
>
> Regards,
> Purushotham Pushpavanth
>
> On Mon, 20 Jan 2020 at 08:58, Balaji Varadarajan
> <[email protected]> wrote:
>
>> Hi Purushotham,
>> I am unable to reproduce the same partitions getting hive-synced locally.
>> Can you add the following log message to HoodieHiveClient.java, run the
>> code, and send us the logs?
>>
>> diff --git a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> index 4578bb2f..ba4b1147 100644
>> --- a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> +++ b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> @@ -237,6 +237,8 @@ public class HoodieHiveClient {
>>          if (!paths.containsKey(storageValue)) {
>>            events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
>>          } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
>> +          LOG.info("Partition Location changes. StorageVal=" + storageValue
>> +              + ", Existing Hive Path=" + paths.get(storageValue) + ", New Location=" + fullStoragePartitionPath);
>>            events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
>>          }
>>        }
>>
>> Thanks,
>> Balaji.V
>>
>> On Friday, January 17, 2020, 03:44:08 AM PST, Purushotham
>> Pushpavanthar <[email protected]> wrote:
>>
>> Hi,
>>
>> I noticed that
>> *org.apache.hudi.hive.HoodieHiveClient#updatePartitionsToTable()* is
>> time-consuming when running HUDI on a set of records that contains data
>> for a large set of partitions. All it does is set the location for each
>> updated partition path. However,
>> *org.apache.hudi.hive.HoodieHiveClient#addPartitionsToTable()* already
>> takes care of adding new partitions to the table.
>>
>> 1. For a given table whose base path doesn't change (usually it doesn't
>>    in production), why is *updatePartitionsToTable()* needed? Can you
>>    please shed some light on any case where this is needed?
>> 2. If it is required, can we do something to optimise the time consumed
>>    by this operation? Currently, the *ALTER statements* are executed one
>>    by one for each (partition, path) pair of every updated partition.
>>
>> Regards,
>> Purushotham Pushpavanth
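For illustration only: if the spurious update events come from the same directory being spelled differently in Hive and in the sync config (for example with vs. without an `hdfs://namenode:8020` prefix, or with a trailing slash), one option is to normalize both locations before the comparison in the `else if` branch of Balaji's diff. The sketch below is a standalone Java approximation, not Hudi's code. `PartitionSyncSketch`, `EventType`, `PartitionEvent`, `partitionEvents` and `normalize` are hypothetical names, and keying the map on the partition path instead of the extracted partition value (`storageValue`) is a simplification.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionSyncSketch {

  // Hypothetical stand-in for the add/update split done with PartitionEvent in HoodieHiveClient.
  enum EventType { ADD, UPDATE }

  static final class PartitionEvent {
    final EventType type;
    final String partition;

    PartitionEvent(EventType type, String partition) {
      this.type = type;
      this.partition = partition;
    }
  }

  // Drops scheme and authority (e.g. "hdfs://namenode:8020") plus trailing slashes, so
  // "hdfs://namenode:8020/data/tbl/2020/01/01/" and "/data/tbl/2020/01/01" compare equal.
  // Assumes every location is a valid hierarchical URI (no unescaped spaces).
  static String normalize(String location) {
    String path = URI.create(location).getPath();
    while (path.length() > 1 && path.endsWith("/")) {
      path = path.substring(0, path.length() - 1);
    }
    return path;
  }

  // existingHivePaths: partition path (e.g. "2020/01/01") -> location currently registered in Hive.
  static List<PartitionEvent> partitionEvents(String basePath,
      Map<String, String> existingHivePaths, List<String> writtenPartitions) {
    List<PartitionEvent> events = new ArrayList<>();
    for (String partition : writtenPartitions) {
      String storagePath = basePath + "/" + partition;
      String hivePath = existingHivePaths.get(partition);
      if (hivePath == null) {
        // Unknown to Hive: needs ALTER TABLE ... ADD PARTITION.
        events.add(new PartitionEvent(EventType.ADD, partition));
      } else if (!normalize(hivePath).equals(normalize(storagePath))) {
        // Location really moved: needs ALTER TABLE ... PARTITION (...) SET LOCATION.
        events.add(new PartitionEvent(EventType.UPDATE, partition));
      }
      // Same location after normalization: no event, so no ALTER statement is issued.
    }
    return events;
  }

  public static void main(String[] args) {
    Map<String, String> hive = new LinkedHashMap<>();
    hive.put("2020/01/01", "hdfs://namenode:8020/data/tbl/2020/01/01");
    hive.put("2020/01/02", "hdfs://namenode:8020/data/old_tbl/2020/01/02");
    List<String> written = List.of("2020/01/01", "2020/01/02", "2020/01/03");
    for (PartitionEvent e : partitionEvents("/data/tbl", hive, written)) {
      System.out.println(e.type + " " + e.partition); // UPDATE 2020/01/02, then ADD 2020/01/03
    }
  }
}
```

Dropping the authority also drops an S3 bucket name, so this kind of normalization only makes sense when all partitions of a table live on one filesystem or bucket. It also does not change the one-statement-per-partition cost raised in point 2; it only avoids issuing statements for partitions whose location has not actually moved.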
