nsivabalan commented on pull request #4405: URL: https://github.com/apache/hudi/pull/4405#issuecomment-1002810449
Sorry about the long question; I could not make it more succinct.

@bvaradar @codope @vinothchandar: I need some guidance on updating partitions in Hive ([reference code](https://github.com/apache/hudi/blob/504747ecf4c93d53f0fc565cdcf12544549b7903/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java#L220)).

Whenever we update a partition, the last argument (the parameters map) is set to null (see the link above for reference). With this patch, we run into a `NumberFormatException`. One of Hive's internal classes expects the parameters to contain entries for the FAST_STATS (`"numFiles"` and `"totalSize"`). This requirement applies only to a partition that is being updated. A new partition is not affected.

Call path:

- `IMetaStoreClient.alter_partitions(String dbName, String tblName, List<Partition> newParts, EnvironmentContext environmentContext)`
- -> `HiveMetaStore` L3932: `this.alterHandler.alterPartitions(this.getMS(), this.wh, db_name, tbl_name, new_parts, environmentContext, this);`
- -> `HiveAlterHandler` L499:

```java
if (MetaStoreUtils.requireCalStats(this.hiveConf, newPart, tmpPart, tbl, environmentContext)) {
  // Fast stats unchanged: only update the basic state
  if (MetaStoreUtils.isFastStatsSame(newPart, tmpPart)) {
    MetaStoreUtils.updateBasicState(environmentContext, tmpPart.getParameters());
  } else {
    // Fast stats differ (or are missing on the existing partition): recompute them
    MetaStoreUtils.updatePartitionStatsFast(tmpPart, wh, false, true, environmentContext);
  }
}
```

In the code below, `oldPart` refers to the partition that already exists in Hive, and `newPart` refers to the incoming `Partition` data.
```java
static boolean isFastStatsSame(Partition oldPart, Partition newPart) {
  // requires to calculate stats if new and old have different fast stats
  if ((oldPart != null) && (oldPart.getParameters() != null)) {
    for (String stat : StatsSetupConst.fastStats) {
      if (oldPart.getParameters().containsKey(stat)) {
        Long oldStat = Long.parseLong(oldPart.getParameters().get(stat));
        // If newPart has no entry for this stat, get(stat) returns null and
        // Long.parseLong(null) throws NumberFormatException
        Long newStat = Long.parseLong(newPart.getParameters().get(stat));
        if (!oldStat.equals(newStat)) {
          return false;
        }
      } else {
        return false;
      }
    }
    return true;
  }
  return false;
}
```

```java
public static final String[] fastStats = new String[] {NUM_FILES, TOTAL_SIZE};
```

Why haven't we hit this issue so far? In general, what cases lead to partition updates in our Hive sync tool? I understand partition additions, but what constitutes a partition update? I tried adding some data files to an existing partition in unit tests, but our sync tool did not flag the partition for an update.
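To make the failure mode concrete, here is a minimal standalone sketch that re-implements the quoted `isFastStatsSame` control flow over plain `Map<String, String>` parameter maps instead of Hive's `Partition` objects. The class name `FastStatsRepro` and the map-based signature are hypothetical. This is not the Hive metastore API. It shows that when the existing partition has `numFiles`/`totalSize` but the incoming partition's parameters lack them, `get(stat)` returns null and `Long.parseLong(null)` throws `NumberFormatException`. It also shows that matching fast stats compare as the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch: partition parameters are modeled as plain maps,
// NOT Hive's Partition / MetaStoreUtils / StatsSetupConst classes.
class FastStatsRepro {

    // Mirrors StatsSetupConst.fastStats = {NUM_FILES, TOTAL_SIZE}
    static final String[] FAST_STATS = {"numFiles", "totalSize"};

    // Same control flow as the quoted isFastStatsSame, but over parameter maps.
    static boolean isFastStatsSame(Map<String, String> oldParams, Map<String, String> newParams) {
        if (oldParams == null) {
            return false; // no existing parameters to compare against
        }
        for (String stat : FAST_STATS) {
            if (!oldParams.containsKey(stat)) {
                return false; // existing partition lacks this stat
            }
            long oldStat = Long.parseLong(oldParams.get(stat));
            // If newParams has no entry for stat, get() returns null and
            // Long.parseLong(null) throws NumberFormatException.
            long newStat = Long.parseLong(newParams.get(stat));
            if (oldStat != newStat) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Existing partition in the metastore, with fast stats populated
        Map<String, String> existing = new HashMap<>();
        existing.put("numFiles", "3");
        existing.put("totalSize", "1024");

        // Incoming partition whose parameters carry no fast stats
        Map<String, String> incoming = new HashMap<>();
        try {
            isFastStatsSame(existing, incoming);
            System.out.println("no exception");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException, as reported");
        }

        // Incoming partition that carries the same fast stats
        System.out.println("same stats: " + isFastStatsSame(existing, new HashMap<>(existing)));
    }
}
```

The sketch keeps the original's branching. Only the `Partition` wrapper is replaced by its parameter map, so the exception comes from the same `Long.parseLong` call on the incoming side.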
