nsivabalan commented on pull request #4405:
URL: https://github.com/apache/hudi/pull/4405#issuecomment-1002810449


   Sorry about the long question; I could not make it more succinct. 
   
   @bvaradar @codope @vinothchandar: I need some guidance on updating partitions in Hive. 
   [Reference code](https://github.com/apache/hudi/blob/504747ecf4c93d53f0fc565cdcf12544549b7903/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java#L220)
   
   Whenever we update a partition, the last argument (the parameters map) is set to null (see the link above). With this patch, however, we run into a NumberFormatException, because one of Hive's internal classes expects the parameters to contain entries for the fast stats ("numFiles" and "totalSize"). This requirement applies only to a partition that is being updated; adding a new partition is not affected. 
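
   A minimal sketch of what populating these entries could look like, instead of passing null. `FastStatsParams` and `buildFastStatsParameters` are hypothetical names, not existing Hudi or Hive API, and deriving the values from a list of file sizes is an assumption about where the numbers would come from (for example, a listing of the partition directory). The key names are the two fast stats mentioned above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FastStatsParams {
    // Key names matching Hive's StatsSetupConst.fastStats (NUM_FILES, TOTAL_SIZE).
    static final String NUM_FILES = "numFiles";
    static final String TOTAL_SIZE = "totalSize";

    // Hypothetical helper: builds a partition parameters map that already
    // carries the fast-stat keys, so the metastore can parse both values.
    static Map<String, String> buildFastStatsParameters(List<Long> fileSizes) {
        long totalSize = 0L;
        for (long size : fileSizes) {
            totalSize += size;
        }
        Map<String, String> params = new HashMap<>();
        params.put(NUM_FILES, Long.toString(fileSizes.size()));
        params.put(TOTAL_SIZE, Long.toString(totalSize));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = buildFastStatsParameters(List.of(100L, 250L));
        System.out.println(params.get(NUM_FILES) + " files, " + params.get(TOTAL_SIZE) + " bytes");
    }
}
```

   Running `main` prints `2 files, 350 bytes`. Whether these values should be computed by Hudi or left for Hive to recompute (via `updatePartitionStatsFast`) is exactly the open question here; the sketch only shows the shape of the map.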
   
   IMetaStoreClient.alter_partitions(String dbName, String tblName, List<Partition> newParts, EnvironmentContext environmentContext)
     -> HiveMetaStore L3932:
         this.alterHandler.alterPartitions(this.getMS(), this.wh, db_name, tbl_name, new_parts, environmentContext, this);
            -> HiveAlterHandler L499
   ```
   if (MetaStoreUtils.requireCalStats(this.hiveConf, newPart, tmpPart, tbl, environmentContext)) {
     if (MetaStoreUtils.isFastStatsSame(newPart, tmpPart)) {
       MetaStoreUtils.updateBasicState(environmentContext, tmpPart.getParameters());
     } else {
       MetaStoreUtils.updatePartitionStatsFast(tmpPart, wh, false, true, environmentContext);
     }
   }
   ```
   
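   In the code below, `oldPart` refers to the partition that already exists in Hive and `newPart` to the incoming partition data. Because the existing partition already has `numFiles` and `totalSize`, the incoming partition's parameters are expected to have them too; when they are missing, `get(stat)` returns null and `Long.parseLong(null)` throws the NumberFormatException. 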
   ```
    static boolean isFastStatsSame(Partition oldPart, Partition newPart) {
       // requires to calculate stats if new and old have different fast stats
       if ((oldPart != null) && (oldPart.getParameters() != null)) {
         for (String stat : StatsSetupConst.fastStats) {
           if (oldPart.getParameters().containsKey(stat)) {
             Long oldStat = Long.parseLong(oldPart.getParameters().get(stat));
             Long newStat = Long.parseLong(newPart.getParameters().get(stat));
             if (!oldStat.equals(newStat)) {
               return false;
             }
           } else {
             return false;
           }
         }
         return true;
       }
       return false;
     }
   ```
   
   ```
     public static final String[] fastStats = new String[] {NUM_FILES, TOTAL_SIZE};
   ```
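
   To make the failure mode concrete, here is a simplified, map-based stand-in for `isFastStatsSame`. The class `FastStatsCheck` and its `Map`-based signature are hypothetical; Hive's real method takes `Partition` objects. It assumes the incoming partition arrives with a parameters map that exists but lacks the fast-stat keys; if the map itself were null, the same line would throw a NullPointerException instead.

```java
import java.util.HashMap;
import java.util.Map;

public class FastStatsCheck {
    // Mirrors StatsSetupConst.fastStats = {NUM_FILES, TOTAL_SIZE}.
    static final String[] FAST_STATS = {"numFiles", "totalSize"};

    // Simplified copy of the isFastStatsSame logic, operating on the
    // parameter maps of the existing (old) and incoming (new) partitions.
    static boolean isFastStatsSame(Map<String, String> oldParams, Map<String, String> newParams) {
        if (oldParams == null) {
            return false;
        }
        for (String stat : FAST_STATS) {
            if (!oldParams.containsKey(stat)) {
                return false;
            }
            long oldStat = Long.parseLong(oldParams.get(stat));
            // If newParams lacks the key, get() returns null and
            // Long.parseLong(null) throws NumberFormatException.
            long newStat = Long.parseLong(newParams.get(stat));
            if (oldStat != newStat) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("numFiles", "3");
        existing.put("totalSize", "4096");

        Map<String, String> incoming = new HashMap<>(); // no fast stats supplied
        try {
            isFastStatsSame(existing, incoming);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

   The exception only surfaces when the existing partition already carries the stats, which matches the observation that newly added partitions are unaffected.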
   
   I'm wondering why we haven't hit this issue so far. In general, which cases result in partition updates in our Hive sync tool? I understand partition addition, but what constitutes a partition update? I tried adding some data files to an existing partition in unit tests, but the sync tool did not treat the partition as one that needs an update. 
   
   

