Re: [PR] HIVE-28601: Leverage configurable getPartitions API in HMS to decrease memory footprint in HS2 [hive]

via GitHub Tue, 14 Jan 2025 09:59:45 -0800


armitage420 commented on code in PR #5539:
URL: https://github.com/apache/hive/pull/5539#discussion_r1915361539



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -4466,6 +4526,71 @@ public List<Partition> getPartitionsByFilter(Table tbl, String filter)
     return convertFromMetastore(tbl, tParts);
   }
 
+  public List<Partition> getPartitionsWithSpecs(Table tbl, GetPartitionsRequest request)
+      throws HiveException, TException {
+
+    if (!tbl.isPartitioned()) {
+      throw new HiveException(ErrorMsg.TABLE_NOT_PARTITIONED, tbl.getTableName());
+    }
+    int batchSize = MetastoreConf.getIntVar(Hive.get().getConf(), MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
+    if (batchSize > 0) {
+      return new ArrayList<>(getAllPartitionsWithSpecsInBatches(tbl, batchSize, DEFAULT_BATCH_DECAYING_FACTOR,
+          MetastoreConf.getIntVar(Hive.get().getConf(), MetastoreConf.ConfVars.GETPARTITIONS_BATCH_MAX_RETRIES),
+          request));
+    } else {
+      return getPartitionsWithSpecsInternal(tbl, request);
+    }
+  }
+
+  public List<Partition> getPartitionsWithSpecsInternal(Table tbl, GetPartitionsRequest request)
+      throws HiveException, TException {
+
+    if (!tbl.isPartitioned()) {
+      throw new HiveException(ErrorMsg.TABLE_NOT_PARTITIONED, tbl.getTableName());
+    }
+    GetPartitionsResponse response = getMSC().getPartitionsWithSpecs(request);
+    List<org.apache.hadoop.hive.metastore.api.PartitionSpec> partitionSpecs = response.getPartitionSpec();
+    List<Partition> partitions = new ArrayList<>();
+    partitions.addAll(convertFromPartSpec(partitionSpecs.iterator(), tbl));
+
+    return partitions;
+  }
+
+  List<Partition> getPartitionsWithSpecsByNames(Table tbl, List<String> partNames, GetPartitionsRequest request)

Review Comment:
   This scenario targets very large partitioned tables, where a single Thrift response would otherwise hit the 2GB data threshold.
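The batched path in getPartitionsWithSpecs hands a batch size, DEFAULT_BATCH_DECAYING_FACTOR and a retry limit to getAllPartitionsWithSpecsInBatches, so no single response has to carry every partition. Below is a minimal sketch of the general decaying-batch idea. It assumes that when a batch fails (for example because its response is too large), the batch size is divided by the decaying factor and the failed batch is retried. The class DecayingBatchFetch and its method are hypothetical and are not Hive's implementation. The fetch function stands in for a metastore call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of decaying-batch retrieval (not Hive's actual code).
 * Keys are fetched in batches; when a batch fails, the batch size is divided
 * by the decaying factor and the same batch is retried, up to maxRetries times.
 */
public class DecayingBatchFetch {

  static <T, R> List<R> fetchInBatches(List<T> keys, int batchSize, int decayingFactor,
      int maxRetries, Function<List<T>, List<R>> fetchBatch) {
    if (batchSize <= 0 || decayingFactor <= 1) {
      throw new IllegalArgumentException("batchSize must be > 0 and decayingFactor > 1");
    }
    List<R> result = new ArrayList<>();
    int retries = 0;
    int from = 0;
    while (from < keys.size()) {
      int to = Math.min(from + batchSize, keys.size());
      try {
        result.addAll(fetchBatch.apply(keys.subList(from, to)));
        from = to; // batch succeeded: move on to the next slice of keys
      } catch (RuntimeException e) {
        // Give up once retries are exhausted or the batch cannot shrink further
        if (retries >= maxRetries || batchSize == 1) {
          throw e;
        }
        retries++;
        batchSize = Math.max(1, batchSize / decayingFactor); // shrink and retry the same slice
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      names.add("p=" + i);
    }
    // Simulated fetch that rejects batches larger than 3 (stand-in for an oversized response)
    List<String> fetched = fetchInBatches(names, 8, 2, 3, batch -> {
      if (batch.size() > 3) {
        throw new IllegalStateException("response too large");
      }
      return new ArrayList<>(batch);
    });
    System.out.println(fetched.size()); // 10
  }
}
```

In this run the batch size drops from 8 to 4 to 2 before batches succeed, and all 10 names are returned in their original order.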
   
   To benefit from this method, we may need to tune METASTORE_BATCH_RETRIEVE_MAX. Any value we choose will only be a rough approximation, though. I have not found a way to calculate the maximum required batch size at runtime, and the response size will now vary considerably because partition projections are dynamic.
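Because the batch size can only be approximated, one rough way to pick a value for METASTORE_BATCH_RETRIEVE_MAX is to divide a response-size budget by an estimated serialized size per partition. This sketch assumes the caller supplies both numbers. The method estimateBatchSize is hypothetical, and the per-partition sizes in main are made-up illustrations, not measurements. The sketch also shows why projections make the choice hard: the smaller each projected partition is, the more partitions fit in the same budget.

```java
/**
 * Hypothetical back-of-the-envelope batch-size estimate. The byte budget and
 * the per-partition size are assumptions supplied by the caller, not values
 * read from Hive or the metastore.
 */
public class BatchSizeEstimate {

  static int estimateBatchSize(long responseBudgetBytes, long estimatedBytesPerPartition) {
    if (responseBudgetBytes <= 0 || estimatedBytesPerPartition <= 0) {
      throw new IllegalArgumentException("budget and per-partition estimate must be positive");
    }
    long batch = responseBudgetBytes / estimatedBytesPerPartition;
    // Clamp into [1, Integer.MAX_VALUE] so the result fits an int config value
    return (int) Math.max(1L, Math.min(batch, Integer.MAX_VALUE));
  }

  public static void main(String[] args) {
    long budget = 1L << 30; // assumed 1 GiB budget, leaving headroom below the 2GB threshold
    // Made-up size for a fully populated partition object
    System.out.println(estimateBatchSize(budget, 64 * 1024)); // 16384
    // Made-up size for a narrow projection of a few partition fields
    System.out.println(estimateBatchSize(budget, 512)); // 2097152
  }
}
```

Since the per-partition size depends on which fields a request projects, a single configured value would have to be based on a conservative (large) per-partition estimate.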
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: gitbox-h...@hive.apache.org
