zhangbutao commented on code in PR #5063: URL: https://github.com/apache/hive/pull/5063#discussion_r1520802015
########## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ##########
@@ -742,14 +742,8 @@ public List<Partition> getPartitionsViaPartNames(final String catName, final Str
     return Batchable.runBatched(batchSize, partNames, new Batchable<String, Partition>() {
       @Override
       public List<Partition> run(List<String> input) throws MetaException {
-        String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + makeParams(input.size()) + ")";
-        List<Long> partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, tblName,

Review Comment:
> Skipping get partition ids in MetaStoreDirectSql#getPartitionsViaPartNames to reduce the Database calls.

It seems you merged the two SQL queries (getPartitionIdsViaSqlFilter and `getPartitionsFromPartitionIds`) into a single query, which does reduce the number of database calls. Your change may well be reasonable, but there may also have been other reasons why the original code issued two queries. It would be good if you could find any clues in the history that explain the original design. :)
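For context on the removed code path: the deleted filter string used a makeParams-style helper to emit one `?` placeholder per partition name, so each name was bound as a JDBC parameter rather than inlined as a literal. A minimal standalone sketch of that pattern (the `makeParams` body here is an assumption mirroring its apparent behavior, not Hive's actual implementation):

```java
import java.util.Collections;
import java.util.List;

public class InClauseSketch {
  // Stand-in for MetaStoreDirectSql#makeParams (assumption): emits "?,?,?"
  // so each value is bound as a JDBC parameter instead of being inlined.
  static String makeParams(int n) {
    return String.join(",", Collections.nCopies(n, "?"));
  }

  public static void main(String[] args) {
    List<String> input = List.of("ds=2024-01-01", "ds=2024-01-02");
    // Mirrors the shape of the removed filter string.
    String filter = "\"PARTITIONS\".\"PART_NAME\" in (" + makeParams(input.size()) + ")";
    System.out.println(filter); // "PARTITIONS"."PART_NAME" in (?,?)
  }
}
```

With this shape, the two-step version first resolved `PART_ID`s via the bound filter and then fetched the full rows by id, at the cost of an extra round trip per batch.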
########## ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##########
@@ -3199,19 +3199,24 @@ public Map<Map<String, String>, Partition> loadDynamicPartitions(final LoadTable
         partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
       }
     }
-    List<Partition> partitions = Hive.get().getPartitionsByNames(tbl, partitionNames);
-    for(Partition partition : partitions) {
-      LOG.debug("HMS partition spec: {}", partition.getSpec());
-      partitionDetailsMap.entrySet().parallelStream()
-          .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
-          .findAny().ifPresent(entry -> {
-            entry.getValue().partition = partition;
-            entry.getValue().hasOldPartition = true;
-          });
-    }
-    // no need to fetch partition again in tasks since we have already fetched partitions
-    // info in getPartitionsByNames()
-    fetchPartitionInfo = false;
+    try {
+      List<Partition> partitions = Hive.get().getPartitionsByNames(tbl, partitionNames);
+      for(Partition partition : partitions) {
+        LOG.debug("HMS partition spec: {}", partition.getSpec());
+        partitionDetailsMap.entrySet().parallelStream()
+            .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
+            .findAny().ifPresent(entry -> {
+              entry.getValue().partition = partition;
+              entry.getValue().hasOldPartition = true;
+            });
+      }
+      // no need to fetch partition again in tasks since we have already fetched partitions
+      // info in getPartitionsByNames()
+      fetchPartitionInfo = false;
+    } catch (HiveException e) {
+      // Failed to fetch partitions in some cases (e.g., partition number limit)
+      LOG.warn("Fetching partitions by names failed.", e);

Review Comment:
IMO, this exception should be thrown directly rather than swallowed: the engines (HS2 and Spark) need to know that the `load_partition` failed so they can roll back based on the exception, and users need to know that they should adjust the limit via configuration.
BTW, I think we had better not change this client code (I know you are trying to fix the failing tests). The PR should not have any effect on the client. Maybe we can fix this on the server side instead, to minimize potential incompatibilities?

########## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ##########
@@ -1027,15 +1021,42 @@ public <T> List<T> getPartitionFieldsViaSqlFilter(
     }
   }

+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List<Partition> getPartitionsFromPartitionNames(String catName, String dbName,
+      String tblName, Boolean isView, List<String> partNameList, List<String> projectionFields,
+      boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+    // Get most of the fields for the partNames provided.
+    // Assume db and table names are the same for all partition, as provided in arguments.
+    String partNames = partNameList.stream()
+        .map(name -> "'" + name + "'")
+        .collect(Collectors.joining(","));
+    String queryText =

Review Comment:
Can we ensure that other backend databases, such as Oracle and PostgreSQL, can run this SQL without incompatibilities?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
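The portability concern in the last comment above can be made concrete: inlining quoted partition names breaks as soon as a value contains a quote character, and literal-quoting and escaping rules vary across backends, whereas placeholder binding leaves quoting to the JDBC driver. A standalone sketch contrasting the two (class and method names are hypothetical, for illustration only, not Hive's code):

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class PartNameQuoting {
  // Inlined-literal approach, as in the new getPartitionsFromPartitionNames:
  // concatenates each name into a quoted SQL literal.
  static String inlineLiterals(List<String> partNames) {
    return partNames.stream()
        .map(name -> "'" + name + "'")
        .collect(Collectors.joining(","));
  }

  // Placeholder approach: one "?" per value, bound through JDBC,
  // so the driver applies the backend's own quoting rules.
  static String placeholders(int n) {
    return String.join(",", Collections.nCopies(n, "?"));
  }

  public static void main(String[] args) {
    // A partition value containing a quote corrupts the inlined SQL text,
    // while the placeholder form stays well-formed on any backend.
    List<String> names = List.of("city=O'Brien", "city=Paris");
    System.out.println("in (" + inlineLiterals(names) + ")");
    System.out.println("in (" + placeholders(names.size()) + ")");
  }
}
```

Note also that some backends cap the size of an IN list (Oracle rejects expression lists of more than 1000 items), which is presumably why the new method's javadoc warns that the list must be kept short; the placeholder form has the same length limit but avoids the quoting hazard.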