zhangbutao commented on code in PR #5063:
URL: https://github.com/apache/hive/pull/5063#discussion_r1520802015
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -742,14 +742,8 @@ public List<Partition> getPartitionsViaPartNames(final String catName, final Str
     return Batchable.runBatched(batchSize, partNames, new Batchable<String, Partition>() {
       @Override
       public List<Partition> run(List<String> input) throws MetaException {
-        String filter = "" + PARTITIONS + ".\"PART_NAME\" in (" + makeParams(input.size()) + ")";
-        List<Long> partitionIds = getPartitionIdsViaSqlFilter(catName, dbName, tblName,
Review Comment:
> Skipping get partition ids in MetaStoreDirectSql#getPartitionsViaPartNames
to reduce the Database calls.

It seems you merged the two SQL statements (`getPartitionIdsViaSqlFilter` and
`getPartitionsFromPartitionIds`) into one, which reduces the number of DB calls.
Your change may well be reasonable, but there may also be other reasons why two
SQL statements were used here before. If you can find any clues that explain
the historical code, that would be good. :)
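As a rough illustration of the round-trip saving under discussion (hypothetical method names, not the actual patch code): the old path resolves `PART_NAME` to `PART_ID` in one query and fetches the rows by id in a second, while the merged path filters by `PART_NAME` directly in a single query.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of why merging the two SQLs halves the DB round trips.
// The query bodies are stubbed out; only the call count is modeled.
public class RoundTripSketch {
  static AtomicInteger dbCalls = new AtomicInteger();

  // Old path, query 1: select PART_ID where PART_NAME in (...)
  static List<Long> queryIdsByNames(List<String> names) {
    dbCalls.incrementAndGet();
    return List.of(1L, 2L);
  }

  // Old path, query 2: select ... where PART_ID in (...)
  static List<String> queryRowsByIds(List<Long> ids) {
    dbCalls.incrementAndGet();
    return List.of("p1", "p2");
  }

  // New path: one combined query filtering PARTITIONS by PART_NAME.
  static List<String> queryRowsByNames(List<String> names) {
    dbCalls.incrementAndGet();
    return List.of("p1", "p2");
  }

  public static void main(String[] args) {
    List<String> names = List.of("ds=1", "ds=2");
    queryRowsByIds(queryIdsByNames(names));  // old path: two DB calls
    int oldPath = dbCalls.getAndSet(0);
    queryRowsByNames(names);                 // new path: one DB call
    System.out.println(oldPath + " vs " + dbCalls.get());
  }
}
```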
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -3199,19 +3199,24 @@ public Map<Map<String, String>, Partition> loadDynamicPartitions(final LoadTable
         partitionNames.add(Warehouse.makeDynamicPartNameNoTrailingSeperator(details.fullSpec));
       }
     }
-    List<Partition> partitions = Hive.get().getPartitionsByNames(tbl, partitionNames);
-    for(Partition partition : partitions) {
-      LOG.debug("HMS partition spec: {}", partition.getSpec());
-      partitionDetailsMap.entrySet().parallelStream()
-          .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
-          .findAny().ifPresent(entry -> {
-            entry.getValue().partition = partition;
-            entry.getValue().hasOldPartition = true;
-          });
-    }
-    // no need to fetch partition again in tasks since we have already fetched partitions
-    // info in getPartitionsByNames()
-    fetchPartitionInfo = false;
+    try {
+      List<Partition> partitions = Hive.get().getPartitionsByNames(tbl, partitionNames);
+      for(Partition partition : partitions) {
+        LOG.debug("HMS partition spec: {}", partition.getSpec());
+        partitionDetailsMap.entrySet().parallelStream()
+            .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
+            .findAny().ifPresent(entry -> {
+              entry.getValue().partition = partition;
+              entry.getValue().hasOldPartition = true;
+            });
+      }
+      // no need to fetch partition again in tasks since we have already fetched partitions
+      // info in getPartitionsByNames()
+      fetchPartitionInfo = false;
+    } catch (HiveException e) {
+      // Failed to fetch partitions in some cases (e.g., partition number limit)
+      LOG.warn("Fetching partitions by names failed.", e);
Review Comment:
IMO, this exception should be thrown directly: the engines (HS2 & Spark) need
to know that the `load_partition` failed so that they can roll back based on
the exception, or so that users know they should control the limit size by
setting the relevant conf.

BTW, I think we had better not change this client code (I know you are trying
to fix the failed tests). The PR should not have any effect on the client.
Maybe we can fix this on the server side instead, to minimize potential
incompatibilities?
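The propagation being suggested could be sketched like this (standalone illustration with stand-in types, not the Hive classes): rethrow after logging, so the caller observes the failure instead of silently continuing.

```java
// Hypothetical sketch of the suggested behavior: propagate the failure
// instead of swallowing it, so callers (HS2/Spark) can roll back or
// tell the user to adjust the partition-limit conf.
public class PropagateFailureSketch {
  // Stand-in for Hive's HiveException.
  static class HiveException extends Exception {
    HiveException(String m) { super(m); }
  }

  // Stand-in for Hive.getPartitionsByNames; always fails here to show propagation.
  static void getPartitionsByNames() throws HiveException {
    throw new HiveException("partition number limit exceeded");
  }

  // The suggested pattern: optionally log, then rethrow rather than swallow.
  static void loadDynamicPartitions() throws HiveException {
    try {
      getPartitionsByNames();
    } catch (HiveException e) {
      // a real implementation would LOG.error(...) here before rethrowing
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      loadDynamicPartitions();
      System.out.println("ok");
    } catch (HiveException e) {
      // The caller sees the failure and can decide how to roll back.
      System.out.println("caller saw: " + e.getMessage());
    }
  }
}
```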
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -1027,15 +1021,42 @@ public <T> List<T> getPartitionFieldsViaSqlFilter(
     }
   }
+
+  /** Should be called with the list short enough to not trip up Oracle/etc. */
+  private List<Partition> getPartitionsFromPartitionNames(String catName, String dbName,
+      String tblName, Boolean isView, List<String> partNameList, List<String> projectionFields,
+      boolean isAcidTable, GetPartitionsArgs args) throws MetaException {
+    // Get most of the fields for the partNames provided.
+    // Assume db and table names are the same for all partition, as provided in arguments.
+    String partNames = partNameList.stream()
+        .map(name -> "'" + name + "'")
+        .collect(Collectors.joining(","));
+    String queryText =
Review Comment:
Can we ensure that other backend databases, such as Oracle and PostgreSQL, can
run this SQL without incompatibilities?
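One portability-friendly alternative (a sketch only, with a `makeParams`-style helper like the one the old code used; names here are illustrative): build the IN list from JDBC `?` placeholders and bind the values later, rather than inlining quoted literals. Placeholder binding behaves the same on MySQL, PostgreSQL and Oracle, and also avoids quoting bugs if a partition name contains a single quote.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: parameter placeholders instead of inlined literals.
public class PortableInListSketch {
  // Produces "?,?,...,?" with n placeholders, like the pre-existing
  // makeParams helper in MetaStoreDirectSql.
  static String makeParams(int n) {
    return String.join(",", Collections.nCopies(n, "?"));
  }

  static String buildQuery(List<String> partNames) {
    return "select * from \"PARTITIONS\" where \"PART_NAME\" in ("
        + makeParams(partNames.size()) + ")";
  }

  public static void main(String[] args) {
    // The literal values would be bound later via PreparedStatement#setString,
    // never concatenated into the SQL text itself.
    System.out.println(buildQuery(List.of("ds=2024-01-01", "ds=2024-01-02")));
  }
}
```

Note this does not address Oracle's limit of 1000 elements in a single IN list, which is why batching the name list (as `Batchable.runBatched` does) would still matter.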
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]