[
https://issues.apache.org/jira/browse/HIVE-22957?focusedWorklogId=456097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456097
]
ASF GitHub Bot logged work on HIVE-22957:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Jul/20 12:59
Start Date: 08/Jul/20 12:59
Worklog Time Spent: 10m
Work Description: kgyrtkirk commented on a change in pull request #1105:
URL: https://github.com/apache/hive/pull/1105#discussion_r451502201
##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##########
@@ -734,6 +734,21 @@ dropPartitionOperator
EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO |
GREATERTHAN
;
+filterPartitionSpec
+ :
+ LPAREN filterPartitionVal (COMMA filterPartitionVal )* RPAREN ->
^(TOK_PARTSPEC filterPartitionVal +)
+ ;
+
+filterPartitionVal
+ :
+ identifier filterPartitionOperator constant -> ^(TOK_PARTVAL identifier
filterPartitionOperator constant)
Review comment:
the old `partitionSpec` did not require the constant:
```
identifier (EQUAL constant)?
```
were there any use cases of that?
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path>
partPaths,
// now check the table folder and see if we find anything
// that isn't in the metastore
Set<Path> allPartDirs = new HashSet<Path>();
+ Set<Path> partDirs = new HashSet<Path>();
+ List<FieldSchema> partColumns = table.getPartitionKeys();
checkPartitionDirs(tablePath, allPartDirs,
Collections.unmodifiableList(getPartColNames(table)));
+
+ if (filterExp != null) {
+ PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+ List<String> paritions = new ArrayList<>();
+ for (Path path : allPartDirs) {
+ // remove the table's path from the partition path
+ // eg: <tablePath>/p1=1/p2=2/p3=3 ---> p1=1/p2=2/p3=3
+ paritions.add(path.toString().substring(tablePath.toString().length() + 1));
+ }
+ // Remove all partition paths which does not matches the filter expression.
+ expressionProxy.filterPartitionsByExpr(partColumns, filterExp,
+ conf.get(MetastoreConf.ConfVars.DEFAULTPARTITIONNAME.getVarname()), paritions);
+
+ // now the partition list will contain all the paths that matches the filter expression.
+ // add them back to partDirs.
+ for (String path : paritions) {
+ partDirs.add(new Path(tablePath.toString() + "/" + path));
Review comment:
instead of concatenating with `/` use `new Path(parentPath,child)` -
it's more portable
##########
File path: itests/src/test/resources/testconfiguration.properties
##########
@@ -222,6 +222,7 @@ mr.query.files=\
mapjoin_subquery2.q,\
mapjoin_test_outer.q,\
masking_5.q,\
+ msck_repair_filter.q,\
Review comment:
is there a reason that we run this test with mr?
##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##########
@@ -1942,9 +1942,8 @@ metastoreCheck
@after { popMsg(state); }
: KW_MSCK (repair=KW_REPAIR)?
(KW_TABLE tableName
- ((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS))? |
- (partitionSpec)?)
- -> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync? (partitionSpec*)?)
+ ((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS)
(filterPartitionSpec)?)?)
+ -> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync?
(filterPartitionSpec)?)
Review comment:
I know this was here before, but let's fix it up:
instead of separate add/drop/sync variables we could have
`opt=(KW_ADD|KW_DROP|KW_SYNC)`; that will make the other end more readable as
well
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
##########
@@ -63,13 +67,24 @@ public void analyzeInternal(ASTNode root) throws
SemanticException {
}
Table table = getTable(tableName);
- List<Map<String, String>> specs = getPartitionSpecs(table, root);
+ Map<Integer, List<ExprNodeGenericFuncDesc>> partitionSpecs =
getFullPartitionSpecs(root, table, conf, false);
+ byte[] filterExp = null;
+ if (partitionSpecs != null & !partitionSpecs.isEmpty()) {
+ // explicitly set expression proxy class to
PartitionExpressionForMetastore since we intend to use the
+ // filterPartitionsByExpr of PartitionExpressionForMetastore for
partition pruning down the line.
+ conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
Review comment:
I don't think this will work: this is the ql module, while
`EXPRESSION_PROXY_CLASS` is a metastore conf key; in a remote metastore setup
this `conf.set` will probably have no effect...
have you tried it?
I think it is fine to make a check and return an error saying that this feature
is not available because it requires a conf change
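A minimal sketch of such a check, for illustration only. The plain `Map` stands in for the real configuration object, and the key string, the fully qualified class name, and the names `ExpressionProxyCheckSketch` and `requireFilterSupport` are assumptions, not taken from the patch. Where the check should run (client or metastore side) is left open here.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpressionProxyCheckSketch {
  // Assumed values: the real ones come from MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS
  // and PartitionExpressionForMetastore.class.getCanonicalName().
  static final String PROXY_KEY = "metastore.expression.proxy";
  static final String REQUIRED_PROXY =
      "org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore";

  /** Fails fast instead of silently overriding the configured proxy class. */
  static void requireFilterSupport(Map<String, String> conf) {
    String configured = conf.get(PROXY_KEY);
    if (!REQUIRED_PROXY.equals(configured)) {
      throw new IllegalStateException(
          "MSCK partition filtering is not available: " + PROXY_KEY
              + " must be set to " + REQUIRED_PROXY + " but is " + configured);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(PROXY_KEY, REQUIRED_PROXY);
    requireFilterSupport(conf); // passes: the required proxy is configured

    try {
      requireFilterSupport(new HashMap<>()); // key missing: rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In the analyzer the failure would presumably surface as a `SemanticException` rather than an `IllegalStateException`; the unchecked exception only keeps the sketch self-contained.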
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
##########
@@ -63,13 +67,24 @@ public void analyzeInternal(ASTNode root) throws
SemanticException {
}
Table table = getTable(tableName);
- List<Map<String, String>> specs = getPartitionSpecs(table, root);
+ Map<Integer, List<ExprNodeGenericFuncDesc>> partitionSpecs =
getFullPartitionSpecs(root, table, conf, false);
+ byte[] filterExp = null;
+ if (partitionSpecs != null & !partitionSpecs.isEmpty()) {
+ // explicitly set expression proxy class to
PartitionExpressionForMetastore since we intend to use the
+ // filterPartitionsByExpr of PartitionExpressionForMetastore for
partition pruning down the line.
+ conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
+ PartitionExpressionForMetastore.class.getCanonicalName());
+ // fetch the first value of partitionSpecs map since it will always have
one key, value pair
+ filterExp = SerializationUtilities.serializeExpressionToKryo(
Review comment:
why does this need to be flattened into a `byte[]`?
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
##########
@@ -837,6 +844,118 @@ public static void checkColumnName(String columnName)
throws SemanticException {
return colList;
}
+ /**
+ * Get the partition specs from the tree. This stores the full specification
+ * with the comparator operator into the output list.
+ *
+ * @return Map of partitions by prefix length. Most of the time prefix
length will
+ * be the same for all partition specs, so we can just OR the
expressions.
+ */
+ public static Map<Integer, List<ExprNodeGenericFuncDesc>>
getFullPartitionSpecs(
Review comment:
can we find a new home for these 2 `static` methods? :)
`ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java`
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path>
partPaths,
// now check the table folder and see if we find anything
// that isn't in the metastore
Set<Path> allPartDirs = new HashSet<Path>();
+ Set<Path> partDirs = new HashSet<Path>();
Review comment:
move this variable inside the if
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -240,40 +243,27 @@ void checkTable(String catName, String dbName, String
tableName,
}
PartitionIterable parts;
- boolean findUnknownPartitions = true;
if (isPartitioned(table)) {
- if (partitions == null || partitions.isEmpty()) {
+ if (filterExp != null) {
+ List<Partition> results = new ArrayList<>();
+ getPartitionListByFilterExp(getMsc(), table, filterExp,
Review comment:
I wonder if there is a way to retain `filterExp` in a more natural
way... it will be kryo-encoded almost all the time, but it seems like the
metastore interface method was designed to accept kryo-encoded bytes...
##########
File path:
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
##########
@@ -330,17 +330,6 @@ public void testPartitionsCheck() throws HiveException,
assertEquals(partToRemove.getTable().getTableName(),
result.getPartitionsNotOnFs().iterator().next().getTableName());
assertEquals(Collections.<CheckResult.PartitionResult>emptySet(),
result.getPartitionsNotInMs());
-
- List<Map<String, String>> partsCopy = new ArrayList<Map<String, String>>();
- partsCopy.add(partitions.get(1).getSpec());
Review comment:
is there a successor of this test?
##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##########
@@ -734,6 +734,21 @@ dropPartitionOperator
EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO |
GREATERTHAN
;
+filterPartitionSpec
+ :
+ LPAREN filterPartitionVal (COMMA filterPartitionVal )* RPAREN ->
^(TOK_PARTSPEC filterPartitionVal +)
+ ;
+
+filterPartitionVal
+ :
+ identifier filterPartitionOperator constant -> ^(TOK_PARTVAL identifier
filterPartitionOperator constant)
+ ;
+
+filterPartitionOperator
+ :
+ EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO |
GREATERTHAN | KW_LIKE
Review comment:
`dropPartitionSpec` seems to use almost the same construct; I don't see
any reason to duplicate it...
the only difference I see right now is `LIKE` - are there any other
differences?
I think we should reuse the existing rule instead of duplicating it...
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path>
partPaths,
// now check the table folder and see if we find anything
// that isn't in the metastore
Set<Path> allPartDirs = new HashSet<Path>();
+ Set<Path> partDirs = new HashSet<Path>();
+ List<FieldSchema> partColumns = table.getPartitionKeys();
checkPartitionDirs(tablePath, allPartDirs,
Collections.unmodifiableList(getPartColNames(table)));
+
+ if (filterExp != null) {
+ PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+ List<String> paritions = new ArrayList<>();
+ for (Path path : allPartDirs) {
+ // remove the table's path from the partition path
+ // eg: <tablePath>/p1=1/p2=2/p3=3 ---> p1=1/p2=2/p3=3
+ paritions.add(path.toString().substring(tablePath.toString().length() + 1));
Review comment:
I'm wondering whether `tablePath` could end with a '/'; if it does,
and `checkPartitionDirs` is removing double slashes, this could eat up 1 extra
char...
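To illustrate the concern, a small sketch on plain strings (no Hadoop classes). `PartitionPathSketch`, `naive`, and `relativize` are hypothetical names. Whether `tablePath.toString()` can actually end with a '/' in the checker is not confirmed here; the sketch only shows what would happen if it did, and one way to guard against it.

```java
final class PartitionPathSketch {

  /** Mirrors the substring arithmetic in the patch: skip the table path plus one separator. */
  static String naive(String tablePath, String partPath) {
    return partPath.substring(tablePath.length() + 1);
  }

  /** Strips the table path prefix while tolerating trailing slashes on tablePath. */
  static String relativize(String tablePath, String partPath) {
    String base = tablePath;
    // Drop trailing slashes, but keep a bare "/" root intact.
    while (base.length() > 1 && base.endsWith("/")) {
      base = base.substring(0, base.length() - 1);
    }
    String prefix = base.endsWith("/") ? base : base + "/";
    if (!partPath.startsWith(prefix)) {
      throw new IllegalArgumentException(partPath + " is not under " + tablePath);
    }
    return partPath.substring(prefix.length());
  }

  public static void main(String[] args) {
    String part = "/warehouse/t/p1=1/p2=2";
    System.out.println(naive("/warehouse/t", part));       // p1=1/p2=2
    System.out.println(naive("/warehouse/t/", part));      // 1=1/p2=2 (one char lost)
    System.out.println(relativize("/warehouse/t/", part)); // p1=1/p2=2
  }
}
```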
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -1348,6 +1348,17 @@ public static Path getPath(Table table) {
}
}
+ public static void getPartitionListByFilterExp(IMetaStoreClient msc, Table
table, byte[] filterExp,
+ String defaultPartName,
List<Partition> results)
+ throws MetastoreException {
+ try {
+ msc.listPartitionsByExpr(table.getCatName(), table.getDbName(),
table.getTableName(), filterExp,
Review comment:
this method accepts `byte[]` and, if I'm not mistaken, it has been like this
since around 2013
Issue Time Tracking
-------------------
Worklog Id: (was: 456097)
Time Spent: 1.5h (was: 1h 20m)
> Support Partition Filtering In MSCK REPAIR TABLE Command
> --------------------------------------------------------
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
> Issue Type: Improvement
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf]