[
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194261#comment-17194261
]
Syed Shameerur Rahman commented on HIVE-23851:
----------------------------------------------
[~kgyrtkirk] As per your comments, I have changed the implementation. Please
review the PR.
Thanks.
> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> ----------------------------------------------------------------------------
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
> 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main]
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
> java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
> at
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy
> class to be set as PartitionExpressionForMetastore (
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
> ), While dropping partition we serialize the drop partition filter
> expression as (
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
> ) which is incompatible during deserializtion happening in
> PartitionExpressionForMetastore (
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
> ) hence the query fails with Failed to deserialize the expression.
> *Solutions*:
> I could think of two approaches to this problem
> # Since PartitionExpressionForMetastore is required only during parition
> pruning step, We can switch back the expression proxy class to
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make serialization process in msck drop partition
> filter expression compatible with the one with
> PartitionExpressionForMetastore, We can do this via Reflection since the drop
> partition serialization happens in Msck class (standadlone-metatsore) by this
> way we can completely remove the need for class MsckPartitionExpressionProxy
> and this also helps to reduce the complexity of Msck Repair command with
> parition filtering to work with ease (no need to set the expression
> proxyClass config).
> I am personally inclined to the 2nd approach. Before moving on i want to know
> if this is the best approach or is there any other better/easier approach to
> solve this problem.
> PS: qtest added in HIVE-22957 mainly focused on adding missing partition.
> Forgot to add case for dropping partition.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)