[
https://issues.apache.org/jira/browse/HIVE-19040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417987#comment-16417987
]
Sergey Shelukhin commented on HIVE-19040:
-----------------------------------------
[~vihangk1] as per above... HMS version doesn't matter, only Hive jars and HS2.
The reason this API exists is actually also described above although it's not
very clear I guess.
See the impl.
What happens is we send the actual Hive expression to MS.
On the main path, where it can be pushed down, it gets deserialized, converted
to a string and pushed, so that part is indeed redundant and can be replaced.
However, the reason we send it as bytes (and the reason the whole API was added
on top of the old string filter API) is to allow HMS to evaluate it when
pushdown is not available (which is actually most cases - Filter.g and SQL
pushdown support only basic stuff). This way MS can get (or potentially
stream) all the partitions and actually evaluate a full Hive expression on them
with any standard UDFs.
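A rough sketch of the two paths described above. All names here
(PartitionFilterSketch, ExpressionProxy, PartitionStore, ToyProxy) and the toy
"eq:" / "endsWith:" wire format are made up for illustration and are not the
actual HMS classes or the real serialization; partitions are reduced to plain
name strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names come from the Hive code base.
class PartitionFilterSketch {

    /** Stand-in for the proxy the metastore uses to reach expression code. */
    interface ExpressionProxy {
        /** A filter string if the expression is simple enough to push down. */
        Optional<String> toPushdownFilter(byte[] serializedExpr);

        /** The full expression, deserialized into something evaluable. */
        Predicate<String> toPredicate(byte[] serializedExpr);
    }

    /** Stand-in for the backing db; partitions are just names here. */
    static final class PartitionStore {
        private final List<String> names;

        PartitionStore(List<String> names) {
            this.names = new ArrayList<>(names);
        }

        /** Toy pushdown: the "db" only understands exact-name filters. */
        List<String> queryWithFilter(String filter) {
            List<String> out = new ArrayList<>();
            for (String n : names) {
                if (n.equals(filter)) {
                    out.add(n);
                }
            }
            return out;
        }

        List<String> listAll() {
            return new ArrayList<>(names);
        }
    }

    /** Toy wire format: UTF-8 text, "eq:<name>" or "endsWith:<suffix>". */
    static final class ToyProxy implements ExpressionProxy {
        private static final String EQ = "eq:";
        private static final String ENDS_WITH = "endsWith:";

        @Override
        public Optional<String> toPushdownFilter(byte[] serializedExpr) {
            String s = new String(serializedExpr, StandardCharsets.UTF_8);
            // Only plain equality is "basic stuff" the db can handle.
            return s.startsWith(EQ) ? Optional.of(s.substring(EQ.length())) : Optional.empty();
        }

        @Override
        public Predicate<String> toPredicate(byte[] serializedExpr) {
            String s = new String(serializedExpr, StandardCharsets.UTF_8);
            if (s.startsWith(EQ)) {
                String value = s.substring(EQ.length());
                return value::equals;
            }
            if (s.startsWith(ENDS_WITH)) {
                String suffix = s.substring(ENDS_WITH.length());
                return name -> name.endsWith(suffix);
            }
            throw new IllegalArgumentException("unknown expression: " + s);
        }
    }

    /** Main path: push down if possible. Fallback: evaluate on the server. */
    static List<String> getPartitionsByExpr(PartitionStore store, ExpressionProxy proxy, byte[] expr) {
        Optional<String> filter = proxy.toPushdownFilter(expr);
        if (filter.isPresent()) {
            return store.queryWithFilter(filter.get());
        }
        // No pushdown: fetch everything next to the db and filter locally,
        // so only matching partitions travel back to the client.
        Predicate<String> predicate = proxy.toPredicate(expr);
        List<String> result = new ArrayList<>();
        for (String name : store.listAll()) {
            if (predicate.test(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The point of the fallback branch is only that the full partition list never
leaves the server; how the real code fetches or streams partitions is not
shown here.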
The alternative without this API is to send all partitions back to the client
(which, unlike the SQL db, can be far away) and evaluate the Hive expression on
the client, which can be very expensive if there are many partitions.
That's why bytes and a proxy class into the ql jar are used (if we had had a
metastore client and metastore server module separation long ago, it would have
used QL classes directly). Also, Hive expressions are already serializable into
bytes (due to needing to serialize the plan).
So with just Filter.g this functionality would be lost.
If we want to keep the ability of MS to evaluate expressions locally, it's
possible to beef up this API (and the proxy class config). The request can
supply the name of an "expression evaluator", and the config maps that name to
a particular class. A version can also be included to handle
compat...
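To make the named-evaluator idea a bit more concrete, here is a minimal sketch.
Everything in it is hypothetical (EvaluatorRegistrySketch, ExpressionEvaluator,
ExprRequest, Registry, PrefixEvaluator, the version numbers); a real version
would presumably load the class named in the config rather than register
instances by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch only: not an existing metastore API.
class EvaluatorRegistrySketch {

    /** What a pluggable evaluator would have to provide. */
    interface ExpressionEvaluator {
        boolean supportsVersion(int version);

        Predicate<String> compile(byte[] payload, int version);
    }

    /** Request carries the evaluator name and version next to the bytes. */
    static final class ExprRequest {
        final String evaluatorName;
        final int version;
        final byte[] payload;

        ExprRequest(String evaluatorName, int version, byte[] payload) {
            this.evaluatorName = evaluatorName;
            this.version = version;
            this.payload = payload;
        }
    }

    /** Server-side mapping from configured name to evaluator. */
    static final class Registry {
        private final Map<String, ExpressionEvaluator> byName = new HashMap<>();

        void register(String name, ExpressionEvaluator evaluator) {
            byName.put(name, evaluator);
        }

        Predicate<String> resolve(ExprRequest req) {
            ExpressionEvaluator evaluator = byName.get(req.evaluatorName);
            if (evaluator == null) {
                throw new IllegalArgumentException("no evaluator named " + req.evaluatorName);
            }
            // Fail fast on a version mismatch instead of failing mid-deserialization.
            if (!evaluator.supportsVersion(req.version)) {
                throw new IllegalArgumentException(
                    "evaluator " + req.evaluatorName + " does not support version " + req.version);
            }
            return evaluator.compile(req.payload, req.version);
        }
    }

    /** Toy evaluator: payload is a UTF-8 prefix that partition names must match. */
    static final class PrefixEvaluator implements ExpressionEvaluator {
        @Override
        public boolean supportsVersion(int version) {
            return version == 1 || version == 2;
        }

        @Override
        public Predicate<String> compile(byte[] payload, int version) {
            String prefix = new String(payload, StandardCharsets.UTF_8);
            return name -> name.startsWith(prefix);
        }
    }
}
```

With an explicit version in the request, an old client talking to a newer
server gets a clear rejection (or a compat code path) rather than a
deserialization error.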
It's also possible to create support for native expressions in the metastore
that would handle most Hive cases, i.e. basically replace Filter.g with Hive
expression parsing and include common UDFs like IN, etc.
> get_partitions_by_expr() implementation in HiveMetaStore causes backward
> incompatibility easily
> ------------------------------------------------------------------------------------------------
>
> Key: HIVE-19040
> URL: https://issues.apache.org/jira/browse/HIVE-19040
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: 2.0.0
> Reporter: Aihua Xu
> Priority: Major
>
> In the HiveMetaStore implementation of {{public PartitionsByExprResult
> get_partitions_by_expr(PartitionsByExprRequest req) throws TException}}, an
> expression is serialized into a byte array on the client side and passed
> through PartitionsByExprRequest. HMS then deserializes it back into the
> expression and filters the partitions by it.
> Such a partition filtering expression can contain various UDFs. If one of
> the UDFs changes between Hive versions, HS2 on the older version will
> serialize the expression in the old format, which HMS on the newer version
> cannot deserialize. One example is the GenericUDFIn class adding
> {{transient}} to the field constantInSet, which causes such an
> incompatibility.
> One approach I'm thinking of is, instead of converting the expression object
> to a byte array, to pass the expression string directly.