[ 
https://issues.apache.org/jira/browse/HIVE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491263#comment-16491263
 ] 

Vihang Karajgaonkar commented on HIVE-19715:
--------------------------------------------

+1 to the idea of having one common API to consolidate fetching 
partition-related information. I like the idea of having a {{projection}} list 
and a {{predicate expression}} to filter the partitions. Pagination is very 
important too, because right now any client can cause an OOM on HMS (or on the 
client side) by requesting thousands of partitions; pagination/streaming 
support will help a lot with such cases. I also think we should at least 
deprecate the older APIs so that clients can move to the newer APIs in the near 
future. get_partition using partition expression proxy has issues, as 
described above by [~akolb] and doesn't work well for a standalone metastore 
since it depends on ql classes (effectively making standalone-metastore not a 
standalone-metastore). We have seen compatibility issues with that API as well 
when newer clients try to talk to an older HMS server. I can take up this task 
if we can come up with an API spec which works well for most cases. It can be 
an incremental effort; for example, support for pagination could be added 
later, as long as the API is defined in an extensible way.
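
To make the discussion concrete, here is a minimal sketch of a request that 
carries a projection list, a predicate and pagination state. It is only an 
illustration in Python, not a proposed Thrift definition: PartitionRequest, 
fetch_partitions, page_size and page_token are hypothetical names, the store is 
a plain in-memory list, and the predicate is a Python callable standing in for 
whatever serializable expression format the spec ends up using.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical; none of them come from the HMS Thrift API.


@dataclass
class PartitionRequest:
    projection: List[str]                           # fields the caller wants back
    predicate: Callable[[Dict[str, object]], bool]  # stand-in for a predicate expression
    page_size: int = 100                            # upper bound on partitions per response
    page_token: int = 0                             # offset to resume from; 0 means "start"


def fetch_partitions(
    store: List[Dict[str, object]], req: PartitionRequest
) -> Tuple[List[Dict[str, object]], Optional[int]]:
    """Return one page of projected partitions plus the next page token.

    The token is None once the last page has been returned.
    """
    # Filter first, then cut a page out of the matches. A real server would
    # push both steps down to the backing database instead of scanning a list.
    matching = [p for p in store if req.predicate(p)]
    page = matching[req.page_token:req.page_token + req.page_size]
    # Keep only the requested fields so unrequested data is never shipped.
    projected = [{f: p[f] for f in req.projection if f in p} for p in page]
    next_token = req.page_token + req.page_size
    return projected, (next_token if next_token < len(matching) else None)
```

A client would call fetch_partitions repeatedly, copying the returned token 
into page_token, until the token comes back as None. Because page_size caps 
every response, no single call can ask the server to materialize an unbounded 
number of partitions.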

One interesting side effect of returning only the subset of fields of interest 
from the partition objects is that we will probably have to change the 
partition fields to {{optional}} instead of {{required}}. This can create a 
trickle-down effect all the way down to the database, and I am not sure what 
complications it can cause. Thoughts?
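
As a rough illustration of that trickle-down effect, the sketch below models a 
partition whose fields are all optional. It is Python rather than Thrift, and 
PartialPartition, project and location_or_fail are hypothetical names used only 
for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PartialPartition:
    # Hypothetical shape: every field defaults to None, meaning "not requested".
    values: Optional[List[str]] = None
    location: Optional[str] = None
    parameters: Optional[Dict[str, str]] = None


def project(full: Dict[str, object], fields: List[str]) -> PartialPartition:
    """Copy only the requested fields; everything else stays None."""
    return PartialPartition(**{f: full[f] for f in fields})


def location_or_fail(p: PartialPartition) -> str:
    """Code that used to rely on a required field now has to check for it."""
    if p.location is None:
        raise ValueError("location was not part of the projection")
    return p.location
```

Every consumer that previously assumed a field was always set needs a guard 
like location_or_fail. In this sketch None also cannot distinguish "not 
requested" from "requested but unset", which may be one of the complications 
worth settling in the spec.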

> Consolidated and flexible API for fetching partition metadata from HMS
> ----------------------------------------------------------------------
>
>                 Key: HIVE-19715
>                 URL: https://issues.apache.org/jira/browse/HIVE-19715
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Currently, the HMS thrift API exposes 17 different APIs for fetching 
> partition-related information. There is somewhat of a combinatorial explosion 
> going on, where each API has variants with and without "auth" info, by pspecs 
> vs names, by filters, by exprs, etc. Having all of these separate APIs long 
> term is a maintenance burden and also more confusing for consumers.
> Additionally, even with all of these APIs, there is a lack of granularity in 
> fetching only the information needed for a particular use case. For example, 
> in some use cases it may be beneficial to only fetch the partition locations 
> without wasting effort fetching statistics, etc.
> This JIRA proposes that we add a new "one API to rule them all" for fetching 
> partition info. The request and response would be encapsulated in structs. 
> Some desirable properties:
> - the request should be able to specify which pieces of information are 
> required (eg location, properties, etc)
> - in the case of partition parameters, the request should be able to do 
> either whitelisting or blacklisting (eg to exclude large incremental column 
> stats HLL dumped in there by Impala)
> - the request should optionally specify auth info (to encompass the 
> "with_auth" variants)
> - the request should be able to designate the set of partitions to access 
> through one of several different methods (eg "all", list<name>, expr, 
> part_vals, etc) 
> - the struct should be easily evolvable so that new pieces of info can be 
> added
> - the response should be designed in such a way as to avoid transferring 
> redundant information for common cases (eg simple "dictionary coding" of 
> strings like parameter names, etc)
> - the API should support some form of pagination for tables with large 
> partition counts



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
