[
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159025#comment-16159025
]
Mithun Radhakrishnan commented on HIVE-17466:
---------------------------------------------
[~alangates], I was wondering if you're alright with the intention of this API.
We're using HIVE-17467 (which depends on this patch) internally, for Oozie
workflow launches.
> Metastore API to list unique partition-key-value combinations
> -------------------------------------------------------------
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 2.2.0, 3.0.0
> Reporter: Mithun Radhakrishnan
> Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch,
> HIVE-17466.2.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch
> workflows based on the availability of table/partitions. Partitions are
> currently discovered by listing partitions using (what boils down to)
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome,
> given that {{Partition}} objects are heavyweight and carry redundant
> information. The alternative is to use partition-names, which will need
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been
> published already, it would be preferable to have an API that pushed down
> part-key extraction into the {{RawStore}} layer, and returned key-values as
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
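
For illustration only (this is not the attached patch, nor the proposed API): a minimal sketch of the client-side parsing that the description calls cumbersome. It assumes partition names of the form `dt=20170901/hr=00` and that values are not path-escaped; real partition names may need unescaping first. The class `PartitionNameSketch`, its methods, and the `dt`/`hr` keys are all hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Hive or of the attached patch.
final class PartitionNameSketch {

    // Splits a partition name such as "dt=20170901/hr=05" into ordered key/value pairs.
    // Assumption: values are not path-escaped; real names may need unescaping first.
    static Map<String, String> parse(String partitionName) {
        Map<String, String> keyValues = new LinkedHashMap<>();
        for (String component : partitionName.split("/")) {
            int eq = component.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed component: " + component);
            }
            keyValues.put(component.substring(0, eq), component.substring(eq + 1));
        }
        return keyValues;
    }

    // Returns the distinct values of wantedKey among partitions whose filterKey
    // equals filterValue, e.g. the hours already published for one day.
    static SortedSet<String> distinctValues(List<String> partitionNames,
                                            String wantedKey,
                                            String filterKey,
                                            String filterValue) {
        SortedSet<String> values = new TreeSet<>();
        for (String name : partitionNames) {
            Map<String, String> keyValues = parse(name);
            if (filterValue.equals(keyValues.get(filterKey)) && keyValues.containsKey(wantedKey)) {
                values.add(keyValues.get(wantedKey));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "dt=20170901/hr=00", "dt=20170901/hr=01", "dt=20170902/hr=00");
        System.out.println(distinctValues(names, "hr", "dt", "20170901")); // prints [00, 01]
    }
}
```

Every caller that wants key-values has to repeat this split-and-filter work after fetching all the names, which is the overhead a metastore-side API returning key-values directly would avoid.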