[ https://issues.apache.org/jira/browse/HIVE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583254#comment-16583254 ]
Vihang Karajgaonkar commented on HIVE-20306:
--------------------------------------------
[~tlipcon] [~pvary] [~akolb] Can you please review?
Here is a brief description of the approach, which may help with the review:
# A new Thrift API, {{get_partitions_with_specs}}, is introduced. It takes a
request object and returns a response object.
# The request object provides a {{GetPartitionsProjectSpec}}, which contains
{{fieldList}}, a list of strings naming the requested fields, and
{{paramKeyPattern}}, a SQL regex pattern used to include or exclude certain
parameter keys. Whether matching keys are included or excluded is determined by
the boolean {{excludeParamKeyPattern}}.
# The main directSQL implementation is in the class
{{PartitionProjectionEvaluator}}, which receives the input {{fieldList}} and
internally converts it into a prefix tree of nodes. The partition field values
are fetched in two passes: the first pass sets all the single-valued fields and
the second pass sets the multi-valued fields. For single-valued fields, the SQL
is built from the projected fields, so joins are skipped when the requested
fields do not need them. Multi-valued fields are set in a separate pass because
each multi-valued field needs a SQL query of its own.
# Once the partitions are fetched, they are grouped by storage descriptor in
{{get_partitionspecs_grouped_by_storage_descriptor}}, an existing method which I
modified to handle the cases when the SD or SD.location is not set.
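To make the include/exclude semantics of {{paramKeyPattern}} and {{excludeParamKeyPattern}} concrete, here is a minimal in-memory sketch. It assumes the pattern uses SQL LIKE wildcards ({{%}} for any run of characters, {{_}} for a single character) with no escape handling. {{ProjectionSpecSketch}} and its methods are hypothetical and are not the classes in the patch, which may well push this filter into the directSQL query instead.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical model of the projection spec; field names mirror the comment,
// the class itself is illustrative only.
class ProjectionSpecSketch {
  final List<String> fieldList;
  final String paramKeyPattern;         // assumed LIKE syntax: % and _ only
  final boolean excludeParamKeyPattern; // true: drop matching keys; false: keep only matching keys

  ProjectionSpecSketch(List<String> fieldList, String paramKeyPattern,
                       boolean excludeParamKeyPattern) {
    this.fieldList = fieldList;
    this.paramKeyPattern = paramKeyPattern;
    this.excludeParamKeyPattern = excludeParamKeyPattern;
  }

  // Translates a LIKE-style pattern into a Java regex; matches() below
  // requires the whole key to match, mirroring LIKE semantics.
  static Pattern likeToRegex(String like) {
    StringBuilder sb = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%') {
        sb.append(".*");
      } else if (c == '_') {
        sb.append('.');
      } else {
        sb.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(sb.toString());
  }

  // Returns the subset of partition parameters selected by this spec.
  Map<String, String> filterParameters(Map<String, String> params) {
    if (paramKeyPattern == null) {
      return new TreeMap<>(params); // no pattern: keep every key
    }
    Pattern p = likeToRegex(paramKeyPattern);
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, String> e : params.entrySet()) {
      boolean matches = p.matcher(e.getKey()).matches();
      // Keep a key that matches an include pattern, or that does not
      // match an exclude pattern.
      if (matches != excludeParamKeyPattern) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }
}
```

With {{excludeParamKeyPattern}} set to true and a pattern such as {{transient%}}, for example, this sketch drops {{transient_lastDdlTime}} and keeps every other key.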
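The two-pass fetch in {{PartitionProjectionEvaluator}} can be sketched roughly as follows, assuming dotted field paths such as {{sd.location}}. The class {{ProjectionTreeSketch}}, its set of multi-valued fields, and the field-to-table mapping ({{PARTITIONS}}, {{SDS}}, {{SERDES}}) are illustrative assumptions, not taken from the patch. The sketch builds the prefix tree, splits its leaves into single-valued and multi-valued fields, and reports which tables a first-pass query would need to join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical prefix-tree projection planner; not the patch's class.
class ProjectionTreeSketch {
  // Illustrative only: fields assumed to be backed by one-to-many tables.
  static final Set<String> MULTI_VALUED = new HashSet<>(Arrays.asList(
      "values", "parameters", "sd.cols", "sd.bucketCols", "sd.sortCols",
      "sd.parameters", "sd.serdeInfo.parameters"));

  static final class Node {
    final String path; // full dotted path from the root ("" for the root)
    final Map<String, Node> children = new TreeMap<>();
    Node(String path) { this.path = path; }
  }

  final Node root = new Node("");

  // Inserts each dotted path into the tree, sharing common prefixes.
  ProjectionTreeSketch(List<String> fieldList) {
    for (String field : fieldList) {
      Node cur = root;
      for (String part : field.split("\\.")) {
        String childPath = cur.path.isEmpty() ? part : cur.path + "." + part;
        cur = cur.children.computeIfAbsent(part, k -> new Node(childPath));
      }
    }
  }

  // Only leaves count as requested here; a bare prefix such as "sd"
  // is not expanded into all of its children in this sketch.
  private void collectLeaves(Node n, List<String> out) {
    if (n.children.isEmpty()) {
      if (!n.path.isEmpty()) {
        out.add(n.path);
      }
      return;
    }
    for (Node child : n.children.values()) {
      collectLeaves(child, out);
    }
  }

  List<String> leaves() {
    List<String> out = new ArrayList<>();
    collectLeaves(root, out);
    return out;
  }

  // First pass: fields fetched together by a single query.
  List<String> singleValuedFields() {
    List<String> out = new ArrayList<>();
    for (String f : leaves()) {
      if (!MULTI_VALUED.contains(f)) out.add(f);
    }
    return out;
  }

  // Second pass: each of these would get a query of its own.
  List<String> multiValuedFields() {
    List<String> out = new ArrayList<>();
    for (String f : leaves()) {
      if (MULTI_VALUED.contains(f)) out.add(f);
    }
    return out;
  }

  // Tables the first-pass query must touch; a join is added only when a
  // requested single-valued field lives in that table.
  Set<String> firstPassTables() {
    Set<String> tables = new TreeSet<>();
    tables.add("PARTITIONS");
    for (String f : singleValuedFields()) {
      if (f.startsWith("sd.")) tables.add("SDS");
      if (f.startsWith("sd.serdeInfo.")) tables.add("SERDES");
    }
    return tables;
  }
}
```

In this sketch, projecting only {{createTime}} and {{values}} needs just {{PARTITIONS}} in the first pass, while {{values}} is fetched by its own query in the second pass.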
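The null handling added to the storage-descriptor grouping step can be illustrated with a rough sketch. {{SdGroupingSketch}}, its {{Part}} stand-in (which reduces the SD to a single key string), and the {{<no-sd>}} bucket are hypothetical; the sketch only shows how a missing SD or SD location can be tolerated, not how the existing method is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of grouping projected partitions by storage descriptor.
class SdGroupingSketch {
  // Minimal stand-in for a fetched partition. Either SD field may be null
  // when it was not part of the projection.
  static final class Part {
    final List<String> values;
    final String sdKey;    // SD reduced to one comparable key, location excluded
    final String location; // sd.location, or null if not fetched
    Part(List<String> values, String sdKey, String location) {
      this.values = values;
      this.sdKey = sdKey;
      this.location = location;
    }
  }

  // Bucket for partitions whose SD was not fetched at all.
  static final String NO_SD = "<no-sd>";

  // Groups partitions sharing the same SD key; a missing SD gets its own
  // bucket instead of causing a NullPointerException.
  static Map<String, List<Part>> group(List<Part> parts) {
    Map<String, List<Part>> groups = new LinkedHashMap<>();
    for (Part p : parts) {
      String key = (p.sdKey == null) ? NO_SD : p.sdKey;
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
    }
    return groups;
  }

  // Location relative to the table directory, or null when the location is
  // unknown or lies outside the table directory.
  static String relativePath(String tableLocation, Part p) {
    if (p.location == null || tableLocation == null) {
      return null;
    }
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    return p.location.startsWith(prefix) ? p.location.substring(prefix.length()) : null;
  }
}
```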
I would like to change the existing {{getPartitionsFromPartitionIds}} to use
{{PartitionProjectionEvaluator}}, since we currently have two methods doing
almost the same thing. Any thoughts about that?
> Implement projection spec for fetching only requested fields from partitions
> ----------------------------------------------------------------------------
>
> Key: HIVE-20306
> URL: https://issues.apache.org/jira/browse/HIVE-20306
> Project: Hive
> Issue Type: Sub-task
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Attachments: HIVE-20306.patch
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)