[ 
https://issues.apache.org/jira/browse/HIVE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583254#comment-16583254
 ] 

Vihang Karajgaonkar commented on HIVE-20306:
--------------------------------------------

[~tlipcon] [~pvary] [~akolb] Can you please review?

Here is a brief description of the approach, which may help with the review:
 # A new Thrift API called {{get_partitions_with_specs}} is introduced. It 
takes a request object and returns a response object.
 # The request object carries a {{GetPartitionsProjectSpec}}, which provides a 
{{fieldList}} (a list of strings naming the requested fields) and a 
{{paramKeyPattern}} (a SQL regex pattern used to include or exclude certain 
parameter keys). Whether the pattern includes or excludes matching keys is 
determined by the boolean {{excludeParamKeyPattern}}.
 # The main directSQL implementation is in the class 
{{PartitionProjectionEvaluator}}, which receives the input {{fieldList}} and 
internally converts it into a prefix tree of nodes. The partition field values 
are fetched in two stages. The first stage sets all the single-valued fields; 
here we can build the SQL from the projection fields and avoid unnecessary 
joins when the requested fields do not need them. The second stage sets the 
multi-valued fields, since each multi-valued field needs a SQL query of its 
own.
 # Once the partitions are fetched, they are grouped by storage descriptor in 
{{get_partitionspecs_grouped_by_storage_descriptor}}, an existing method that 
I modified to handle the cases where SD or SD.location is not set.
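
To make the prefix-tree step concrete, here is a minimal sketch of how a list of dotted field names could be turned into a tree. It is only an illustration and not the code in the patch: the class {{FieldPrefixTree}}, its methods, and the example field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; NOT the actual PartitionProjectionEvaluator. */
class FieldPrefixTree {
    /** One node per path segment, e.g. "sd" and then "location". */
    static final class Node {
        final String name;
        final Map<String, Node> children = new TreeMap<>();

        Node(String name) {
            this.name = name;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    final Node root = new Node("");

    /** Builds the tree; fields sharing a prefix share the same nodes. */
    static FieldPrefixTree fromFieldList(List<String> fieldList) {
        FieldPrefixTree tree = new FieldPrefixTree();
        for (String field : fieldList) {
            Node current = tree.root;
            for (String segment : field.split("\\.")) {
                current = current.children.computeIfAbsent(segment, Node::new);
            }
        }
        return tree;
    }

    /** Returns the full dotted path of every leaf, in sorted order. */
    List<String> leafPaths() {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node node, String prefix, List<String> out) {
        for (Node child : node.children.values()) {
            String path = prefix.isEmpty() ? child.name : prefix + "." + child.name;
            if (child.isLeaf()) {
                out.add(path);
            } else {
                collect(child, path, out);
            }
        }
    }

    /** True if any requested field lives under the given top-level name. */
    boolean needsSubtree(String topLevelName) {
        return root.children.containsKey(topLevelName);
    }
}
```

With a tree like this, a check such as {{needsSubtree("sd")}} returning false would be one way to decide that the join needed for storage descriptor fields can be left out of the generated SQL. How the real evaluator makes that decision may differ.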

I would like to move the existing {{getPartitionsFromPartitionIds}} to use 
{{PartitionProjectionEvaluator}} since right now we have two methods doing 
almost the same thing. Any thoughts about that?

 

> Implement projection spec for fetching only requested fields from partitions
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-20306
>                 URL: https://issues.apache.org/jira/browse/HIVE-20306
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-20306.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
