[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

ASF GitHub Bot (Jira) Tue, 11 Aug 2020 13:03:46 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=469386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469386
 ]


ASF GitHub Bot logged work on HIVE-23890:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Aug/20 20:02
            Start Date: 11/Aug/20 20:02
    Worklog Time Spent: 10m 
      Work Description: vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r468832097



##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,

Review comment:
       I think there is a trade-off here. On larger tables with lots of 
partitions, doing multiple RPCs to the metastore to fetch the file-metadata 
one partition at a time is not only less efficient; it is also likely that the 
ValidWriteIdList is updated for the table in the meantime, so the cache hit 
ratio could go down. You are right about large data being sent over the 
network. In my experience the file-metadata which we are sending here is a few 
hundred bytes per partition, and it is not very large even for a few thousand 
partitions. If we use a partitionNames list here in the request, clients can 
always do batching, like requesting 1000 partitions at a time, which would be 
more efficient.
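
A minimal sketch of the client-side batching described above, assuming the 
request carried a list of partition names. The class and method names 
(PartitionBatcher, toBatches) are hypothetical and not part of this pull 
request; the actual metastore call is left as a comment because its final 
signature is still under review.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {

    /**
     * Splits partition names into consecutive batches of at most batchSize
     * entries, preserving the original order. Hypothetical helper.
     */
    public static List<List<String>> toBatches(List<String> partitionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < partitionNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitionNames.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(partitionNames.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("ds=" + i);
        }
        for (List<String> batch : toBatches(names, 1000)) {
            // Assumption: one file-list request would be issued per batch here.
            System.out.println("would request " + batch.size() + " partitions");
        }
    }
}
```

With 2500 partition names and a batch size of 1000, this yields three 
requests instead of 2500, which keeps each response bounded while shortening 
the window in which the ValidWriteIdList can change between calls.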




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 469386)
    Time Spent: 2h 20m  (was: 2h 10m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-23890
>                 URL: https://issues.apache.org/jira/browse/HIVE-23890
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
>     1: optional string catName,
>     2: required string dbName,
>     3: required string tableName,
>     4: required list<string> partVals,
>     6: optional string validWriteIdList
> }
> struct GetFileListResponse {
>     1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
