[
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=469386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469386
]
ASF GitHub Bot logged work on HIVE-23890:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Aug/20 20:02
Start Date: 11/Aug/20 20:02
Worklog Time Spent: 10m
Work Description: vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r468832097
##########
File path:
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
4: optional string errorMessage,
}
+struct GetFileListRequest {
+ 1: optional string catName,
+ 2: optional string dbName,
+ 3: optional string tableName,
+ 4: optional list<string> partVals,
Review comment:
I think there is a trade-off here. On larger tables with lots of
partitions, making multiple RPCs to the metastore to fetch the file metadata
one partition at a time is not only less efficient; it is also likely that the
ValidWriteIdList for the table is updated in the meantime, so the cache hit
ratio could go down. You are right about the large amount of data sent over the
network. In my experience, the file metadata we are sending here is a few
hundred bytes per partition, and it is not very large even for a few thousand
partitions. If we use a partitionNames list here in the request, clients can
always do batching, such as requesting 1000 partitions at a time, which would
be more efficient.
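A rough sketch of the client-side batching described above, in Java. The
class `PartitionBatcher` and its `batches` method are hypothetical names used
only for illustration and are not part of the Hive code base; the batch size
of 1000 is the figure from this comment, not a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hive): splits a list of partition names
// into fixed-size batches so that each batch can go into one request.
class PartitionBatcher {

    static List<List<String>> batches(List<String> partitionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < partitionNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitionNames.size());
            // Copy the slice so each batch is independent of the input list.
            result.add(new ArrayList<>(partitionNames.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative partition names only.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("ds=" + i);
        }
        List<List<String>> b = batches(names, 1000);
        System.out.println(b.size() + " requests"); // prints "3 requests"
    }
}
```

Each batch would then be sent as one request carrying the proposed
partitionNames list. Assuming, purely for illustration, 300 bytes of file
metadata per partition, a batch of 1000 partitions would carry roughly 300 KB.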
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 469386)
Time Spent: 2h 20m (was: 2h 10m)
> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> ------------------------------------------------------------------------------
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a
> FlatBuffer object