vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r468832097



##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,

Review comment:
       I think there is a trade-off here. On larger tables with lots of 
partitions, doing multiple RPCs to the metastore to fetch the file-metadata 
one partition at a time is not only less efficient; it is also likely that 
the ValidWriteIdList for the table gets updated in the meantime, so the cache 
hit ratio could go down. You are right about large data being sent over the 
network. In my experience the file-metadata which we are sending here is a 
few hundred bytes per partition, so it is not very large even for a few 
thousand partitions. If we use a partitionNames list here in the request, 
clients can always do batching, like requesting 1000 partitions at a time, 
which would be more efficient.
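
To make the batching idea concrete, here is a minimal client-side sketch. 
The class and method names (`PartitionBatcher`, `batches`) are hypothetical 
and not part of the metastore API; it only assumes the request would carry a 
list of partition names, and it leaves the actual RPC out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: splits a list of partition names
// into consecutive batches so that each batch can go into one request.
final class PartitionBatcher {

    private PartitionBatcher() {
    }

    // Returns batches of at most batchSize names, preserving the input order.
    static List<List<String>> batches(List<String> partitionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < partitionNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitionNames.size());
            // Copy the sub-list so each batch is independent of the input list.
            result.add(new ArrayList<>(partitionNames.subList(start, end)));
        }
        return result;
    }
}
```

With 2500 partition names and a batch size of 1000, a client would issue 
three requests (1000, 1000 and 500 partitions) instead of 2500 single-partition 
RPCs.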



