vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r468832097
##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }

+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,

Review comment:
I think there is a trade-off here. On larger tables with lots of partitions, making a separate metastore RPC per partition to fetch the file-metadata is not only less efficient; it also makes it likely that the ValidWriteIdList for the table is updated while the calls are in flight, so the cache hit ratio could go down.

You are right about large data being sent over the network. In my experience, the file-metadata we are sending here is a few hundred bytes per partition, so it is not very large even for a few thousand partitions. If we use a partitionNames list here in the request, clients can always do batching, for example requesting 1000 partitions at a time, which would be more efficient.
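
To make the batching suggestion concrete, here is a minimal client-side sketch in Python. It assumes a request that accepts a list of partition names, as proposed in the comment. `fetch_batch` is a hypothetical stand-in for the metastore RPC and is not an existing Hive API; the stub, the partition naming, and the `bytes` payload are illustrative assumptions only.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(names: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(names), batch_size):
        yield list(names[start:start + batch_size])


def fetch_file_metadata(
    part_names: Sequence[str],
    fetch_batch: Callable[[List[str]], Dict[str, bytes]],
    batch_size: int = 1000,
) -> Dict[str, bytes]:
    """Call `fetch_batch` once per batch and merge the per-partition results."""
    result: Dict[str, bytes] = {}
    for batch in batched(part_names, batch_size):
        result.update(fetch_batch(batch))
    return result


# Hypothetical stub standing in for the metastore RPC; it records the size
# of every call so the number of round trips is visible.
call_sizes: List[int] = []


def fake_fetch_batch(batch: List[str]) -> Dict[str, bytes]:
    call_sizes.append(len(batch))
    return {name: b"file-metadata" for name in batch}


partitions = [f"ds=2020-01-01/hr={i}" for i in range(2500)]
metadata = fetch_file_metadata(partitions, fake_fetch_batch)
```

With 2500 partition names and a batch size of 1000, the stub is called three times (1000, 1000, and 500 names) instead of 2500 times, while each individual response stays bounded in size.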