vihangk1 commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r466575514



##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {

Review comment:
       This naming of this struct could be more generic (may be a better name 
could be GetFileMetadataRequest). Also did you consider reusing 
existing/extending getFileMetadata HMS API?

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
       }
     }
 
+    @Override
+    public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+      String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+      String dbName = req.getDbName();
+      String tblName = req.getTableName();
+      List<String> partitions = req.getPartVals();
+      // Will be used later, when cache is introduced
+      String validWriteIdList = req.getValidWriteIdList();
+
+      startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+              + ", partitions: " + partitions.toString());
+
+
+      GetFileListResponse response = new GetFileListResponse();
+
+      boolean success = false;
+      Exception ex = null;
+      try {
+        Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+        Path path = new Path(p.getSd().getLocation());
+
+        FileSystem fs = path.getFileSystem(conf);
+        RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+        while (itr.hasNext()) {
+          FileStatus fStatus = itr.next();
+          Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));

Review comment:
       Does this assume that the request is always for a ORC table?

##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {

Review comment:
       I think it will be useful to have a separate struct defined for the 
FileMetadata which also includes a type field. For instance, I can see this 
being useful for various engines and the FileMetadata format could be different 
for each engine. For instance, engine1 may require FileStatus, BlockInformation 
while engine2 only is interested in FileStatus, FilemodificationTime, while 
there is another file-metadata type for ACID state which includes some ACID 
specific information like fileformat that you used below.

##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -2454,10 +2467,14 @@ PartitionsResponse 
get_partitions_req(1:PartitionsRequest req)
   // partition keys in new_part should be the same as those in old partition.
   void rename_partition(1:string db_name, 2:string tbl_name, 3:list<string> 
part_vals, 4:Partition new_part)
                        throws (1:InvalidOperationException o1, 2:MetaException 
o2)
-  
+
   RenamePartitionResponse rename_partition_req(1:RenamePartitionRequest req)
                        throws (1:InvalidOperationException o1, 2:MetaException 
o2)
 
+  // Returns a file list using FlatBuffers as serialization
+  GetFileListResponse get_file_list(1:GetFileListRequest req)
+    throws(1:MetaException o1)

Review comment:
       Seems like if the partition doesn't exist you might need to throw a 
NoSuchObjectFoundException as well.

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
       }
     }
 
+    @Override
+    public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+      String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+      String dbName = req.getDbName();
+      String tblName = req.getTableName();
+      List<String> partitions = req.getPartVals();
+      // Will be used later, when cache is introduced
+      String validWriteIdList = req.getValidWriteIdList();
+
+      startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+              + ", partitions: " + partitions.toString());
+
+
+      GetFileListResponse response = new GetFileListResponse();
+
+      boolean success = false;
+      Exception ex = null;
+      try {
+        Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+        Path path = new Path(p.getSd().getLocation());
+
+        FileSystem fs = path.getFileSystem(conf);
+        RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+        while (itr.hasNext()) {
+          FileStatus fStatus = itr.next();
+          Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));
+          boolean isRawFormat  = 
!CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), 
ALL_ACID_ROW_NAMES);
+          int fileFormat = isRawFormat ? 0 : 2;

Review comment:
       I think fileFormat having 0 or 2 looks very arbitary. We should atleast 
define the valid values of this field and what each value means.

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
       }
     }
 
+    @Override
+    public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+      String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+      String dbName = req.getDbName();
+      String tblName = req.getTableName();
+      List<String> partitions = req.getPartVals();
+      // Will be used later, when cache is introduced
+      String validWriteIdList = req.getValidWriteIdList();
+
+      startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+              + ", partitions: " + partitions.toString());
+
+
+      GetFileListResponse response = new GetFileListResponse();
+
+      boolean success = false;
+      Exception ex = null;
+      try {
+        Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+        Path path = new Path(p.getSd().getLocation());
+
+        FileSystem fs = path.getFileSystem(conf);
+        RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+        while (itr.hasNext()) {
+          FileStatus fStatus = itr.next();
+          Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));
+          boolean isRawFormat  = 
!CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), 
ALL_ACID_ROW_NAMES);
+          int fileFormat = isRawFormat ? 0 : 2;
+          response.addToFileListData(createFileStatus(fStatus, 
p.getSd().getLocation(), fileFormat));
+        }
+        success = true;
+      } catch (Exception e) {
+        ex = e;
+        LOG.error("Failed to list files with error: " + e.getMessage());
+        throw new MetaException(e.getMessage());
+      } finally {
+        endFunction("get_file_list", success, ex);
+      }
+      return response;
+    }
+
+    private ByteBuffer createFileStatus(FileStatus fileStatus, String 
basePath, int fileFormat) {

Review comment:
       Please add a java doc here. Also, would be good to attribute the source 
where this code is from with a link to the file.

##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,

Review comment:
       Does this mean the file list is retrieved on 1 partition at a time? It 
seems like we can have bulk retrieval for a given list of partition names?

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -240,6 +247,20 @@
   private static ZooKeeperHiveHelper zooKeeperHelper = null;
   private static String msHost = null;
 
+  static final String OPERATION_FIELD_NAME = "operation";

Review comment:
       Some documentation of these constants will help improve the readability.

##########
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {
+  1: optional list<binary> fileListData,
+  2: optional i32 fbVersionNumber

Review comment:
       It would be good to have a comment here saying what the version number 
represents. Also, fbVersionNumber sounds like its a flatbuffer specific version 
number? Is that correct? Why does this field need to be optional?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to