KIRTI RUGE created HIVE-28491:
---------------------------------

             Summary: Check feasibility of improvement in ShowTableStatusOperation.
                 Key: HIVE-28491
                 URL: https://issues.apache.org/jira/browse/HIVE-28491
             Project: Hive
          Issue Type: Improvement
          Components: Hive
            Reporter: KIRTI RUGE


SHOW TABLE EXTENDED IN test_db LIKE foo_n4; takes between 25 minutes and 2.5 hours on a live cluster when the table has a very large number of partitions.

Scenario:
 # The table has 8.8k partitions and 26k files and directories under the table location.
 # The production jstack clearly shows that writeFileSystemStats() calls getFileStatus and listStatus for every single file and directory, and for the table location, of the given table.
 # ShowTableStatusOperation -> TextMetaDataFormatter.showTableStatus is a single call that tries to fetch the file status of every partition path of the given table.
 # Check whether the above code can be optimized. For example, instead of calling fs.getFileStatus(loc) for each location, we could iterate over a single (recursive, if needed) listing with a RemoteIterator:

RemoteIterator<LocatedFileStatus> remoteIterator =
    fs.listFiles(base.getPath(), true); // recursive = true, if needed
while (remoteIterator.hasNext()) {
  LocatedFileStatus each = remoteIterator.next();
  each.getLen();
  each.getAccessTime();
  each.getModificationTime();
}
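A minimal, stdlib-only sketch of the single-pass aggregation the snippet above hints at. So that it runs without a cluster, it uses java.nio.file's Files.walkFileTree on a local directory as a stand-in for Hadoop's FileSystem.listFiles(path, true); the class name TableFsStats, the partition directory names, and the exact set of aggregated fields (file count, total/min/max size, latest access and modification time) are assumptions for illustration, not Hive's actual writeFileSystemStats code.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical stats holder: one recursive walk instead of per-location status/list calls. */
public class TableFsStats {
    long numFiles = 0;
    long totalFileSize = 0;
    long maxFileSize = 0;
    long minFileSize = Long.MAX_VALUE;
    long lastAccessTime = 0;
    long lastUpdateTime = 0;

    /** Folds one file's size and timestamps into the running totals. */
    void accept(BasicFileAttributes attrs) {
        long len = attrs.size();
        numFiles++;
        totalFileSize += len;
        maxFileSize = Math.max(maxFileSize, len);
        minFileSize = Math.min(minFileSize, len);
        lastAccessTime = Math.max(lastAccessTime, attrs.lastAccessTime().toMillis());
        lastUpdateTime = Math.max(lastUpdateTime, attrs.lastModifiedTime().toMillis());
    }

    /** Walks tableLocation once; only regular files are counted, as listFiles returns files only. */
    static TableFsStats collect(Path tableLocation) throws IOException {
        final TableFsStats stats = new TableFsStats();
        Files.walkFileTree(tableLocation, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Attributes arrive with the traversal; no extra status call per file here.
                if (attrs.isRegularFile()) {
                    stats.accept(attrs);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        if (stats.numFiles == 0) {
            stats.minFileSize = 0; // no files: report 0 instead of Long.MAX_VALUE
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("foo_n4");
        // Two hypothetical partition directories with one data file each.
        Path p1 = Files.createDirectories(table.resolve("ds=2024-01-01"));
        Path p2 = Files.createDirectories(table.resolve("ds=2024-01-02"));
        Files.write(p1.resolve("000000_0"), new byte[10]);
        Files.write(p2.resolve("000000_0"), new byte[30]);

        TableFsStats s = collect(table);
        // prints numFiles=2 totalFileSize=40 maxFileSize=30 minFileSize=10
        System.out.println("numFiles=" + s.numFiles + " totalFileSize=" + s.totalFileSize
                + " maxFileSize=" + s.maxFileSize + " minFileSize=" + s.minFileSize);
    }
}
```

Two points the feasibility check would probably need to cover: a recursive listing of the table root only reaches partitions stored under that root (partitions with custom locations would still need their own listing, and unregistered directories under the root would be counted), and listFiles returns files only, so any directory-level timestamps the current code reads via getFileStatus would have to come from somewhere else.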
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
