> On Nov. 5, 2019, 4:33 p.m., Panos Garefalakis wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line335>
> >
> >     Hey Attila, the solution looks good. However, other file systems
> >     (e.g. Azure Blob storage) might face similar issues in the future with
> >     this recursive method. Wouldn't it make sense to have HDFS as the base
> >     case and handle the others separately, and maybe log a warning here
> >     when the filesystem is not supported?

Hey Panos, I checked the Hadoop project and found only one FS implementation with an optimized recursive listFiles(); the other implementations use the tree-walking implementation from the base class. I think that's the more common case. Do you know where the source of this Azure Blob storage implementation is? Is it open source at all?


- Attila


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218505
-----------------------------------------------------------


On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> -----------------------------------------------------------
>
> (Updated Nov. 5, 2019, 3:32 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
>
>
> Bugs: HIVE-22411
>     https://issues.apache.org/jira/browse/HIVE-22411
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Executing single insert statements on a transactional table affects write
> performance on an S3 file system. Each insert creates a new delta directory.
> After each insert, Hive calculates statistics such as the number of files in
> the table and the total size of the table. To calculate these, it traverses
> the directory recursively, executing a separate listStatus call for each
> path. As a result, the more delta directories there are, the longer it takes
> to calculate the statistics.
>
> Therefore insertion time goes up linearly with the number of delta directories.
>
>
> Diffs
> -----
>
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26
>
>
> Diff: https://reviews.apache.org/r/71707/diff/2/
>
>
> Testing
> -------
>
> Measured and plotted insertion time.
>
>
> Thanks,
>
> Attila Magyar
>
>
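
The cost difference discussed in this thread, one listStatus call per path versus a single recursive listing, can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch and not Hadoop code: ToyStore, its listStatus and listFlat methods, and the delta/bucket key names are all hypothetical stand-ins for an object store with flat keys, and pagination of list results is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

public class ListingCost {

    /** Hypothetical in-memory object store with flat keys; counts list requests. */
    static final class ToyStore {
        final TreeMap<String, Long> objects = new TreeMap<>();
        int listRequests = 0;

        void put(String key, long size) {
            objects.put(key, size);
        }

        /** One request: immediate children of dir (dir must end with '/'). */
        List<String> listStatus(String dir) {
            listRequests++;
            TreeSet<String> children = new TreeSet<>();
            for (String key : objects.subMap(dir, dir + Character.MAX_VALUE).keySet()) {
                String rest = key.substring(dir.length());
                int slash = rest.indexOf('/');
                // A key with a further '/' is reported as a sub-"directory".
                children.add(slash < 0 ? key : dir + rest.substring(0, slash + 1));
            }
            return new ArrayList<>(children);
        }

        /** One request: every key under prefix (assumes no pagination). */
        List<String> listFlat(String prefix) {
            listRequests++;
            return new ArrayList<>(objects.subMap(prefix, prefix + Character.MAX_VALUE).keySet());
        }
    }

    /** Tree walk: one list request for every directory visited. */
    static long sizeByTreeWalk(ToyStore store, String dir) {
        long total = 0;
        for (String child : store.listStatus(dir)) {
            if (child.endsWith("/")) {
                total += sizeByTreeWalk(store, child);
            } else {
                total += store.objects.get(child);
            }
        }
        return total;
    }

    /** Flat listing: a single list request regardless of directory count. */
    static long sizeByFlatListing(ToyStore store, String prefix) {
        long total = 0;
        for (String key : store.listFlat(prefix)) {
            total += store.objects.get(key);
        }
        return total;
    }

    /** Builds a table with one 10-byte file per delta directory. */
    static ToyStore tableWithDeltas(int deltas) {
        ToyStore store = new ToyStore();
        for (int i = 1; i <= deltas; i++) {
            store.put("table/delta_" + i + "/bucket_00000", 10L);
        }
        return store;
    }

    public static void main(String[] args) {
        ToyStore walked = tableWithDeltas(100);
        long walkedSize = sizeByTreeWalk(walked, "table/");
        System.out.println("tree walk: size=" + walkedSize + " requests=" + walked.listRequests);

        ToyStore flat = tableWithDeltas(100);
        long flatSize = sizeByFlatListing(flat, "table/");
        System.out.println("flat list: size=" + flatSize + " requests=" + flat.listRequests);
    }
}
```

In this model a table with N delta directories costs N + 1 list requests with the tree walk and one request with the flat listing, while both compute the same total size. That matches the linear growth in insertion time described in the review request, assuming each request has roughly constant latency.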