> On Nov. 5, 2019, 4:33 p.m., Panos Garefalakis wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line335>
> >
> > Hey Attila, the solution looks good. However, as other file systems might
> > face similar issues in the future using this recursive method (e.g. Azure
> > Blob storage), wouldn't it make sense to have HDFS as the base case and
> > handle the others separately? And maybe log a warning here when the
> > filesystem is not supported?
>
> Attila Magyar wrote:
>     Hey Panos, I checked the hadoop project and I found only one FS
>     implementation with an optimized recursive listFiles(); the other
>     implementations use the tree-walking implementation from the base class.
>     I think that's the more common case. Do you know where the source of this
>     Azure Blob storage is? Is it open source at all?

Hey Attila, I was referring to this:
https://hadoop.apache.org/docs/current/hadoop-azure/index.html
but I was also assuming that the recursive method you modified would be called
for other filesystems as well. If that's not the case, then my comment does not
apply :)

- Panos


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218505
-----------------------------------------------------------


On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> -----------------------------------------------------------
>
> (Updated Nov. 5, 2019, 3:32 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
>
>
> Bugs: HIVE-22411
>     https://issues.apache.org/jira/browse/HIVE-22411
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Executing single insert statements on a transactional table affects write
> performance on an S3 file system. Each insert creates a new delta directory.
> After each insert, Hive calculates statistics such as the number of files in
> the table and the total size of the table. To calculate these, it traverses
> the directory recursively. During the recursion, a separate listStatus call
> is executed for each path. In the end, the more delta directories you have,
> the more time it takes to calculate the statistics.
>
> Therefore, the time taken by each insert goes up linearly with the number of
> delta directories.
>
>
> Diffs
> -----
>
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26
>
>
> Diff: https://reviews.apache.org/r/71707/diff/2/
>
>
> Testing
> -------
>
> Measured and plotted insertion time.
>
>
> Thanks,
>
> Attila Magyar
>
>
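
For readers following the thread, the cost described in the quoted description
can be sketched with a toy model. This is not the code from the patch and it
does not use the Hadoop API: CountingStore, its listStatus and
listFilesRecursive methods, and the delta_N/bucket_00000 paths are all
hypothetical stand-ins, chosen only to count how many listing requests a
per-directory tree walk issues compared with a single recursive listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: counts listing requests, performs no real I/O. */
public class ListingCostSketch {

    /** Hypothetical in-memory store: full file path -> size in bytes. */
    static final class CountingStore {
        private final TreeMap<String, Long> files = new TreeMap<>();
        int listCalls = 0;

        void put(String path, long size) {
            files.put(path, size);
        }

        long size(String path) {
            return files.get(path);
        }

        /** All entries whose key starts with dir + "/". */
        private SortedMap<String, Long> under(String dir) {
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            return files.subMap(prefix, prefix + Character.MAX_VALUE);
        }

        /** One request per directory; child directories end with '/'. */
        List<String> listStatus(String dir) {
            listCalls++;
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            TreeSet<String> children = new TreeSet<>();
            for (String key : under(dir).keySet()) {
                String rest = key.substring(prefix.length());
                int slash = rest.indexOf('/');
                children.add(slash < 0 ? key : prefix + rest.substring(0, slash + 1));
            }
            return new ArrayList<>(children);
        }

        /** One request for the whole subtree (assumed flat listing). */
        Map<String, Long> listFilesRecursive(String dir) {
            listCalls++;
            return new TreeMap<>(under(dir));
        }
    }

    /** Tree walk: returns {fileCount, totalBytes}, one listStatus per directory. */
    static long[] treeWalk(CountingStore store, String dir) {
        long count = 0;
        long bytes = 0;
        for (String child : store.listStatus(dir)) {
            if (child.endsWith("/")) {
                long[] sub = treeWalk(store, child);
                count += sub[0];
                bytes += sub[1];
            } else {
                count++;
                bytes += store.size(child);
            }
        }
        return new long[] {count, bytes};
    }

    /** Same statistics from a single recursive listing. */
    static long[] flatListing(CountingStore store, String dir) {
        long count = 0;
        long bytes = 0;
        for (long size : store.listFilesRecursive(dir).values()) {
            count++;
            bytes += size;
        }
        return new long[] {count, bytes};
    }

    /** Builds a table with one file per delta directory (names are made up). */
    static CountingStore tableWithDeltas(int deltas) {
        CountingStore store = new CountingStore();
        for (int i = 1; i <= deltas; i++) {
            store.put("table/delta_" + i + "/bucket_00000", 100L);
        }
        return store;
    }

    public static void main(String[] args) {
        for (int deltas : new int[] {10, 100, 1000}) {
            CountingStore walked = tableWithDeltas(deltas);
            treeWalk(walked, "table");
            CountingStore flat = tableWithDeltas(deltas);
            flatListing(flat, "table");
            System.out.println(deltas + " deltas: tree walk = " + walked.listCalls
                    + " list calls, flat listing = " + flat.listCalls);
        }
    }
}
```

In this model the tree walk issues one request for the table directory plus one
per delta directory, so the request count grows linearly with the number of
inserts, while the flat listing stays at a single request. Whether a real
filesystem behaves like the flat case depends on whether its implementation
provides an optimized recursive listFiles(), as discussed above.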