> On Nov. 5, 2019, 11:59 p.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Line 331 (original), 324 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line331>
> >
> > you may use BlobStorageUtils::isBlobStorageFileSystem() here.

isBlobStorageFileSystem matches s3, s3a, and s3n, but only S3AFileSystem (https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861) has an optimized listFiles() implementation. NativeS3FileSystem (https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java) uses the same tree-traversing algorithm from the base class.


- Attila


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218518
-----------------------------------------------------------


On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> -----------------------------------------------------------
>
> (Updated Nov. 7, 2019, 9:23 a.m.)
>
>
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
>
>
> Bugs: HIVE-22411
>     https://issues.apache.org/jira/browse/HIVE-22411
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Executing single insert statements on a transactional table affects write
> performance on an S3 file system. Each insert creates a new delta directory.
> After each insert, Hive calculates statistics such as the number of files in
> the table and the total size of the table. To calculate these, it traverses
> the directory recursively, executing a separate listStatus call for each
> path. As a result, the more delta directories there are, the longer it takes
> to calculate the statistics.
>
> Therefore insertion time grows linearly with the number of delta directories.
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 09343e56166
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26
>
>
> Diff: https://reviews.apache.org/r/71707/diff/3/
>
>
> Testing
> -------
>
> Measured and plotted insertion time.
>
>
> Thanks,
>
> Attila Magyar
>
>
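
To illustrate the cost difference discussed in this thread, here is a minimal, self-contained sketch. It does not use the Hadoop FileSystem API or any code from the patch. ListingCostSketch and all of its methods are hypothetical names, and the store is a toy in-memory model with a flat key namespace. It only counts simulated list requests: the recursive walk issues one listStatus-style call per directory, while a flat prefix listing is assumed to need a single call (a real object store would paginate large listings).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of an object store: a sorted set of keys, no real directories. */
public class ListingCostSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    private int listCalls = 0; // number of simulated remote list requests

    public void put(String key) { keys.add(key); }
    public int listCalls() { return listCalls; }
    public void resetCalls() { listCalls = 0; }

    private static String asPrefix(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    /** One request: immediate children of dir; sub-"directories" end with '/'. */
    public List<String> listStatus(String dir) {
        listCalls++;
        String prefix = asPrefix(dir);
        TreeSet<String> children = new TreeSet<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break; // keys sharing a prefix are contiguous
            String rest = k.substring(prefix.length());
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? k : prefix + rest.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** One request (assumption: no pagination): every key under the prefix. */
    public List<String> listFilesFlat(String dir) {
        listCalls++;
        String prefix = asPrefix(dir);
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break;
            out.add(k);
        }
        return out;
    }

    /** Tree walk: issues one listStatus call for every directory visited. */
    public int countFilesRecursively(String dir) {
        int files = 0;
        for (String child : listStatus(dir)) {
            files += child.endsWith("/") ? countFilesRecursively(child) : 1;
        }
        return files;
    }

    public static void main(String[] args) {
        ListingCostSketch store = new ListingCostSketch();
        int deltas = 10; // one delta directory per single-row insert
        for (int i = 0; i < deltas; i++) {
            store.put("tbl/delta_" + i + "/bucket_0");
        }
        int files = store.countFilesRecursively("tbl");
        System.out.println("recursive: files=" + files + " listCalls=" + store.listCalls());
        store.resetCalls();
        files = store.listFilesFlat("tbl").size();
        System.out.println("flat: files=" + files + " listCalls=" + store.listCalls());
    }
}
```

In this model, a table with 10 delta directories costs 11 list calls for the recursive walk (one for the table root plus one per delta) and 1 call for the flat listing. The walk's call count grows with every insert, which matches the linear slowdown described in the review request.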