-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218524
-----------------------------------------------------------




standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
Line 331 (original), 324 (patched)
<https://reviews.apache.org/r/71707/#comment306265>

    BlobStorageUtils::isBlobStorageFileSystem() checks whether the scheme is 
"s3", "s3n" or "s3a", but only S3AFileSystem has the optimized listFiles(). 
NativeS3FileSystem does not override the tree-walking algorithm from the base 
class.
    
    See: 
https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3861
    
    and:
    
    
https://github.com/apache/hadoop/blob/1d5d7d0989e9ee2f4527dc47ba5c80e1c38f641a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java
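
    To illustrate the point, here is a minimal sketch of a narrower check that
    accepts only the "s3a" scheme. The class and method names are hypothetical
    and are not part of Hive or Hadoop; the sketch uses only java.net.URI, so
    it does not reflect how BlobStorageUtils is actually implemented.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Hive or Hadoop. */
final class RecursiveListingCheck {
    private RecursiveListingCheck() {}

    /**
     * Returns true only when the location uses the "s3a" scheme.
     * Assumption taken from the review comment: S3AFileSystem is the only
     * one of the s3/s3n/s3a file systems with an optimized listFiles().
     */
    static boolean hasOptimizedListFiles(URI location) {
        String scheme = location.getScheme();
        // A relative path has no scheme; treat it as "not optimized".
        return scheme != null && "s3a".equals(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = {
            "s3a://bucket/warehouse/t",
            "s3n://bucket/warehouse/t",
            "hdfs://nn:8020/warehouse/t"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hasOptimizedListFiles(URI.create(s)));
        }
    }
}
```

    With a check like this, "s3" and "s3n" locations would keep using the
    existing code path, and only "s3a" locations would take the recursive
    listFiles() path.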


- Attila Magyar


On Nov. 7, 2019, 9:23 a.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2019, 9:23 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
>     https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Executing single insert statements on a transactional table degrades write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert, Hive calculates statistics such as the number of files in 
> the table and the total size of the table. To calculate these, it traverses 
> the directory tree recursively, executing a separate listStatus call for each 
> path. As a result, the more delta directories the table has, the longer it 
> takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 
> 09343e56166 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/3/
> 
> 
> Testing
> -------
> 
> Measured and plotted insertion time.
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>
