> On Nov. 5, 2019, 4:33 p.m., Panos Garefalakis wrote:
> > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/71707/diff/2/?file=2171542#file2171542line335>
> >
> >     Hey Attila, the solution looks good. However, as other file systems might 
> > face similar issues in the future using this recursive method (e.g. Azure 
> > Blob storage), wouldn't it make sense to have HDFS as the base case and 
> > handle others separately? And maybe log a warning here when the filesystem 
> > is not supported?
> 
> Attila Magyar wrote:
>     Hey Panos, I checked the Hadoop project and found only one FS 
> implementation with an optimized recursive listFiles(); the other implementations 
> use the tree-walking implementation from the base class. I think that's the more 
> common case. Do you know where the source of this Azure Blob storage is? Is 
> it open source at all?

Hey Attila, I was referring to this: 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html 
but I was also assuming that the recursive method you modified would be called for 
other filesystems as well. If that's not the case then my comment does not 
apply :)


- Panos


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218505
-----------------------------------------------------------


On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2019, 3:32 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
>     https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Executing single insert statements on a transactional table affects write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert, Hive calculates statistics such as the number of files in the 
> table and the total size of the table. To calculate these, it traverses the 
> directory recursively. During the recursion, a separate listStatus call is 
> executed for each path. As a result, the more delta directories you have, the 
> more time it takes to calculate the statistics.
> 
> Therefore the time per insert grows linearly with the number of delta directories.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf 
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/2/
> 
> 
> Testing
> -------
> 
> Measured and plotted insertion time.
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>
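
For readers following the description above, here is a minimal sketch of why the per-directory traversal gets slower as delta directories accumulate. It is a toy model, not the patch itself: ToyStore, its methods, and the "tbl/delta_N/bucket_00000" layout are hypothetical stand-ins. listStatus here only imitates a one-level listing (as FileSystem.listStatus does in Hadoop), and listFilesRecursive imitates a file system that can answer a recursive listFiles() in a single request. The only thing it measures is the number of listing requests issued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model: counts listing requests issued while computing a table's total size. */
class ListingCostSketch {

    /** Hypothetical in-memory store mapping each file path to its size in bytes. */
    static final class ToyStore {
        private final TreeMap<String, Long> files = new TreeMap<>();
        int listCalls = 0; // number of listing requests issued so far

        void put(String path, long size) { files.put(path, size); }

        long size(String path) { return files.get(path); }

        /** One-level listing: immediate children of dir; directories end with '/'. */
        List<String> listStatus(String dir) {
            listCalls++;
            TreeSet<String> children = new TreeSet<>();
            for (String p : files.keySet()) {
                if (p.startsWith(dir)) {
                    String rest = p.substring(dir.length());
                    int slash = rest.indexOf('/');
                    children.add(slash < 0 ? p : dir + rest.substring(0, slash + 1));
                }
            }
            return new ArrayList<>(children);
        }

        /** Flat recursive listing: one request returns every file under dir. */
        List<String> listFilesRecursive(String dir) {
            listCalls++;
            List<String> result = new ArrayList<>();
            for (String p : files.keySet()) {
                if (p.startsWith(dir)) result.add(p);
            }
            return result;
        }
    }

    /** Tree walk: issues one listing request per directory visited. */
    static long totalSizeByTreeWalk(ToyStore store, String dir) {
        long total = 0;
        for (String child : store.listStatus(dir)) {
            total += child.endsWith("/")
                    ? totalSizeByTreeWalk(store, child)
                    : store.size(child);
        }
        return total;
    }

    /** Flat listing: issues a single request however many delta directories exist. */
    static long totalSizeByFlatListing(ToyStore store, String dir) {
        long total = 0;
        for (String file : store.listFilesRecursive(dir)) total += store.size(file);
        return total;
    }

    /** Builds a table with the given number of delta directories, one 10-byte file each. */
    static ToyStore tableWithDeltas(int deltas) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < deltas; i++) {
            store.put("tbl/delta_" + i + "/bucket_00000", 10L);
        }
        return store;
    }

    public static void main(String[] args) {
        ToyStore walked = tableWithDeltas(100);
        long walkedSize = totalSizeByTreeWalk(walked, "tbl/");
        ToyStore flat = tableWithDeltas(100);
        long flatSize = totalSizeByFlatListing(flat, "tbl/");
        System.out.println(walkedSize + " bytes, " + walked.listCalls + " list calls (tree walk)");
        System.out.println(flatSize + " bytes, " + flat.listCalls + " list calls (flat listing)");
    }
}
```

In this model the tree walk issues one request for the table directory plus one per delta directory, while the flat listing always issues one. On a remote object store each request is a network round trip, which would be consistent with the linear growth described above.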
