-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
-----------------------------------------------------------

(Updated Nov. 5, 2019, 3:32 p.m.)


Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Changes
-------

Addressing Ashutosh's comments


Bugs: HIVE-22411
    https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
-------

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert, Hive calculates statistics such as the number of files in the 
table and the total size of the table. To calculate these, it traverses the 
table directory recursively, executing a separate listStatus call for each path. 
As a result, the more delta directories the table has, the longer the 
statistics calculation takes.

Therefore, insertion time grows linearly with the number of delta directories.
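
The cost pattern described above can be illustrated with a minimal, self-contained sketch. It is a toy in-memory model, not the Hive or Hadoop code touched by this patch: the class ListingCostSketch, the Dir type, and all method names are hypothetical, and the model assumes each insert produces one delta directory holding one file. Each visited directory costs one simulated listing call, standing in for a listStatus request against S3.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, no Hive or Hadoop classes involved.
class ListingCostSketch {

    /** Hypothetical stand-in for a directory: child directories plus a count of plain files. */
    static final class Dir {
        final List<Dir> subDirs = new ArrayList<>();
        int fileCount;
    }

    /** Number of simulated listing calls issued so far. */
    static int listCalls;

    /** Recursively counts files, issuing one simulated listing call per visited directory. */
    static int countFiles(Dir dir) {
        listCalls++; // stands in for one listStatus round trip
        int total = dir.fileCount;
        for (Dir sub : dir.subDirs) {
            total += countFiles(sub);
        }
        return total;
    }

    /** Builds a table directory with one delta directory per insert (assumed: one file each). */
    static Dir tableWithDeltas(int deltas) {
        Dir table = new Dir();
        for (int i = 0; i < deltas; i++) {
            Dir delta = new Dir();
            delta.fileCount = 1;
            table.subDirs.add(delta);
        }
        return table;
    }

    /** Returns how many listing calls one statistics pass needs for the given number of deltas. */
    static int listCallsForDeltas(int deltas) {
        listCalls = 0;
        countFiles(tableWithDeltas(deltas));
        return listCalls;
    }

    public static void main(String[] args) {
        for (int deltas : new int[] {1, 10, 100}) {
            System.out.println(deltas + " delta dirs -> "
                + listCallsForDeltas(deltas) + " listing calls");
        }
    }
}
```

In this model a statistics pass over a table with n delta directories issues n + 1 listing calls (one for the table directory and one per delta), which is the linear growth the description refers to. On a real object store each call also carries network latency, which the sketch does not model.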


Diffs (updated)
-----

  
  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf 
  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26 


Diff: https://reviews.apache.org/r/71707/diff/2/

Changes: https://reviews.apache.org/r/71707/diff/1-2/


Testing
-------

Measured and plotted insertion time.


Thanks,

Attila Magyar
