Peter Vary commented on HIVE-22081:

[~Rajkumar Singh]: Is this for cases where automatic compaction was turned off
for a while and then turned on later, so that a large number of tables had
accumulated changes before automatic compaction was enabled? In that case,
splitting the work across multiple threads is really useful. On the other hand,
if so many changes arrive within 5 minutes that checking whether compaction is
needed takes more than 5 minutes, then we might want to consider some other way
to calculate or cache the check results. Splitting the tasks across multiple
threads could help, but it is still a CPU and IO hog.
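A minimal sketch of the "cache the check results" idea, in Java since Hive is written in Java. It assumes each partition exposes some value that changes whenever new data lands (called a "marker" here, for example a write ID or a modification time). `CheckResultCache`, the marker, and the check function are hypothetical names, not Hive APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Hive code: remember the result of the expensive
// "does this partition need compaction?" check and only rerun it when the
// partition's assumed change marker has moved since the last run.
final class CheckResultCache<R> {

    /** Result of a previous check plus the marker value it was computed for. */
    private record Entry<T>(long marker, T result) {}

    private final Map<String, Entry<R>> cache = new ConcurrentHashMap<>();

    R get(String partition, long currentMarker, Function<String, R> expensiveCheck) {
        Entry<R> cached = cache.get(partition);
        if (cached != null && cached.marker() == currentMarker) {
            // Nothing changed since the last check: skip the FileSystem work.
            return cached.result();
        }
        // Marker moved (or first visit): pay for the IO-heavy check once.
        R fresh = expensiveCheck.apply(partition);
        cache.put(partition, new Entry<>(currentMarker, fresh));
        return fresh;
    }
}
```

Two concurrent callers may occasionally compute the same partition twice; in exchange, the expensive check never runs while holding a map lock, which `ConcurrentHashMap.compute` would do.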

Also please consider fixing the checkstyle warnings.



> Hivemetastore Performance: Compaction Initiator Thread overwhelmed if there 
> are too many Table/partitions are eligible for compaction 
> --------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-22081
>                 URL: https://issues.apache.org/jira/browse/HIVE-22081
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>         Attachments: HIVE-22081.patch
>
> If automatic compaction is turned on, the Initiator thread looks for
> tables/partitions that are eligible for compaction and runs a series of
> checks in a for loop before requesting compaction for the eligible ones.
> Although the Initiator thread is configured to run at a 5-minute interval by
> default, with many objects it keeps running continuously, because these
> checks are IO intensive and hog the CPU.
> In the proposed changes, I plan to:
> 1. Pass fewer objects to the for loop by filtering them up front on the
> conditions that are currently checked inside the loop.
> 2. Make asynchronous calls using futures to determine the compaction type
> (this is where the FileSystem calls happen).
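The two proposed steps can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch: `InitiatorSketch`, `Candidate`, the `NONE` value of `CompactionType`, the cheap filter, and the expensive check are placeholder names, and the real Initiator works with Hive metastore types instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of "filter first, then check asynchronously".
final class InitiatorSketch {

    /** Placeholder for a table/partition the Initiator might compact. */
    record Candidate(String name, boolean alreadyQueued) {}

    /** Placeholder decision; NONE means "do not request compaction". */
    enum CompactionType { NONE, MINOR, MAJOR }

    static Map<String, CompactionType> decide(
            List<Candidate> candidates,
            Predicate<Candidate> cheapFilter,                   // in-memory condition
            Function<Candidate, CompactionType> expensiveCheck, // FileSystem-heavy; must not return null
            ExecutorService pool) {                             // bounded pool limits CPU/IO load
        // Step 1: drop candidates the cheap condition already rules out,
        // so the expensive check only runs on what is left.
        List<Candidate> eligible = candidates.stream()
                .filter(cheapFilter)
                .collect(Collectors.toList());

        // Step 2: submit every compaction-type check before waiting on any.
        List<CompletableFuture<Map.Entry<String, CompactionType>>> futures = eligible.stream()
                .map(c -> CompletableFuture.supplyAsync(
                        () -> Map.entry(c.name(), expensiveCheck.apply(c)), pool))
                .collect(Collectors.toList());

        // Wait for all results (join rethrows failures as CompletionException)
        // and keep only the candidates that actually need compaction.
        return futures.stream()
                .map(CompletableFuture::join)
                .filter(e -> e.getValue() != CompactionType.NONE)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

Collecting the futures into a list before calling `join` matters: a single lazy stream that mapped to `supplyAsync` and then straight to `join` would submit and wait on one check at a time, which is the serial loop again.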

This message was sent by Atlassian JIRA
