[ https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266932#comment-13266932 ]

Nicolas Spiegelberg commented on HBASE-5920:
--------------------------------------------

tl;dr. The old master had a debug line about issuing major compactions.  This 
was removed in the new master and should be added back in :(
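
Something along the lines of the sketch below would do (purely illustrative -- 
the class and method names are made up, and this is not the removed code):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // Illustrative sketch only -- not the removed master code.  The point is
    // simply to log again whenever a major compaction is issued.
    public class MajorCompactionLogSketch {
      private static final Log LOG =
          LogFactory.getLog(MajorCompactionLogSketch.class);

      static void onMajorCompactionIssued(String regionName) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Issuing major compaction for region=" + regionName);
        }
      }
    }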

A couple of notes about your cluster:
1. Your cluster is obviously in a very bad state.  "Recursive enqueue" means 
that your store files have grown huge and the RS is aggressively trying to cut 
them down.  It can't do an online major compaction until the cluster is in 
better shape.
2. "compaction_queue=(59:0)" means that all compactions are being upgraded to 
the large queue (59 large, 0 small).  Either this is because all regions are 
congested or because you have "hbase.regionserver.thread.compaction.throttle" 
improperly configured.  See HBASE-5867.  Optimally, all your tiny regions 
should go in one queue and substantial compactions on large regions should go 
in another.
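
Roughly, the split works like the sketch below (illustrative only, not the 
actual CompactSplitThread code, and the throttle value here is just an example 
number): requests whose total data size is above the throttle point go to the 
large queue, everything else to the small one.

    // Illustrative sketch of the small/large compaction queue split.
    public class ThrottleSketch {
      // Example value only; the real default for
      // hbase.regionserver.thread.compaction.throttle is derived from the
      // compaction and memstore flush settings.
      static final long THROTTLE_POINT = 1024L * 1024 * 1024;

      static String chooseQueue(long totalCompactionSizeBytes) {
        return totalCompactionSizeBytes > THROTTLE_POINT ? "large" : "small";
      }

      public static void main(String[] args) {
        // the 1.5g request from the log below would land on the large queue
        System.out.println(chooseQueue(1536L * 1024 * 1024));  // prints "large"
      }
    }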

That said, let's talk about the compaction logic.  User-issued majors are 
asynchronous.  They are not guaranteed to happen immediately, only once the 
cluster is ready to handle the major compaction.  The RS will recursively do 
minor compactions until the cluster is in a good state and then do the major.  
The major compaction flag will persist until this happens, but does not 
currently persist across reboots or region moves.
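
For reference, the same request from the Java client looks roughly like this 
(0.92-era HBaseAdmin API; the class name is mine and the region name is just 
the one from the report below).  The call only queues the request and returns 
right away; it does not wait for the compaction to run, which is the 
asynchrony described above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class MajorCompactExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Fire-and-forget: this queues the request on the region server and
        // returns; the compaction itself runs whenever the RS gets to it.
        admin.majorCompact(
            "str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.");
      }
    }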
                
> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-5920
>                 URL: https://issues.apache.org/jira/browse/HBASE-5920
>             Project: HBase
>          Issue Type: Bug
>          Components: client, regionserver
>    Affects Versions: 0.92.1
>            Reporter: Derek Wollenstein
>            Priority: Minor
>              Labels: compaction
>
> There seem to be some tuning settings under which manually triggered major 
> compactions will silently do nothing, not even log anything.
> From Store.java in the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction
> 2. If we are in a minor compaction, we do the following checks (sketched below):
>    a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size) we automatically include it
>    b. Otherwise, we check how its size compares to the sizes of the newer 
> files, based on hbase.hstore.compaction.ratio.
>    c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min), then we don't compact.
> In many of these exit paths, we aren't seeing any message logged.
> The net-net of this is that if we have a mix of very large and very small 
> files, we may end up having too many files to do a major compact, but too few 
> files to do a minor compact.
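>
> Roughly, here is my reading of the above as a simplified sketch (the class, 
> constants, and defaults are mine, not the actual Store.compactSelection 
> code; sizes are in bytes, oldest file first):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Simplified model of the selection rules described above.
>     public class CompactSelectionSketch {
>       static final long MIN_SIZE = 128L * 1024 * 1024; // hbase.hstore.compaction.min.size
>       static final int MIN_FILES = 3;                   // hbase.hstore.compaction.min
>       static final int MAX_FILES = 10;                  // hbase.hstore.compaction.max
>       static final double RATIO = 1.2;                  // hbase.hstore.compaction.ratio
>
>       static List<Long> select(List<Long> files, boolean userMajor) {
>         // (1) a user-requested major is downgraded to a minor when there
>         //     are more candidates than hbase.hstore.compaction.max
>         if (userMajor && files.size() <= MAX_FILES) {
>           return new ArrayList<Long>(files);       // major: rewrite everything
>         }
>         // (2) minor selection: walk from the oldest file, skipping files
>         //     that are bigger than min.size and also bigger than
>         //     ratio * (sum of the newer files in the window)
>         int start = 0;
>         while (files.size() - start >= MIN_FILES) {
>           long sumNewer = 0;
>           for (int i = start + 1; i < Math.min(files.size(), start + MAX_FILES); i++) {
>             sumNewer += files.get(i);
>           }
>           if (files.get(start) <= Math.max(MIN_SIZE, (long) (sumNewer * RATIO))) {
>             break;                                  // this file qualifies
>           }
>           start++;
>         }
>         List<Long> picked = new ArrayList<Long>(
>             files.subList(start, Math.min(files.size(), start + MAX_FILES)));
>         // (c) too few files left over: silently skip the compaction
>         return picked.size() < MIN_FILES ? new ArrayList<Long>() : picked;
>       }
>     }
>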
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug.
> To put it another way:
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
> storeName=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M and default settings for 
> hbase.hstore.compaction.min, hbase.hstore.compaction.max, and 
> hbase.hstore.compaction.ratio, we were not getting a compaction to run even 
> if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny files (20k, 155k, 3m, 30k, ...) 
> and several large files (362.5m, 361.2m, 363.4m, 362.9m).  I think the 
> bimodal nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out because when I manually triggered a 
> compaction, I did not see
> '      // if we don't have enough files to compact, just wait             
>       if (filesToCompact.size() < this.minFilesToCompact) {              
>         if (LOG.isDebugEnabled()) {                                      
>           LOG.debug("Skipped compaction of " + this.storeNameStr         
>             + ".  Only " + (end - start) + " file(s) of size "           
>             + StringUtils.humanReadableInt(totalSize)                    
>             + " have met compaction criteria.");                         
>         }                                                                
> ' 
> being printed in the logs (and I know DEBUG logging was enabled because I saw 
> this elsewhere).  
> I'd be happy with better error messages when we decide not to compact for 
> user-initiated compactions.
> I'd also like to see some override that says "user triggered major compaction 
> always occurs", but maybe that's a bad idea for other reasons.


        
