[
https://issues.apache.org/jira/browse/HBASE-19646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304858#comment-16304858
]
BELUGA BEHR commented on HBASE-19646:
-------------------------------------
[~eclark] Thanks!
{quote}
In a default configuration, major compactions are scheduled automatically to
run once in a 7-day period. This is sometimes inappropriate for systems in
production. You can manage major compactions manually.
{quote}
{quote}
By default, the maximum time between major compactions is 7 days, plus or minus
a 4.8 hour period, and determined randomly within those parameters.
{quote}
https://hbase.apache.org/book.html#compaction
So, by default, the start time varies randomly by plus or minus 4.8 hours.
That is not great if, say, a system administrator wants to monitor the system
during a major compaction but cannot know in advance when it will start.
Regardless, it's confusing because, for production systems, the docs recommend
triggering major compactions manually anyway. So what's the point of
implementing a feature to run major compactions automatically if the
recommendation is then to just run them manually?
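To put numbers on that window: taking the documented default of 7 days plus or
minus 4.8 hours at face value, the next time-based major compaction can start
anywhere within a 9.6-hour span. Below is a minimal sketch of the arithmetic
only. The function name and the example date are mine, not an HBase API, and
HBase's actual jitter computation may differ.

```python
from datetime import datetime, timedelta

def compaction_window(last_compaction,
                      period=timedelta(days=7),
                      jitter=timedelta(hours=4, minutes=48)):
    """Return (earliest, latest) possible start of the next time-based
    major compaction, assuming a simple 'period +/- jitter' model."""
    earliest = last_compaction + period - jitter
    latest = last_compaction + period + jitter
    return earliest, latest

# Hypothetical example: last major compaction finished at midnight.
last = datetime(2017, 12, 27, 0, 0)
earliest, latest = compaction_window(last)
print("earliest:", earliest)
print("latest:  ", latest)
print("width:   ", latest - earliest)
```

Under these assumptions, someone who wants to watch the compaction has to cover
the whole interval between `earliest` and `latest`.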
If we believe that we should keep the major compaction triggering as simple as
possible, and just allow cron to step in as the "business needs" solution, then
why was _hbase.offpeak.*.hour_ introduced? Isn't that specifying a business
need within an HBase configuration?
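For comparison, the _hbase.offpeak.*.hour_ settings already describe a daily
time window in configuration. Here is a rough sketch of how such a window could
be evaluated, assuming the start and end are hours in the range 0-23, the start
is inclusive, the end is exclusive, and the window may wrap past midnight.
These semantics are my assumption, not copied from the HBase source, and
`is_off_peak` is a hypothetical helper.

```python
def is_off_peak(hour, start_hour, end_hour):
    """Return True if `hour` falls inside the off-peak window.

    Assumed semantics: hours are 0-23, start is inclusive, end is
    exclusive, and start > end means the window wraps past midnight.
    """
    for h in (hour, start_hour, end_hour):
        if not 0 <= h <= 23:
            raise ValueError("hours must be in the range 0-23")
    if start_hour <= end_hour:
        # Window within a single day, e.g. 01:00-05:00.
        return start_hour <= hour < end_hour
    # Window wraps past midnight, e.g. 22:00-06:00.
    return hour >= start_hour or hour < end_hour

# Hypothetical window from 22:00 to 06:00.
print(is_off_peak(23, 22, 6))
print(is_off_peak(12, 22, 6))
```

A window like this is a small subset of what a cron expression can express,
which is why the two look like the same kind of setting to me.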
Can you please explain where the line is here?
> Add CRON To Major Compaction
> ----------------------------
>
> Key: HBASE-19646
> URL: https://issues.apache.org/jira/browse/HBASE-19646
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Reporter: BELUGA BEHR
> Priority: Minor
>
> HBase provides _hbase.hregion.majorcompaction_
> {quote}
> Time between major compactions, expressed in milliseconds. Set to 0 to
> disable time-based automatic major compactions. User-requested and size-based
> major compactions will still run. This value is multiplied by
> hbase.hregion.majorcompaction.jitter to cause compaction to start at a
> somewhat-random time during a given window of time. The default value is 7
> days, expressed in milliseconds. If major compactions are causing disruption
> in your environment, you can configure them to run at off-peak times for your
> deployment, or disable time-based major compactions by setting this parameter
> to 0, and run major compactions in a cron job or by another external
> mechanism.
> {quote}
> Instead of this existing mechanism, which adds randomness to a production
> system (ugh), let's simply allow users to specify a cron string that replaces
> the simple periodic (+jitter) scheduling mechanism. Cron is useful for
> systems with known windows of time (e.g. weekends, nights) that are good
> times for compaction.