[ https://issues.apache.org/jira/browse/PHOENIX-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020365#comment-16020365 ]
Sergey Soldatov commented on PHOENIX-3871:
------------------------------------------
We collect statistics on all compactions that have COMPACT_DROP_DELETES. Those
are not only major compactions, but also minor compactions if one of the default
compaction policies is used (RatioBasedCompactionPolicy marks the compaction
as major if all store file candidates are selected for the compaction). Running
statistics collection on upserts sounds like overkill.
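
As a rough illustration of the behaviour described above, here is a minimal
sketch in Java. It is a simplified model, not the actual HBase or Phoenix code:
the class CompactionStatsSketch and its methods are hypothetical names, and the
mapping from "all store files selected" to COMPACT_DROP_DELETES is an
assumption based on the description in this comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the decision; not the real HBase/Phoenix classes.
final class CompactionStatsSketch {

    // The two compaction scan types referred to in the comment.
    enum ScanType { COMPACT_RETAIN_DELETES, COMPACT_DROP_DELETES }

    // Assumption: a compaction counts as major if it was requested as major,
    // or if the policy ended up selecting every store file in the store.
    static boolean isEffectivelyMajor(boolean requestedMajor,
                                      List<String> storeFiles,
                                      List<String> selected) {
        boolean allFilesSelected = !storeFiles.isEmpty()
                && selected.size() == storeFiles.size()
                && selected.containsAll(storeFiles);
        return requestedMajor || allFilesSelected;
    }

    // Assumption: effectively-major compactions drop delete markers,
    // all other compactions retain them.
    static ScanType scanTypeFor(boolean effectivelyMajor) {
        return effectivelyMajor ? ScanType.COMPACT_DROP_DELETES
                                : ScanType.COMPACT_RETAIN_DELETES;
    }

    // Statistics are collected only when delete markers are dropped.
    static boolean collectsStats(ScanType scanType) {
        return scanType == ScanType.COMPACT_DROP_DELETES;
    }

    public static void main(String[] args) {
        List<String> storeFiles = Arrays.asList("f1", "f2", "f3");

        // Minor compaction over a subset of files: no stats collection.
        boolean subset = isEffectivelyMajor(false, storeFiles, Arrays.asList("f1", "f2"));
        System.out.println(collectsStats(scanTypeFor(subset)));

        // Minor compaction that happens to select every file: stats are collected.
        boolean all = isEffectivelyMajor(false, storeFiles, storeFiles);
        System.out.println(collectsStats(scanTypeFor(all)));
    }
}
```

Under these assumptions, stores with few files would already have their
statistics refreshed by ordinary minor compactions, which is the reason for
doubting that per-upsert collection is needed.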
> Incremental stats collection
> ----------------------------
>
> Key: PHOENIX-3871
> URL: https://issues.apache.org/jira/browse/PHOENIX-3871
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Eli Levine
>
> Phoenix automatically gathers statistics at [major compaction
> time|http://phoenix.apache.org/update_statistics.html]. While this is useful
> and accurate, it also means that statistics can become stale due to the
> infrequency of major compactions (can be days between major compactions),
> reducing their usefulness.
> This jira asks the question: is it possible for Phoenix to collect
> statistics at a more granular level, say for every (or a sampling of) UPSERT,
> or at minor compaction? Since statistics are always approximations, it is OK
> for this incremental approach to not be 100% accurate.
> The current stats collection mechanism at major compaction time should be
> kept to accurately "fix up" stats at major compaction time.
> [~jamestaylor], FYI. We talked about this in person a few weeks ago. Creating
> this Jira for posterity. Please add anything that I missed. Thanks!
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)