[
https://issues.apache.org/jira/browse/PHOENIX-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059117#comment-18059117
]
Emil Kleszcz commented on PHOENIX-7764:
---------------------------------------
Thanks a lot for the suggestion, [~tkhurana]. This was very helpful.
I tested this on our QA cluster (HBase 2.5.x / Phoenix 5.2.x) and can confirm
that setting
_ALTER TABLE <table> SET "phoenix.table.ttl.enabled" = false_
does have a real and immediate effect.
After applying the flag:
* The property is persisted at the HBase table descriptor level ({_}METADATA
=> 'phoenix.table.ttl.enabled' = false{_}).
* Regions are briefly closed and reopened, which matches a descriptor refresh
(an expected side effect, I assume).
* Major compactions that previously took a long time on the first CF now
complete much faster, on par with the non-Phoenix CFs.
* Tombstones are physically removed again by major compaction, which is why we
need to keep majors running frequently, given the spikes of deletes in this
table.
* Normal Phoenix operations (UPSERT, DELETE, scans, GROUP BY, MIN/MAX)
continued to work correctly in spot checks.
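For completeness, the spot checks were along these lines (illustrative only;
the table and column names are placeholders, not our real schema):
{code:sql}
-- placeholder schema: MY_TABLE(ID VARCHAR PRIMARY KEY, VAL VARCHAR)
UPSERT INTO MY_TABLE (ID, VAL) VALUES ('k1', 'v1');
DELETE FROM MY_TABLE WHERE ID = 'k1';
SELECT COUNT(*) FROM MY_TABLE;
SELECT VAL, COUNT(*) FROM MY_TABLE GROUP BY VAL;
SELECT MIN(ID), MAX(ID) FROM MY_TABLE;
{code}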
In our production use case:
* The table does not use Phoenix TTL ({_}PHOENIX_TTL{_} and _PHOENIX_TTL_HWM_
are NULL).
* There are no TTL views.
* Deletes are explicit user deletes, and we rely on HBase major compactions to
reclaim space.
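The TTL check itself was a simple catalog lookup, roughly (sketch; table name
is a placeholder):
{code:sql}
SELECT TABLE_NAME, PHOENIX_TTL, PHOENIX_TTL_HWM
FROM SYSTEM.CATALOG
WHERE TABLE_NAME = 'MY_TABLE';
{code}
Both columns come back NULL for our table.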
Given this, disabling _phoenix.table.ttl.enabled_ appears to restore sane
compaction behavior for large, delete-heavy tables where TTL is not used. The
only operational side effect observed so far is the one-time region reopen when
the property is applied.
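For anyone else hitting this, the apply-and-verify sequence was roughly as
follows (table name is a placeholder):
{code:sql}
-- from sqlline; persists the property into the HBase table descriptor
ALTER TABLE MY_TABLE SET "phoenix.table.ttl.enabled" = false;
{code}
{code}
# from the HBase shell; the property shows up under the table's METADATA
describe 'MY_TABLE'
{code}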
One clarification question: for a Phoenix table that does not use TTL at all,
is it safe to keep _phoenix.table.ttl.enabled = false_ permanently in
production?
Are there any less obvious side effects (e.g. on statistics, consistency, or
future upgrades) that operators should be aware of when leaving this disabled
long-term?
I could not find anything about this in the upstream documentation.
Regarding Phoenix 5.3: from reading the current source, the flag still exists
and the TTL/compaction logic remains guarded by it. I don't see an indication
that its semantics change in 5.3, but please correct me if that assumption is
wrong. I was checking:
https://github.com/apache/phoenix/blob/5.3/phoenix-core-client/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java#L480
There I can also see another flag for Phoenix compactions,
_PHOENIX_COMPACTION_ENABLED_; judging by the _isPhoenixCompactionEnabled_
method, it controls whether the Phoenix-level compaction path runs at all:
https://github.com/apache/phoenix/blob/5.3/phoenix-core-client/src/main/java/org/apache/phoenix/util/ScanUtil.java#L1217
I don't see this option in 5.2. Could you help clarify its role, so that I know
how to proceed with the migration to 5.3 later on?
Thanks again, this insight directly unblocked us operationally.
> Phoenix UngroupedAggregateRegionObserver causes extremely slow HBase major
> compactions by forcing statistics recomputation
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7764
> URL: https://issues.apache.org/jira/browse/PHOENIX-7764
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.2.1
> Reporter: Emil Kleszcz
> Priority: Major
>
> On HBase 2.5.10 with Phoenix 5.2.1, major compactions become _orders of
> magnitude slower_ when the Phoenix coprocessor
> _org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver_ is enabled
> on a table (as it is by default).
> Compactions that normally complete in minutes instead run for tens of hours,
> even when compacting only a few GB per column family.
> Thread dumps and logs show that Phoenix wraps HBase compaction with its own
> scanner chain and recomputes Phoenix statistics (guideposts) during
> compaction, dominating runtime.
> This makes large Phoenix tables effectively unmaintainable under heavy delete
> or split workloads.
> *Environment*
> * HBase: 2.5.10
> * Phoenix: 5.2.1
> * Hadoop: 3.3.6
> * JDK: 11.0.24
> * Table: multi-CF (A/B/C/D), billions of rows, heavy deletes
> *Observed behavior*
> Major compactions on CF A routinely take 20–30 hours for ~4–6 GB of
> compressed region data (depending on the number of tombstones, number of
> cells, and cell sizes):
> {code:java}
> Completed major compaction ... store A ... into size=3.9 G
> This selection was in queue for 58hrs, and took 27hrs, 14mins to
> execute.{code}
> At the same time, compactions on other CFs of similar or larger size complete
> in minutes.
> *Evidence: Phoenix on compaction hot path*
> 1. *Thread dumps during compaction*
> All long-running compaction threads are executing Phoenix code:
> {code:java}
> org.apache.phoenix.coprocessor.CompactionScanner$PhoenixLevelRowCompactor.compactRegionLevel
> org.apache.phoenix.schema.stats.StatisticsScanner.next
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction{code}
> 2. *RegionServer logs*
> {code:java}
> Starting CompactionScanner ... store A major compaction
> Closing CompactionScanner ... retained N of N cells phoenix level only
> {code}
> This shows Phoenix intercepting the HBase compaction and running a
> Phoenix-level scan.
> 3. *HFile inspection*
> Large store files show hundreds of millions of delete markers and billions of
> entries.
> Phoenix statistics recomputation during compaction requires scanning and
> processing all rows, which dominates runtime.
> *Controlled experiment*
> * Removing only _UngroupedAggregateRegionObserver_ from the table:
> ** CF A major compactions complete in minutes (comparable to other CFs).
> ** Normal point lookups, scans, joins still work.
> ** Phoenix statistics collection still enabled globally.
> * Side effect:
> ** Ungrouped aggregate queries ({_}COUNT( * ){_}, {_}MIN/MAX{_}, _SUM_
> without {_}GROUP BY{_}) fail, because Phoenix does not fall back to
> client-side aggregation and still plans {_}SERVER AGGREGATE INTO SINGLE
> ROW{_}.
> This confirms:
> * The coprocessor is the source of extreme compaction slowdown.
> * Phoenix tightly couples aggregate execution and compaction-time statistics
> recomputation.
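> For reference, the coprocessor was removed with a standard HBase shell
> attribute unset (table name is a placeholder; the {{coprocessor$1}} slot
> index depends on how the table was created):
> {code}
> alter 'MY_TABLE', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
> {code}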
> *Problem*
> * Phoenix performs expensive statistics work during HBase major compaction,
> a critical maintenance operation.
> * This work is opaque, unavoidable, and not configurable.
> * Large Phoenix tables with deletes/splits can remain under compaction for
> weeks, causing:
> ** prolonged compaction backlogs,
> ** blocked balancing,
> ** unpredictable query latency spikes.
> *Expected*
> One of the following (any would be acceptable):
> # A configuration to disable Phoenix statistics recomputation during
> compaction.
> # A way to decouple {{UngroupedAggregateRegionObserver}} from compaction-time
> scanning.
> # Clear documentation that Phoenix significantly alters HBase compaction
> cost, with guidance for large tables.
> # A fix so Phoenix falls back to client-side aggregation when the coprocessor
> is absent (so operators can safely remove it).
> At minimum, confirmation whether this behavior is expected and unavoidable in
> Phoenix 5.2.x on HBase 2.5.x.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)