[
https://issues.apache.org/jira/browse/PHOENIX-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059223#comment-18059223
]
Kadir Ozdemir commented on PHOENIX-7764:
----------------------------------------
[~tkhurana], since the practical solution in this case is to disable Phoenix
(row) level compaction and use HBase (cell) level compaction, we can start
writing the empty column to all column families, and in the same PR remove the
logic for region level compaction and masking. In addition, we can write a
tool that backfills the empty column into all column families for existing
rows and can be run asynchronously. A rough sketch of such a backfill tool is
below.
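To make the idea concrete, here is a rough sketch of the backfill tool using
only the plain HBase client API. It assumes the non-encoded column scheme
where the empty column qualifier is {{_0}} and uses a placeholder empty value;
the real qualifier and value should come from {{QueryConstants}} in the target
Phoenix version, and a production version would run per region (e.g. as a
MapReduce job) rather than as a single client-side scan:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyColumnBackfillTool {

    // Placeholder constants: take the real empty column qualifier and value
    // from Phoenix's QueryConstants for the table's column encoding scheme.
    private static final byte[] EMPTY_CQ = Bytes.toBytes("_0");
    private static final byte[] EMPTY_VALUE = new byte[0];
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws IOException {
        TableName tableName = TableName.valueOf(args[0]);
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(tableName)) {
            Set<byte[]> families = table.getDescriptor().getColumnFamilyNames();
            Scan scan = new Scan();
            scan.setCaching(BATCH_SIZE);
            List<Put> batch = new ArrayList<>();
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    // Write the empty cell at the row's newest timestamp so the
                    // backfill does not show up as a new edit.
                    long ts = maxTimestamp(row);
                    Put put = new Put(row.getRow());
                    for (byte[] family : families) {
                        // Add the empty column only where it is missing.
                        if (!row.containsColumn(family, EMPTY_CQ)) {
                            put.addColumn(family, EMPTY_CQ, ts, EMPTY_VALUE);
                        }
                    }
                    if (!put.isEmpty()) {
                        batch.add(put);
                    }
                    if (batch.size() >= BATCH_SIZE) {
                        table.put(batch);
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                table.put(batch);
            }
        }
    }

    private static long maxTimestamp(Result row) {
        long ts = 0L;
        for (Cell cell : row.rawCells()) {
            ts = Math.max(ts, cell.getTimestamp());
        }
        return ts;
    }
}
{code}
Writing the empty cell at the row's newest timestamp keeps the backfill
invisible to readers, but the interaction with max lookback semantics still
needs to be thought through.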
> Phoenix UngroupedAggregateRegionObserver causes extremely slow HBase major
> compactions by forcing statistics recomputation
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7764
> URL: https://issues.apache.org/jira/browse/PHOENIX-7764
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.2.1
> Reporter: Emil Kleszcz
> Priority: Major
>
> On HBase 2.5.10 with Phoenix 5.2.1, major compactions become _orders of
> magnitude slower_ when the Phoenix coprocessor
> _org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver_ is enabled
> on a table (which it is by default).
> Compactions that normally complete in minutes instead run for tens of hours,
> even when compacting only a few GB per column family.
> Thread dumps and logs show that Phoenix wraps the HBase compaction with its
> own scanner chain and recomputes Phoenix statistics (guideposts) during
> compaction; this statistics work dominates the runtime.
> This makes large Phoenix tables effectively unmaintainable under heavy delete
> or split workloads.
> *Environment*
> * HBase: 2.5.10
> * Phoenix: 5.2.1
> * Hadoop: 3.3.6
> * JDK: 11.0.24
> * Table: multi-CF (A/B/C/D), billions of rows, heavy deletes
> *Observed behavior*
> Major compactions on CF A routinely take 20–30 hours for ~4–6 GB of
> compressed region data (depending on the number of tombstones, number of
> cells, and cell sizes):
> {code:java}
> Completed major compaction ... store A ... into size=3.9 G
> This selection was in queue for 58hrs, and took 27hrs, 14mins to
> execute.{code}
> At the same time, compactions on other CFs of similar or larger size complete
> in minutes.
> *Evidence: Phoenix on compaction hot path*
> 1. *Thread dumps during compaction*
> All long-running compaction threads are executing Phoenix code:
> {code:java}
> org.apache.phoenix.coprocessor.CompactionScanner$PhoenixLevelRowCompactor.compactRegionLevel
> org.apache.phoenix.schema.stats.StatisticsScanner.next
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction{code}
> 2. *RegionServer logs*
> {code:java}
> Starting CompactionScanner ... store A major compaction
> Closing CompactionScanner ... retained N of N cells phoenix level only
> {code}
> This shows Phoenix intercepting the HBase compaction and running a
> Phoenix-level scan.
> 3. *HFile inspection*
> Large store files show hundreds of millions of delete markers and billions of
> entries.
> Recomputing Phoenix statistics during compaction requires scanning and
> processing every row, which dominates the runtime.
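> For reference, a minimal sketch of the HFile inspection using HBase's
> standard {{HFilePrettyPrinter}} (the {{hbase hfile}} tool invoked
> programmatically); the flags are from the 2.5.x tool and worth
> double-checking against the local build:
> {code:java}
> // Minimal sketch: invoke HBase's HFilePrettyPrinter programmatically, the
> // equivalent of `hbase hfile -s -m -f <hfile-path>` on the command line.
> // -s prints key/row statistics; -m prints file info and the trailer
> // (entry counts; DELETE_FAMILY_COUNT where the writer recorded it).
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter;
> import org.apache.hadoop.util.ToolRunner;
>
> public class InspectHFile {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         int rc = ToolRunner.run(conf, new HFilePrettyPrinter(),
>             new String[] { "-s", "-m", "-f", args[0] });
>         System.exit(rc);
>     }
> }
> {code}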
> *Controlled experiment*
> * Removing only _UngroupedAggregateRegionObserver_ from the table:
> ** CF A major compactions complete in minutes (comparable to other CFs).
> ** Normal point lookups, scans, and joins still work.
> ** Phoenix statistics collection still enabled globally.
> * Side effect:
> ** Ungrouped aggregate queries ({_}COUNT( * ){_}, {_}MIN/MAX{_}, _SUM_
> without {_}GROUP BY{_}) fail, because Phoenix does not fall back to
> client-side aggregation and still plans {_}SERVER AGGREGATE INTO SINGLE
> ROW{_} (see the EXPLAIN sketch after this list).
> This confirms:
> * The coprocessor is the source of extreme compaction slowdown.
> * Phoenix tightly couples aggregate execution and compaction-time statistics
> recomputation.
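> The failing plan can be confirmed over plain JDBC with {{EXPLAIN}}; a minimal
> sketch (the ZooKeeper quorum and table name below are placeholders):
> {code:java}
> // Minimal sketch: print the Phoenix plan for an ungrouped aggregate over
> // JDBC. With the coprocessor removed the plan still shows
> // "SERVER AGGREGATE INTO SINGLE ROW", and executing the query then fails.
> // "zk-host:2181" and MY_TABLE are placeholders.
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class ExplainAggregate {
>     public static void main(String[] args) throws Exception {
>         try (Connection conn =
>                  DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
>              Statement stmt = conn.createStatement();
>              ResultSet rs =
>                  stmt.executeQuery("EXPLAIN SELECT COUNT(*) FROM MY_TABLE")) {
>             while (rs.next()) {
>                 System.out.println(rs.getString(1)); // one plan step per row
>             }
>         }
>     }
> }
> {code}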
> *Problem*
> * Phoenix performs expensive statistics work during HBase major compaction,
> a critical maintenance operation.
> * This work is opaque, unavoidable, and not configurable.
> * Large Phoenix tables with deletes/splits can remain under compaction for
> weeks, causing:
> ** prolonged compaction backlogs,
> ** blocked balancing,
> ** unpredictable query latency spikes.
> *Expected*
> One of the following (any would be acceptable):
> # A configuration to disable Phoenix statistics recomputation during
> compaction.
> # A way to decouple {{UngroupedAggregateRegionObserver}} from compaction-time
> scanning.
> # Clear documentation that Phoenix significantly increases HBase compaction
> cost, with guidance for large tables.
> # A fix so Phoenix falls back to client-side aggregation when the coprocessor
> is absent (so operators can safely remove it).
> At minimum, confirmation of whether this behavior is expected and unavoidable
> in Phoenix 5.2.x on HBase 2.5.x.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)