[
https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246405#comment-13246405
]
Keith Turner commented on HBASE-5479:
-------------------------------------
Accumulo does something similar to what this ticket describes. It has a
priority queue of tablets/regions that need to be major compacted. There is a
thread that scans all tablets every 30 seconds to see if a compaction is needed
and if so throws it on the queue. It should probably also check after flushes
and bulk imports. I do not think multiple entries for the same tablet are
placed on the queue. When something is pulled off of the queue, it decides then
which files to compact.
The priority queue is sorted on compaction type and then number of files per
tablet. User-requested compactions come first, then chops (a special compaction
for merging tablets), then system-initiated compactions, then idle compactions.
Among the same type of compaction, it will take the tablet/region with the
most files. To find the tablet/region with the most files it does a linear
scan of all of the tablets in the queue. I do not like the linear scan, but I
am not sure of a better way to do this since the number of files could change
while something is in the queue. Once we started taking the tablet with the
most files, it really helped overall query performance by keeping the avg files
per tablet and std dev as low as possible.
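
Here is a minimal Java sketch of that selection rule. The names (CompactionType,
Candidate, CompactionPicker) are hypothetical, not Accumulo's actual classes:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Type precedence as described above: user-requested first, then chops,
// then system-initiated compactions, then idle compactions.
enum CompactionType { USER, CHOP, SYSTEM, IDLE }

class Candidate {
    final CompactionType type;
    final String tabletId;
    volatile int numFiles; // may change while the candidate waits in the queue

    Candidate(CompactionType type, String tabletId, int numFiles) {
        this.type = type;
        this.tabletId = tabletId;
        this.numFiles = numFiles;
    }
}

class CompactionPicker {
    private final List<Candidate> pending = new ArrayList<>();

    synchronized void offer(Candidate c) {
        pending.add(c);
    }

    // The linear scan happens at poll time because file counts can change
    // while a tablet sits in the queue, so a heap's cached ordering would go
    // stale. Best (lowest-ordinal) type wins; ties broken by most files.
    synchronized Candidate poll() {
        Candidate best = null;
        for (Candidate c : pending) {
            if (best == null
                    || c.type.ordinal() < best.type.ordinal()
                    || (c.type == best.type && c.numFiles > best.numFiles)) {
                best = c;
            }
        }
        if (best != null) {
            pending.remove(best);
        }
        return best;
    }
}
{code}

Polling this way always drains user-requested work before idle compactions,
and within a type drains the tablet with the most files first.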
One other wrinkle is that Accumulo will only compact up to 10 files at a time
(configurable). If a tablet has 30 files, it will compact the smallest 10
files and throw the tablet back on the major compaction queue. From a
tablet/region server perspective, this also helps keep the total number of
files on the server down. We used to do compactions depth first, where the
tablet with 30 files would be compacted down to one file. However, this could
take a long time, and a lot of compaction work could back up. Doing compactions
breadth
first and taking the tablet with the most files has really helped keep the
number of files manageable under continuous ingest. Our continuous ingest test
tracks statistics (min, max, avg, std dev) on files per tablet over time and we
plot this info using gnuplot at the end of the test. Doing this type of test
and looking at the data helped us formulate our current strategy. I would
encourage starting with a test like that.
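
To make the breadth-first pass concrete, here is a hedged sketch of one pass
(the class, method, and constant names are illustrative, not Accumulo's API):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BreadthFirstPass {
    static final int MAX_FILES_PER_COMPACTION = 10; // configurable in Accumulo

    // One pass: merge the smallest N files into a single file. Returns true
    // if the tablet still has more files than the threshold and should be
    // thrown back on the major compaction queue for another pass.
    static boolean compactOnce(List<Long> fileSizes, int requeueThreshold) {
        Collections.sort(fileSizes);
        int n = Math.min(MAX_FILES_PER_COMPACTION, fileSizes.size());
        long merged = 0;
        for (int i = 0; i < n; i++) {
            merged += fileSizes.get(i); // output is roughly the sum of inputs
        }
        List<Long> remaining =
            new ArrayList<>(fileSizes.subList(n, fileSizes.size()));
        remaining.add(merged);
        fileSizes.clear();
        fileSizes.addAll(remaining);
        return fileSizes.size() > requeueThreshold;
    }
}
{code}

With 30 files this goes 30 -> 21 -> 12 -> 3 over three queued passes, letting
passes for other tablets interleave between them instead of one tablet
monopolizing a compaction thread.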
> Postpone CompactionSelection to compaction execution time
> ---------------------------------------------------------
>
> Key: HBASE-5479
> URL: https://issues.apache.org/jira/browse/HBASE-5479
> Project: HBase
> Issue Type: New Feature
> Components: io, performance, regionserver
> Reporter: Matt Corgan
>
> It can be commonplace for regionservers to develop long compaction queues,
> meaning a CompactionRequest may execute hours after it was created. The
> CompactionRequest holds a CompactionSelection that was selected at request
> time but may no longer be the optimal selection. The CompactionSelection
> should be created at compaction execution time rather than compaction request
> time.
> The current mechanism breaks down during high volume insertion. The
> inefficiency is clearest when the inserts are finished. Inserting for 5
> hours may build up 50 storefiles and a 40-element compaction queue. When
> finished inserting, you would prefer that the next compaction merges all 50
> files (or some large subset), but the current system will churn through each
> of the 40 compaction requests, the first of which may be hours old. This
> ends up re-compacting the same data many times.
> The current system is especially inefficient when dealing with time series
> data where the data in the storefiles has minimal overlap. With time series
> data, there is even less benefit to intermediate merges because most
> storefiles can be eliminated based on their key range during a read, even
> without bloomfilters. The only goal should be to reduce file count, not to
> minimize the number of files merged for each read.
> There are other aspects to the current queuing mechanism that would need to
> be looked at. You would want to avoid having the same Store in the queue
> multiple times. And you would want the completion of one compaction to
> possibly queue another compaction request for the store.
> An alternative architecture to the current style of queues would be to have
> each Store (all open in memory) keep a compactionPriority score up to date
> after events like flushes, compactions, schema changes, etc. Then you create
> a "CompactionPriorityComparator implements Comparator<Store>" and stick all
> the Stores into a PriorityQueue (synchronized remove/add from the queue when
> the value changes). The async compaction threads would keep pulling off the
> head of that queue as long as the head has compactionPriority > X.
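
A rough Java sketch of that alternative, assuming a hypothetical Store
stand-in with a compactionPriority field (not HBase's real Store class), with
X and pollIfDue likewise illustrative:

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in, not org.apache.hadoop.hbase.regionserver.Store.
class Store {
    final String name;
    volatile double compactionPriority; // updated after flushes, compactions,
                                        // schema changes, etc.

    Store(String name, double compactionPriority) {
        this.name = name;
        this.compactionPriority = compactionPriority;
    }
}

class CompactionPriorityComparator implements Comparator<Store> {
    @Override
    public int compare(Store a, Store b) {
        return Double.compare(b.compactionPriority, a.compactionPriority); // highest first
    }
}

class CompactionScheduler {
    static final double X = 1.0; // the threshold from the description above

    private final PriorityQueue<Store> stores =
        new PriorityQueue<>(new CompactionPriorityComparator());

    // PriorityQueue does not reorder in place, so a priority change means a
    // synchronized remove/add, as the description suggests.
    synchronized void updatePriority(Store s, double newPriority) {
        stores.remove(s);
        s.compactionPriority = newPriority;
        stores.add(s);
    }

    // Async compaction threads keep pulling the head while it exceeds X; the
    // actual CompactionSelection would happen here, at execution time.
    synchronized Store pollIfDue() {
        Store head = stores.peek();
        return (head != null && head.compactionPriority > X) ? stores.poll() : null;
    }
}
{code}

Note that PriorityQueue.remove is an O(n) scan, so the synchronized remove/add
on every priority change carries a cost similar to the linear scan Keith
describes above.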