keith-turner commented on a change in pull request #232:
URL: https://github.com/apache/accumulo-website/pull/232#discussion_r568032121
##########
File path: _docs-2/administration/compaction.md
##########
@@ -0,0 +1,119 @@
+---
+title: Compactions
+category: administration
+order: 6
+---
+
+In Accumulo each tablet has a list of files associated with it. As data is
+written to Accumulo it is buffered in memory. The data buffered in memory is
+eventually written to files in DFS on a per tablet basis. Files can also be
+added to tablets directly by bulk import. In the background tablet servers run
+major compactions to merge multiple files into one. The tablet server has to
+decide which tablets to compact and which files within a tablet to compact.
+
+Within each tablet server there are one or more user configurable Compaction
+Services that compact tablets. Each Accumulo table has a user configurable
+Compaction Dispatcher that decides which compaction services that table will
+use. Accumulo generates metrics for each compaction service which enable users
+to adjust compaction service settings based on actual activity.
+
+Each compaction service has a compaction planner that decides which files to
+compact. The default compaction planner uses the table property {% plink
+table.compaction.major.ratio %} to decide which files to compact. The
+compaction ratio is a real number >= 1.0. Assume LFS is the size of the largest
+file in a set, CR is the compaction ratio, and FSS is the sum of file sizes in
+a set. The default planner looks for file sets where LFS*CR <= FSS. By only
+compacting sets of files that meet this requirement the amount of work done by
+compactions is O(N * log<sub>CR</sub>(N)). Increasing the ratio will
+result in less compaction work and more files per tablet. More files per
+tablet means higher query latency. So adjusting this ratio is a trade off
+between ingest and query performance.
+
+When CR=1.0 the goal becomes a single file per tablet, but the
+amount of work is O(N<sup>2</sup>) so 1.0 should be used with caution. For
+example, if a tablet has a 1G file and a 1M file is added, then a compaction of
+the 1G and 1M files would be queued, rewriting the full 1G to absorb 1M of
+new data.
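The ratio test quoted above (LFS*CR <= FSS) can be illustrated with a minimal sketch. This is not Accumulo's planner code: the function name `meets_ratio` is hypothetical, sizes are plain numbers in megabytes, and the rule that a set needs at least two files is an assumption made here so that a lone file is never selected.

```python
def meets_ratio(file_sizes, ratio):
    """Return True when the set satisfies LFS * CR <= FSS.

    file_sizes: sizes of the candidate files (any consistent unit).
    ratio: the compaction ratio CR, a real number >= 1.0.
    """
    # Assumption for this sketch: a single file is never compacted alone.
    if len(file_sizes) < 2:
        return False
    largest = max(file_sizes)   # LFS
    total = sum(file_sizes)     # FSS
    return largest * ratio <= total


# The 1G + 1M case from the text, with sizes in MB.
print(meets_ratio([1024, 1], 1.0))
# The same two files with a higher ratio.
print(meets_ratio([1024, 1], 3.0))
```

With CR=1.0 the 1G and 1M pair passes the test, which matches the example in the text. With a larger ratio such as 3.0 the same pair fails, so in this sketch the small file would wait until enough similarly sized files accumulate.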
+
+Compaction services and dispatchers were introduced in Accumulo 2.1, so much
+of this documentation only applies to Accumulo 2.1 and later.
Review comment:
It may be worthwhile to research practices for annotating "since"
information in documentation and see if there is anything worth adopting.
AFAIK we don't have any standard way to do this in the Accumulo docs. I have
read documentation for other projects that had a standard way of
annotating "since" information, but I cannot remember where I saw that.