keith-turner commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r632613635



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduce two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, some for queue B, and so on.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status.
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in-memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers for summary information about their external compaction queues and combines it to determine which tablet server to contact next to get work.  The coordinator does not maintain per-tablet information; it only maintains enough information to allow it to know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
   There are two types of information in the metadata table related to external compactions.

   The first type of metadata is stored under a tablet's row and contains information about running external compactions.  Tablets are the authorities for this information and are the only ones to read/write it; the coordinator does not.
   
   The second type of information in the metadata table is stored under the ~ecomp prefix and contains information about completed or failed compactions.  This information is written by the coordinator.  It is deleted either by tablets, upon successful commit of an external compaction, or by the coordinator, when it detects a completed compaction for a tablet that no longer exists.  The primary purpose of this information is to allow the coordinator to persist information about completed compactions for tablets that are temporarily offline so that it can notify them later.
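
   To make the lifecycle of the ~ecomp entries concrete, here is a rough toy model in Python. It is not Accumulo code: the class name, method names, the `"FINISHED"` label, and the id and extent strings are all made up for illustration, and an in-memory dict stands in for the metadata table. It only models the three actions described above (the coordinator writes, a tablet deletes on commit, the coordinator deletes for tablets that no longer exist).

```python
class FinalStateStore:
    """Toy stand-in for the ~ecomp section: compaction id -> (extent, state)."""

    def __init__(self):
        self.entries = {}

    def coordinator_record(self, ecid, extent, state):
        # The coordinator persists the outcome so that a temporarily
        # offline tablet can be notified later. State labels are hypothetical.
        self.entries[ecid] = (extent, state)

    def tablet_committed(self, ecid):
        # The tablet removes the entry after it successfully commits.
        self.entries.pop(ecid, None)

    def coordinator_cleanup(self, existing_extents):
        # The coordinator removes entries whose tablet no longer exists.
        dead = [ecid for ecid, (extent, _) in self.entries.items()
                if extent not in existing_extents]
        for ecid in dead:
            del self.entries[ecid]
        return dead

    def pending_for(self, extent):
        # Entries a tablet still needs to be told about once it is back online.
        return sorted(ecid for ecid, (e, _) in self.entries.items() if e == extent)
```

   In this model an entry lives exactly from the coordinator's write until one of the two deletes, which is the window in which a tablet may be offline.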
   
   Compactors reserve and commit external compactions via the coordinator (which in turn talks to tservers).  During this process they pass back and forth information about specific extents.  However, for the purpose of finding the next tserver to reserve an external compaction from, the coordinator does not maintain per-tablet information.  Rather, it maintains per-tserver summary information that helps it find the next tserver to contact.  The summary information is managed by [QueueSummaries.java](https://github.com/apache/accumulo/blob/327a48f4cf8d09a006cc137d3505bfc644e93994/server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/QueueSummaries.java#L38) within the coordinator.
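
   As a rough illustration of what "per-tserver summary information" could look like, here is a small Python sketch. It is a simplified model and not a port of QueueSummaries.java: the class and method names are invented, and it assumes each tserver reports, per queue, only the highest priority it currently has queued. The point is that nothing in it is keyed by tablet.

```python
from collections import defaultdict


class QueueSummaryModel:
    """Toy model: per queue, remember the highest priority each tserver reported."""

    def __init__(self):
        # queue name -> {tserver address -> highest priority reported}
        self._by_queue = defaultdict(dict)

    def update(self, tserver, summaries):
        """Replace everything known about one tserver with its latest report.

        summaries maps queue name -> highest priority; a queue that is
        absent from the report means the tserver has no work for it.
        """
        for per_tserver in self._by_queue.values():
            per_tserver.pop(tserver, None)
        for queue, priority in summaries.items():
            self._by_queue[queue][tserver] = priority

    def next_tserver(self, queue):
        """Return the tserver to contact next for this queue, or None."""
        per_tserver = self._by_queue.get(queue)
        if not per_tserver:
            return None
        # Highest priority wins; ties are broken by address for determinism.
        return max(per_tserver.items(), key=lambda kv: (kv[1], kv[0]))[0]
```

   A compactor asking for work on `DCQ1` would be pointed at whatever `next_tserver("DCQ1")` returns, and that tserver then picks the specific tablet from its own in-memory priority queue.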



