dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r634665156
##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink
org.apache.accumulo.core.spi.c
The names of the compaction services and executors are used for logging and
metrics.
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server. External compactions introduce two new server processes
+in an Accumulo deployment:
+
+ * *Compactor*: an Accumulo process that runs external compactions and is started with the name of the queue for which it will perform compactions. In a typical deployment there will be many of these processes running, some serving queue A, some serving queue B, and so on. Each process runs only a single compaction at a time and communicates with the Compaction Coordinator to get a compaction job and report its status.
+
+ * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status.
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
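The `executors` option in the example above is a JSON array, so it can be inspected with any JSON parser. The following is a minimal sketch, not part of Accumulo: it parses the exact value from the example and lists which executors hand work to an external queue. The helper name `external_queues` is hypothetical.

```python
import json

# Value of tserver.compaction.major.service.cs1.planner.opts.executors,
# copied from the configuration example above.
EXECUTORS_OPT = '[{"name":"all","externalQueue":"DCQ1"}]'


def external_queues(executors_json):
    """Return {executor name: external queue} for executors that name an externalQueue.

    Hypothetical helper for illustration only; it is not an Accumulo API.
    """
    executors = json.loads(executors_json)
    return {e["name"]: e["externalQueue"] for e in executors if "externalQueue" in e}


print(external_queues(EXECUTORS_OPT))
```

Running this against the example value shows that the single executor `all` sends its work to queue `DCQ1`.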
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, each tablet server maintains an in-memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers for summary information about their external compaction queues and combines those summaries to determine which tablet server to contact next to get work. The coordinator does not maintain per-tablet information; it only maintains enough information to know which tablet server to contact next for a given queue. The tablet server then knows which specific tablet in that queue needs to compact.
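The bookkeeping described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the CompactionCoordinator implementation: the class `CoordinatorSketch`, its method names, and the shape of a summary (one "highest pending priority" number per tserver and queue) are all hypothetical.

```python
from collections import defaultdict


class CoordinatorSketch:
    """Toy model: per queue, remember the highest pending priority each tserver reported."""

    def __init__(self):
        # queue name -> {tserver address -> highest priority reported for that queue}
        self._summaries = defaultdict(dict)

    def update_summary(self, tserver, queue, highest_priority):
        """Record the result of polling one tserver (hypothetical summary shape)."""
        self._summaries[queue][tserver] = highest_priority

    def next_tserver(self, queue):
        """Return the tserver to contact next for this queue, or None if no work is known."""
        candidates = self._summaries.get(queue)
        if not candidates:
            return None
        # No per-tablet state is kept here; the chosen tserver decides which tablet compacts.
        return max(candidates, key=candidates.get)


coord = CoordinatorSketch()
coord.update_summary("ts1:9997", "DCQ1", 5)
coord.update_summary("ts2:9997", "DCQ1", 9)
print(coord.next_tserver("DCQ1"))
```

Note that the sketch stores only one number per tserver and queue, mirroring the point that the coordinator keeps just enough information to pick a tablet server.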
+
+When a Compactor is free to perform work, it asks the CompactionCoordinator for the next compaction job. The CompactionCoordinator contacts the TabletServer that has the highest-priority work for the Compactor's queue. The TabletServer returns the information necessary for the compaction to the CompactionCoordinator, which passes it on to the Compactor. The CompactionCoordinator maintains an in-memory list of running compactions and also inserts an entry into the metadata table for the tablet to denote that an external compaction is running. When the Compactor has finished the compaction, it notifies the CompactionCoordinator, which inserts an entry into the metadata table to denote that the external compaction completed and then attempts to notify the TabletServer. If the notification succeeds, the TabletServer commits the major compaction. If the TabletServer is down, or the Tablet has become hosted on a different TabletServer, then the CompactionCoordinator will fail to notify the TabletServer, but the metadata table entries will remain. The major compaction will be committed in the future by the TabletServer hosting the Tablet.
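The failure-tolerant hand-off described above can be modelled in a few lines. This is a sketch under assumptions, not Accumulo code: all class and function names are hypothetical, a plain dict stands in for the metadata table, the id `ecid-1` is made up, and removing the entry once the compaction is committed is an assumption the text does not state.

```python
class TabletServerSketch:
    """Toy tablet server: commits a compaction only if it is up and hosts the tablet."""

    def __init__(self, hosted_tablets, up=True):
        self.hosted_tablets = set(hosted_tablets)
        self.up = up
        self.committed = []

    def commit(self, tablet, ecid):
        if not self.up or tablet not in self.hosted_tablets:
            return False
        self.committed.append((tablet, ecid))
        return True


class CoordinatorLifecycleSketch:
    def __init__(self):
        self.running = {}   # ecid -> tablet; in-memory list of running compactions
        self.metadata = {}  # (tablet, ecid) -> state; stands in for the metadata table

    def job_started(self, tablet, ecid):
        self.running[ecid] = tablet
        self.metadata[(tablet, ecid)] = "running"

    def job_finished(self, ecid, tserver):
        """Mark the compaction completed, then try to notify the tablet server."""
        tablet = self.running.pop(ecid)
        self.metadata[(tablet, ecid)] = "completed"
        if tserver.commit(tablet, ecid):
            del self.metadata[(tablet, ecid)]  # assumption: entry is cleaned up on commit
            return True
        return False  # notification failed; the metadata entry remains


def commit_pending(metadata, tserver):
    """Run by whichever tablet server hosts the tablet later: commit completed work."""
    for (tablet, ecid), state in list(metadata.items()):
        if state == "completed" and tserver.commit(tablet, ecid):
            del metadata[(tablet, ecid)]


coord = CoordinatorLifecycleSketch()
old_host = TabletServerSketch({"2<"}, up=False)  # original host has gone down
coord.job_started("2<", "ecid-1")
notified = coord.job_finished("ecid-1", old_host)
new_host = TabletServerSketch({"2<"})            # tablet is re-hosted elsewhere
commit_pending(coord.metadata, new_host)
print(notified, new_host.committed)
```

The point of the design is that the durable record, not the notification, carries the result: the failed notification loses nothing because the completed entry is still there for the next host to act on.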
+
+### External Compaction in Action
+
+Below are some examples of log entries and metadata table entries for external compactions. First, here are some metadata entries for table `2`. You can see that there are three files of different sizes (file size and number of entries are stored in the value portion of the metadata table rows with the "file" column qualifier).
+
+```
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/A0000047.rf [] 12330,99000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000048.rf [] 1196,1000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F000004j.rf [] 1302,1000
+2< last:10000bf4e0a0004 [] localhost:9997
+2< loc:10000bf4e0a0004 [] localhost:9997
+2< srv:compact [] 111
+2< srv:dir [] default_tablet
+2< srv:flush [] 113
+2< srv:lock [] tservers/localhost:9997/zlock#1950397a-b2ca-4685-b70b-67ae3cd578b9#0000000000$10000bf4e0a0004
+2< srv:time [] M1618325648093
+2< ~tab:~pr [] \x00
+```
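To make the "file size, number of entries" value format concrete, here is a small sketch that parses the three `file` values shown above and totals them. The helper `parse_file_value` is hypothetical and only illustrates the comma-separated layout described in the text.

```python
# Values of the three "file" entries for table 2, copied from the listing above.
FILE_VALUES = {
    "A0000047.rf": "12330,99000",
    "F0000048.rf": "1196,1000",
    "F000004j.rf": "1302,1000",
}


def parse_file_value(value):
    """Split a metadata file value into (file size, number of entries)."""
    size, entries = (int(part) for part in value.split(","))
    return size, entries


total_size = sum(parse_file_value(v)[0] for v in FILE_VALUES.values())
total_entries = sum(parse_file_value(v)[1] for v in FILE_VALUES.values())
print(total_size, total_entries)  # prints: 14828 101000
```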
+
+Below are excerpts from the TabletServer, CompactionCoordinator, and Compactor logs and from the metadata table. I have merged the logs in time order to make it easier to see what is happening.
+
+In the logs below, the Compactor requested a compaction job from the Coordinator with an ExternalCompactionId of `de6afc1d-64ae-4abf-8bce-02ec0a79aa6c`. The Coordinator knew that TabletServer `localhost:9997` had a Tablet that needed compacting and contacted it to get the details. The CompactionManager, a component running in the TabletServer, returned the information to the Coordinator. The Coordinator then updated the metadata table (below the logs) for the external compaction and returned the information to the Compactor:
+
Review comment:
Resolved in 5314ec2bc0215fe75f7f7b866fe3dd26e7cacb87