support in-memory compactions
-----------------------------
Key: ACCUMULO-519
URL: https://issues.apache.org/jira/browse/ACCUMULO-519
Project: Accumulo
Issue Type: Improvement
Components: tserver
Reporter: Adam Fuchs
Assignee: Adam Fuchs
There are several factors that influence how big to make the in-memory write
buffer (tserver.memory.maps.max) for Accumulo. Two dominant factors that
conflict with each other are:
# Overall disk I/O depends roughly on the log of the ratio of tablet size to
initial file size. A bigger write buffer leads to bigger initial files, and can
therefore lead to less overall disk I/O.
# Aggregation, versioning, and deleting take place in the iterator tree, which
only applies during compactions and scans. The in-memory write buffer can
buffer many versions of a given key, and scans can be slow if compactions are
infrequent.
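
As a rough illustration of the first factor, the sketch below counts how many merge rounds it takes for a file to grow from the initial flush size to the full tablet size, assuming each round multiplies the file size by a fixed ratio. The class name, method name, and the growth model itself are assumptions made for illustration; this is not Accumulo code, and it only shows why the dependence is logarithmic.

```java
// Hypothetical back-of-the-envelope model, not Accumulo code.
final class CompactionIoEstimate {

    /**
     * Counts the merge rounds needed for a file of initialFileBytes to reach
     * tabletBytes when every round multiplies the file size by `ratio`.
     * Each round rewrites the data once, so the count approximates how many
     * times a key/value pair is written to disk after the initial flush.
     */
    static int rewritePasses(long tabletBytes, long initialFileBytes, double ratio) {
        if (tabletBytes <= 0 || initialFileBytes <= 0 || ratio <= 1.0) {
            throw new IllegalArgumentException("sizes must be positive and ratio must exceed 1");
        }
        int passes = 0;
        double size = initialFileBytes;
        while (size < tabletBytes) {
            size *= ratio; // one merge round grows the file by the ratio
            passes++;
        }
        return passes;
    }
}
```

Under this model, a 1 GB tablet with a ratio of 2 needs 4 rounds when initial files are 64 MB and 2 rounds when they are 256 MB, which is the sense in which a bigger write buffer reduces overall disk I/O.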
One solution would be to run a stepped compaction in memory, in which the
iterator tree is applied in a log-structured fashion. We
can consider the minor compaction to be two pipelined steps: serialization of
map entries, and writing the serialized form to disk. After we have written the
serialized form to disk, we can free up the write-ahead logs associated with
that data.
I propose the following:
# We should buffer the serialized RFile form in-memory instead of writing it to
disk (call it a micro-compaction).
# We should implement a merging step for merging existing buffered RFiles with
newly serialized buffers, using the same algorithm that we use for major
compaction file selection.
# The in-memory buffer should be micro-compacted aggressively (whenever we have
a thread free, with some minimum allocation of CPU and memory I/O resources to
this task).
# The current triggers that we use for minor compactions should be used to
select buffered RFiles from memory and dump them to disk, at which point we can
drop the write-ahead log references.
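
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Accumulo code: sorted maps stand in for serialized in-memory RFiles, summation stands in for the iterator tree, and a simplified ratio-based rule stands in for the major compaction file selection algorithm. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not Accumulo code.
final class MicroCompactionBuffer {
    // Raw updates in arrival order; the same key may appear many times.
    private final List<String> pendingKeys = new ArrayList<>();
    private final List<Long> pendingValues = new ArrayList<>();
    // Each sorted map stands in for one serialized RFile held in memory.
    private final List<TreeMap<String, Long>> runs = new ArrayList<>();
    private final double ratio;

    MicroCompactionBuffer(double ratio) { this.ratio = ratio; }

    void write(String key, long delta) {
        pendingKeys.add(key);
        pendingValues.add(delta);
    }

    /** Steps 1 to 3: serialize pending updates into a run, then merge runs. */
    void microCompact() {
        if (!pendingKeys.isEmpty()) {
            TreeMap<String, Long> run = new TreeMap<>();
            for (int i = 0; i < pendingKeys.size(); i++) {
                // Summation plays the role of the iterator tree (aggregation).
                run.merge(pendingKeys.get(i), pendingValues.get(i), Long::sum);
            }
            pendingKeys.clear();
            pendingValues.clear();
            runs.add(run);
        }
        mergeSelectedRuns();
    }

    // Simplified ratio rule (an assumption): merge a set of runs when their
    // total size is at least ratio * the largest run; otherwise set the
    // largest run aside and reconsider the rest.
    private void mergeSelectedRuns() {
        List<TreeMap<String, Long>> candidates = new ArrayList<>(runs);
        candidates.sort((a, b) -> Integer.compare(b.size(), a.size()));
        List<TreeMap<String, Long>> skipped = new ArrayList<>();
        while (candidates.size() > 1) {
            long total = 0;
            for (TreeMap<String, Long> r : candidates) total += r.size();
            if (total >= ratio * candidates.get(0).size()) {
                TreeMap<String, Long> merged = mergeAll(candidates);
                runs.clear();
                runs.addAll(skipped);
                runs.add(merged);
                return;
            }
            skipped.add(candidates.remove(0));
        }
    }

    private static TreeMap<String, Long> mergeAll(List<TreeMap<String, Long>> inputs) {
        TreeMap<String, Long> out = new TreeMap<>();
        for (TreeMap<String, Long> in : inputs) {
            in.forEach((k, v) -> out.merge(k, v, Long::sum));
        }
        return out;
    }

    /** Step 4: on a minor compaction trigger, dump all buffered runs as one file. */
    TreeMap<String, Long> flush() {
        microCompact();
        TreeMap<String, Long> file = mergeAll(runs);
        runs.clear(); // after a real write to disk, the WAL references could be dropped
        return file;
    }

    int bufferedRuns() { return runs.size(); }

    int bufferedEntries() {
        int n = 0;
        for (TreeMap<String, Long> r : runs) n += r.size();
        return n;
    }
}
```

In this sketch, a thousand updates to a single key collapse to one buffered entry after the first micro-compaction, so scans no longer have to walk every version, while the file eventually flushed to disk can still be as large as the existing minor compaction triggers allow.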
Overall this will allow users to keep the initial files generated by minor
compactions large while alleviating the second concern of buffering too many
versions of the same key. Two use cases that will benefit greatly from this are
ACCUMULO-348 (lots of updates to the default tablet info in the !METADATA
table), and aggregation in which there are a small number of keys. Other
considerations that also affect this space are:
# RFiles are column-oriented (with locality groups), while the in-memory map is
only row-oriented. Moving to a column-oriented structure sooner would benefit
some queries.
# RFiles are optimized for sequential access while the in-memory write buffer
requires lots of random memory access to read a stream of key/value pairs in
key order.
# RFiles use configurable compression, while the in-memory map only uses
hierarchical organization. RFiles generally get better compression.
# Currently, writing a column-oriented RFile requires scanning the entire
in-memory map for each locality group. Bigger in-memory maps can take a long
time to re-order for minor compaction.
# Memory fragmentation and garbage collection in the JVM are major concerns that
have already received a lot of work. We need to take those factors into account
when implementing this change.
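
To illustrate the locality group consideration above, the sketch below buckets the keys of a sorted in-memory map into locality groups in a single pass, rather than scanning the whole map once per group. It is only meant to show the shape of the cost, not a proposed design. The "row:family" key format, the "default" group name, and all class and method names are assumptions made for illustration; this is not Accumulo code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Accumulo code.
final class LocalityGroupPartition {
    static final String DEFAULT_GROUP = "default";

    /**
     * Walks the sorted map once and appends each key to the bucket of its
     * locality group. Keys are assumed to look like "row:family". Because the
     * input is iterated in key order, each bucket also ends up in key order.
     */
    static Map<String, List<String>> partition(TreeMap<String, String> inMemoryMap,
                                               Map<String, String> familyToGroup) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : inMemoryMap.keySet()) {
            String family = key.substring(key.indexOf(':') + 1);
            String group = familyToGroup.getOrDefault(family, DEFAULT_GROUP);
            groups.computeIfAbsent(group, g -> new ArrayList<>()).add(key);
        }
        return groups;
    }
}
```

The trade-off in this sketch is extra memory for the buckets in exchange for reading the map once, which matters more as the in-memory map grows.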