support in-memory compactions
-----------------------------

                 Key: ACCUMULO-519
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-519
             Project: Accumulo
          Issue Type: Improvement
          Components: tserver
            Reporter: Adam Fuchs
            Assignee: Adam Fuchs


There are several factors that influence how big to make the in-memory write 
buffer (tserver.memory.maps.max) for Accumulo. Two dominant factors that 
conflict with each other are:
# Overall disk I/O depends somewhat on the log of the ratio of tablet size to 
initial file size. A bigger write buffer leads to bigger initial files, and can 
lead to less overall disk I/O.
# Aggregation, versioning, and deleting take place in the iterator tree, which 
is applied only during compactions and scans. Until then, the in-memory write 
buffer can hold many versions of a given key, and scans can be slow if 
compactions are infrequent.
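
To make the first factor concrete, here is a rough sketch of the logarithmic 
relationship. It assumes a simplified model that is not stated in this issue: 
files are merged whenever roughly "ratio" files of similar size exist, so each 
byte is rewritten about once per merge level. The class and method names are 
hypothetical.

```java
final class CompactionIoEstimate {
    /**
     * Approximate number of times each byte is rewritten by major compactions,
     * modeled as log base `ratio` of (tablet size / initial file size).
     * This is an illustrative model, not a formula taken from Accumulo.
     */
    static double rewritePasses(double tabletBytes, double initialFileBytes, double ratio) {
        if (initialFileBytes >= tabletBytes) {
            return 0.0; // the first file already holds the whole tablet
        }
        return Math.log(tabletBytes / initialFileBytes) / Math.log(ratio);
    }

    /** Rough total bytes written: one minor compaction write plus one write per rewrite pass. */
    static double estimatedBytesWritten(double tabletBytes, double initialFileBytes, double ratio) {
        return tabletBytes * (1.0 + rewritePasses(tabletBytes, initialFileBytes, ratio));
    }
}
```

Under this model, every multiplication of the initial file size by "ratio" 
removes about one rewrite of the whole tablet, which is why a bigger write 
buffer helps, but only logarithmically.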

One solution would be to run a stepped compaction in memory, applying the 
iterator tree in a log-structured fashion. We can consider the minor compaction 
to be two pipelined steps: serializing the map entries, and writing the 
serialized form to disk. Once the serialized form is on disk, we can free the 
write-ahead logs associated with that data.

I propose the following:
# We should buffer the serialized RFile form in-memory instead of writing it to 
disk (call it a micro-compaction).
# We should implement a merging step for merging existing buffered RFiles with 
newly serialized buffers, using the same algorithm that we use for major 
compaction file selection.
# The in-memory buffer should be micro-compacted aggressively (whenever we have 
a thread free, with some minimum allocation of CPU and memory I/O resources to 
this task).
# The current triggers that we use for minor compactions should be used to 
select buffered RFiles from memory and dump them to disk, at which point we can 
drop the write-ahead log references.
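
The first two steps of the proposal can be sketched as follows. This is a 
minimal illustration and not Accumulo code: Entry, microCompact, merge, and 
selectForMerge are hypothetical names, a TreeMap stands in for a serialized 
RFile buffer, and "keep only the newest version per row" stands in for the 
iterator tree. The selection method is my simplified reading of the ratio rule 
used for major compaction file selection (if the largest file times the ratio 
is at most the sum of the set, compact the set; otherwise drop the largest and 
retry), so treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

final class MicroCompactionSketch {
    /** Hypothetical stand-in for an Accumulo key/value pair: row, timestamp, value. */
    static final class Entry {
        final String row;
        final long timestamp;
        final String value;

        Entry(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Adds e to run unless run already holds a newer version of the same row. */
    private static void keepNewest(TreeMap<String, Entry> run, Entry e) {
        Entry current = run.get(e.row);
        if (current == null || e.timestamp > current.timestamp) {
            run.put(e.row, e);
        }
    }

    /** Step 1: turn the raw write buffer into a sorted, deduplicated in-memory run. */
    static TreeMap<String, Entry> microCompact(List<Entry> writeBuffer) {
        TreeMap<String, Entry> run = new TreeMap<>();
        for (Entry e : writeBuffer) {
            keepNewest(run, e);
        }
        return run;
    }

    /** Step 2: merge several buffered runs into one, again keeping the newest version. */
    static TreeMap<String, Entry> merge(List<TreeMap<String, Entry>> runs) {
        TreeMap<String, Entry> merged = new TreeMap<>();
        for (TreeMap<String, Entry> run : runs) {
            for (Entry e : run.values()) {
                keepNewest(merged, e);
            }
        }
        return merged;
    }

    /** Simplified ratio rule: returns the sizes of the runs to merge, or an empty list. */
    static List<Long> selectForMerge(List<Long> runSizes, double ratio) {
        List<Long> candidates = new ArrayList<>(runSizes);
        Collections.sort(candidates, Collections.reverseOrder()); // largest first
        long total = 0;
        for (long size : candidates) {
            total += size;
        }
        while (candidates.size() > 1) {
            long largest = candidates.get(0);
            if (largest * ratio <= total) {
                return candidates; // the remaining set is balanced enough to merge
            }
            total -= largest;      // otherwise set the largest run aside and retry
            candidates.remove(0);
        }
        return new ArrayList<>();
    }
}
```

Steps 3 and 4 (scheduling micro-compactions on free threads and flushing 
selected buffers to disk on the existing minor compaction triggers) are not 
modeled here.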

Overall, this will allow users to keep the initial files generated by minor 
compactions large while alleviating the second concern of buffering too many 
versions of the same key. Two use cases that will benefit greatly from this are 
ACCUMULO-348 (lots of updates to the default tablet info in the !METADATA 
table), and aggregation over a small number of keys. Other considerations that 
also affect this space are:
# RFiles are column-oriented (with locality groups), while the in-memory map is 
row-oriented only. Moving to a column-oriented structure sooner would benefit 
some queries.
# RFiles are optimized for sequential access while the in-memory write buffer 
requires lots of random memory access to read a stream of key/value pairs in 
key order.
# RFiles use configurable compression, while the in-memory map only uses 
hierarchical organization. RFiles generally get better compression.
# Currently, writing a column-oriented RFile requires scanning the entire 
in-memory map for each locality group. Bigger in-memory maps can take a long 
time to re-order for minor compaction.
# Memory fragmentation and garbage collection in the JVM are major concerns 
that have already received a lot of work. We need to account for those factors 
when implementing this change.
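
The cost of the per-locality-group scan mentioned above can be illustrated 
with a small sketch. It is a toy model, not the actual minor compaction code: 
Entry, the family-to-group mapping, and splitByRescanning are hypothetical 
names, and the in-memory map is reduced to a row-ordered list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LocalityGroupScanSketch {
    /** Hypothetical entry in the row-ordered in-memory map; the value is omitted. */
    static final class Entry {
        final String row;
        final String family;

        Entry(String row, String family) {
            this.row = row;
            this.family = family;
        }
    }

    /** Total number of entries visited across all passes so far. */
    long entriesVisited = 0;

    /** Builds each locality group by scanning the whole map once per group. */
    Map<String, List<Entry>> splitByRescanning(List<Entry> rowOrdered,
                                               Map<String, String> familyToGroup,
                                               List<String> groups) {
        Map<String, List<Entry>> out = new LinkedHashMap<>();
        for (String group : groups) {
            List<Entry> members = new ArrayList<>();
            for (Entry e : rowOrdered) {
                entriesVisited++; // every entry is touched once for every group
                if (group.equals(familyToGroup.get(e.family))) {
                    members.add(e);
                }
            }
            out.put(group, members);
        }
        return out;
    }
}
```

In this model the work grows with the map size times the number of locality 
groups, which is the reason a bigger in-memory map takes longer to re-order.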

