[ https://issues.apache.org/jira/browse/ACCUMULO-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser updated ACCUMULO-3248:
---------------------------------
Fix Version/s: 1.8.0 (was: 1.7.0)
> Document in memory map sizing guidelines
> ----------------------------------------
>
> Key: ACCUMULO-3248
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3248
> Project: Accumulo
> Issue Type: Improvement
> Components: docs
> Reporter: Sean Busbey
> Fix For: 1.8.0
>
>
> From [~ecn]'s comments on ACCUMULO-3246
> {quote}
> A bigger IMM will still be used. It just doesn't help for long-running ingest
> (which is the world I live in).
> Let's say you have 10G to ingest at 1G per unit time, and a 1G IMM.
> At 0.5G, the IMM starts minor compacting. It can write out that 0.5G at about
> the same speed as the WAL can accept the next 0.5G.
> So, by the time the first 0.5G is done writing, we can start writing the next
> 0.5G.
> Doubling the IMM just moves the bar from 0.5G chunks to 1G chunks. Both of
> these are large enough to take advantage of compression and write buffer
> sizes.
> You can argue that you will do fewer major compactions, and that's true. But
> these also occur in the background, and don't affect query/ingest except that
> they consume resources, create disk contention and invalidate blocks/buffers.
> Bigger flushes will require longer major compactions when they finally
> happen, so there's no win.
> So, the IMM for each actively ingesting tablet should be ~ HDFS block size.
> More IMM will be used, and will give you some big numbers on initial ingest,
> but sustained ingest will not improve.
> Because aggregation/combiners run only at compaction time, a larger IMM may
> actually hurt performance.
> {quote}
> We should roll these into the ref guide.
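
For the ref guide, the arithmetic in the quoted comment could be illustrated with a toy model along these lines. This is only a sketch, not Accumulo code: the function name simulate_ingest and its parameters are hypothetical, and it assumes (as the comment does) that a minor compaction starts once half the IMM is full and that the flush writes at about the same speed as ingest arrives.

```python
def simulate_ingest(total_gb, ingest_rate, imm_gb, flush_rate=None):
    """Toy model of sustained ingest through a double-buffered in-memory map.

    Assumptions (taken from the quoted comment, not from Accumulo internals):
      * a minor compaction starts when half of the IMM is full;
      * one half is flushed while the other half keeps accepting writes;
      * flush_rate defaults to ingest_rate ("about the same speed").

    Returns (time_to_accept_all_data, number_of_flushes, chunk_size_gb).
    """
    if flush_rate is None:
        flush_rate = ingest_rate

    chunk = imm_gb / 2.0      # size of each flushed chunk
    time = 0.0                # elapsed time accepting writes
    flush_done_at = 0.0       # when the flush currently in progress finishes
    remaining = total_gb
    flushes = 0

    while remaining > 0:
        size = min(chunk, remaining)
        time += size / ingest_rate          # fill the active half
        time = max(time, flush_done_at)     # stall if the other half is still flushing
        flush_done_at = time + size / flush_rate
        flushes += 1
        remaining -= size

    return time, flushes, chunk


if __name__ == "__main__":
    for imm in (1, 2):
        elapsed, flushes, chunk = simulate_ingest(total_gb=10, ingest_rate=1, imm_gb=imm)
        print(f"IMM={imm}G chunk={chunk}G flushes={flushes} ingest_time={elapsed}")
```

Under these assumptions, the 1G and 2G IMM both accept the 10G in the same 10 units of time; the larger IMM only halves the number of flushes (10 instead of 20) by doubling the size of each one.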