What are the characteristics of the data you are writing?  Does each client
generate data that spreads across the cluster?

What version of Accumulo are you using?  1.5 has two write-ahead log (walog)
improvements that should help as a cluster grows: group commit, and writing
to the logs in parallel.  In 1.4, when a batch of data comes in from a
client, the walog is locked and then that data is written to the two logs
serially.

On Fri, Feb 24, 2012 at 2:35 PM, Aaron Cordova <aa...@cordovas.org> wrote:

> In my experience with Accumulo on EC2, I've seen about an 85% increase in
> aggregate write rate each time the size of the cluster is doubled. I've
> tried to capture that behavior in a model to help myself understand it.
>
> The model I came up with is the following:
>
> where
> w: aggregate write rate (writes per second)
> m: number of machines
> k: standalone single server performance (in my experience about 30k writes
> per second on average)
>
> the units of k and w are writes per second
>
> for those of you without the ability to see graphics in email, the model
> is:
>  w = m * pow(0.85, log(m, 2)) * k
>
> First of all, my algebra may be rusty, so it may be possible to simplify
> the model ... second, does the model make sense?
>
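
On the algebra question in the quoted message: since pow(a, log(m, 2)) equals
pow(m, log(a, 2)), the model can be written as a single power law,
w = k * pow(m, log(2 * 0.85, 2)), which is roughly k * pow(m, 0.77).  As
written, the formula multiplies the rate by 2 * 0.85 = 1.7 for each doubling
of m.  The Python sketch below only checks that the two forms agree; the
function names and the default k = 30000 (taken from the quoted "about 30k
writes per second") are illustrative assumptions, not measurements.

```python
import math


def aggregate_write_rate(m, k=30_000.0, scaling=0.85):
    """Model as quoted: w = m * pow(scaling, log(m, 2)) * k."""
    return m * math.pow(scaling, math.log2(m)) * k


def aggregate_write_rate_simplified(m, k=30_000.0, scaling=0.85):
    """Same model as a power law: w = k * pow(m, log(2 * scaling, 2))."""
    return k * m ** math.log2(2 * scaling)


if __name__ == "__main__":
    # Print both forms side by side for a few cluster sizes.
    for m in (1, 2, 4, 8, 16):
        print(m, aggregate_write_rate(m), aggregate_write_rate_simplified(m))
```

The exponent log(1.7, 2) is less than 1, so the modeled rate grows
sublinearly with the number of machines.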
