FYI, for Linux with SSDs, changing the I/O scheduler from the default cfq to deadline or noop can reportedly make as much as a 500x improvement in write throughput. I haven't tried this myself. The Elasticsearch hardware guide describes it:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
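If anyone wants to try it, the scheduler is exposed through sysfs, so checking and switching it is a one-liner (the device name sda below is an assumption; substitute your own device, and note the change does not survive a reboot):

    # The scheduler in brackets is the active one, e.g. "noop deadline [cfq]"
    cat /sys/block/sda/queue/scheduler

    # Switch to deadline; takes effect immediately
    echo deadline | sudo tee /sys/block/sda/queue/scheduler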
On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
criccom...@linkedin.com.invalid> wrote:

> Hey Roger,
>
> We did some benchmarking, and discovered very similar performance to
> what you've described. We saw ~40k writes/sec and ~20k reads/sec,
> per-container, on a Virident SSD. This was without any changelog. Are
> you using a changelog on the store?
>
> When we attached a changelog to the store, the writes dropped
> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
> that the container was spending >99% of its time in
> KafkaSystemProducer.send().
>
> We're currently doing two things:
>
> 1. Working with our performance team to understand and tune RocksDB
>    properly.
> 2. Upgrading the Kafka producer to use the new Java-based API.
>    (SAMZA-227)
>
> For (1), it seems like we should be able to get much higher throughput
> from RocksDB. Anecdotally, we've heard that RocksDB requires many
> threads in order to max out an SSD, and since Samza is single-threaded,
> we could just be hitting a RocksDB bottleneck. We won't know until we
> dig into the problem (which we started investigating last week). The
> current plan is to start by benchmarking RocksDB JNI outside of Samza
> and see what we can get. From there, we'll know our "speed of light"
> and can try to get Samza as close as possible to it. If RocksDB JNI
> can't be made to go "fast", then we'll have to understand why.
>
> (2) should help with the changelog issue. I believe the slowness with
> the changelog comes from the changelog using a sync producer to send to
> Kafka, which blocks when a batch is flushed. In the new API, the
> concept of a "sync" producer is removed. All writes are handled on an
> async writer thread (though we can still guarantee writes are safely
> written before checkpointing, which is what we need).
>
> In short, I agree, it seems slow. We see this behavior, too. We're
> digging into it.
>
> Cheers,
> Chris
>
> On 1/17/15 12:58 PM, "Roger Hoover" <roger.hoo...@gmail.com> wrote:
>
> >Michael,
> >
> >Thanks for the response. I used VisualVM and YourKit and saw that the
> >CPU is not being used (0.1%). I took a few thread dumps and saw the
> >main thread blocked on the flush() method inside the KV store.
> >
> >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <elementat...@gmail.com>
> >wrote:
> >
> >> Is your process at 100% CPU? I suspect you're spending most of your
> >> time in JSON deserialization, but profile it and check.
> >>
> >> Michael
> >>
> >> On Friday, January 16, 2015, Roger Hoover <roger.hoo...@gmail.com>
> >> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
> >> > JSON) from a bootstrap topic. The topic has 4 partitions, and I'm
> >> > running the job using the ProcessJobFactory, so all four tasks are
> >> > in one container.
> >> >
> >> > Using RocksDB, it takes 19 minutes to load all the data, which
> >> > works out to 35k records/sec, or 5MB/s based on input size. I ran
> >> > iostat during this time and saw that the disk write throughput was
> >> > 14MB/s.
> >> >
> >> > I didn't tweak any of the storage settings.
> >> >
> >> > A few questions:
> >> > 1) Does this seem low? I'm running on a MacBook Pro with an SSD.
> >> > 2) Do you have any recommendations for improving the load speed?
> >> >
> >> > Thanks,
> >> >
> >> > Roger
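For anyone wanting to reproduce the standalone RocksDB JNI "speed of light" test Chris mentions, a minimal single-threaded write benchmark using the rocksdbjni API could look something like the sketch below. This is only an illustration of the idea, not the harness LinkedIn ran; the record count, the 100-byte value size, and the /tmp/rocksdb-bench path are all made-up parameters.

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksDbWriteBench {
      public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        // Default options only; tuning write buffers etc. is the next step.
        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, "/tmp/rocksdb-bench");

        int numRecords = 1_000_000;   // arbitrary workload size
        byte[] value = new byte[100]; // arbitrary 100-byte payload

        long start = System.nanoTime();
        for (int i = 0; i < numRecords; i++) {
          byte[] key = String.format("key-%09d", i).getBytes();
          db.put(key, value);
        }
        double secs = (System.nanoTime() - start) / 1e9;

        System.out.printf("%d writes in %.1fs = %.0f writes/sec%n",
            numRecords, secs, numRecords / secs);
        db.close();
      }
    }

A single-threaded loop like this roughly mirrors what one Samza task can push; running the same loop from several threads against the one db handle (RocksDB supports concurrent writers) would be a way to test the many-threads-to-saturate-an-SSD theory.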