Hey Roger,

To add to Jay's comment, if you don't care about getting updates after the
initial bootstrap, you can configure a store with a changelog pointed to
your bootstrap topic. This will cause the SamzaContainer to bootstrap
using the optimized code that Jay described. Just make sure you don't
write to the store (since it would put the mutation back into your
bootstrap stream). This configuration won't allow new updates to come into
the store until the job is restarted. If you use the 'bootstrap stream'
concept, then you continue getting updates after the initial bootstrap.
The 'bootstrap' stream also allows you to have arbitrary logic, which
might be useful for your job--not sure.
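Roughly, the two approaches look something like this in config (store and
topic names here are just placeholders, and you'd swap in whatever serdes
your job actually registers -- treat this as an untested sketch):

  # Option 1: a store whose changelog *is* the bootstrap topic. You get the
  # optimized restore path, but no new updates until the job restarts, and
  # you must never write to the store yourself.
  # (assumes 'string' and 'json' serdes are registered under
  # serializers.registry.*)
  stores.my-table.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
  stores.my-table.key.serde=string
  stores.my-table.msg.serde=json
  stores.my-table.changelog=kafka.my-bootstrap-topic

  # Option 2: consume the topic as a bootstrap stream. Samza catches it up
  # before processing other inputs, you keep receiving updates afterwards,
  # and you populate the store yourself in process().
  task.inputs=kafka.my-bootstrap-topic,kafka.my-other-input
  systems.kafka.streams.my-bootstrap-topic.samza.bootstrap=true
  systems.kafka.streams.my-bootstrap-topic.samza.reset.offset=true
  systems.kafka.streams.my-bootstrap-topic.samza.offset.default=oldest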

Cheers,
Chris

On 1/20/15 4:30 PM, "Jay Kreps" <[email protected]> wrote:

>It's also worth noting that restoring from a changelog *should* be much
>faster than restoring from upstream. The restore case is optimized and
>batches the updates and skips serialization, both of which help a ton
>with performance.
>
>-Jay
>
>On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman <[email protected]>
>wrote:
>
>> I remember running both RocksDB and LevelDB, and RocksDB was definitely
>> better (in that one test case, it was ~40K vs ~30K random writes/sec) -
>> but I haven't done any exhaustive comparison.
>>
>> Btw, I see that you're using 4 partitions? Any reason you're not using
>> something like >= 128 partitions and running with more containers?
>>
>> On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover <[email protected]>
>> wrote:
>>
>> > Thanks, Chris.
>> >
>> > I am not using a changelog for the store because the bootstrap stream
>> > is a master copy of the data and the job can recover from there.  No
>> > need to write out another copy.  Is this the way you typically do it
>> > for stream/table joins?
>> >
>> > Great to know that you're looking into the performance issues.  I love
>> > the idea of local state for isolation and predictable throughput, but
>> > the current write throughput puts hard limits on the amount of local
>> > state that a container can have without really long
>> > initialization/recovery times.
>> >
>> > In my tests, LevelDB has about the same performance.  Have you noticed
>> > that as well?
>> >
>> > Cheers,
>> >
>> > Roger
>> >
>> > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <
>> > [email protected]> wrote:
>> >
>> > > Hey Roger,
>> > >
>> > > We did some benchmarking, and discovered very similar performance to
>> > > what you've described. We saw ~40k writes/sec and ~20k reads/sec per
>> > > container, on a Virident SSD. This was without any changelog. Are you
>> > > using a changelog on the store?
>> > >
>> > > When we attached a changelog to the store, the writes dropped
>> > > significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
>> > > that the container was spending > 99% of its time in
>> > > KafkaSystemProducer.send().
>> > >
>> > > We're currently doing two things:
>> > >
>> > > 1. Working with our performance team to understand and tune RocksDB
>> > > properly.
>> > > 2. Upgrading the Kafka producer to use the new Java-based API
>> > >    (SAMZA-227).
>> > >
>> > > For (1), it seems like we should be able to get a lot higher
>> > > throughput from RocksDB. Anecdotally, we've heard that RocksDB
>> > > requires many threads in order to max out an SSD, and since Samza is
>> > > single-threaded, we could just be hitting a RocksDB bottleneck. We
>> > > won't know until we dig into the problem (which we started
>> > > investigating last week). The current plan is to start by
>> > > benchmarking RocksDB JNI outside of Samza, and see what we can get.
>> > > From there, we'll know our "speed of light", and can try to get Samza
>> > > as close as possible to it. If RocksDB JNI can't be made to go
>> > > "fast", then we'll have to understand why.
>> > >
>> > > (2) should help with the changelog issue. I believe the slowness
>> > > with the changelog is caused by the changelog using a sync producer
>> > > to send to Kafka, which blocks when a batch is flushed. In the new
>> > > API, the concept of a "sync" producer is removed. All writes are
>> > > handled on an async writer thread (though we can still guarantee
>> > > writes are safely written before checkpointing, which is what we
>> > > need).
>> > >
>> > > In short, I agree, it seems slow. We see this behavior, too. We're
>> > > digging into it.
>> > >
>> > > Cheers,
>> > > Chris
>> > >
>> > > On 1/17/15 12:58 PM, "Roger Hoover" <[email protected]> wrote:
>> > >
>> > > >Michael,
>> > > >
>> > > >Thanks for the response.  I used VisualVM and YourKit and see the
>> > > >CPU is not being used (0.1%).  I took a few thread dumps and see the
>> > > >main thread blocked on the flush() method inside the KV store.
>> > > >
>> > > >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <[email protected]>
>> > > >wrote:
>> > > >
>> > > >> Is your process at 100% CPU? I suspect you're spending most of
>> > > >> your time in JSON deserialization, but profile it and check.
>> > > >>
>> > > >> Michael
>> > > >>
>> > > >> On Friday, January 16, 2015, Roger Hoover <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >> > Hi guys,
>> > > >> >
>> > > >> > I'm testing a job that needs to load 40M records (6GB in Kafka
>> > > >> > as JSON) from a bootstrap topic.  The topic has 4 partitions and
>> > > >> > I'm running the job using the ProcessJobFactory so all four
>> > > >> > tasks are in one container.
>> > > >> >
>> > > >> > Using RocksDB, it's taking 19 minutes to load all the data,
>> > > >> > which amounts to 35k records/sec or 5MB/s based on input size.
>> > > >> > I ran iostat during this time and see the disk write throughput
>> > > >> > is 14MB/s.
>> > > >> >
>> > > >> > I didn't tweak any of the storage settings.
>> > > >> >
>> > > >> > A few questions:
>> > > >> > 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
>> > > >> > 2) Do you have any recommendations for improving the load speed?
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > Roger
>> > > >> >
>> > > >>
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks and regards
>>
>> Chinmay Soman
>>
