To add to what Jonathan and Jack have said...

To get high levels of performance with the python driver, you should:

   - prepare your statements once (recent drivers default to token-aware
   routing and will apply it correctly when the statement is prepared).
   - execute asynchronously, with up to ~150 futures in flight (though my
   [old] benchmarks showed smaller numbers worked fine); see the sketch
   after this list.
   - use multi-processing (performance leveled off in my [old] benchmark
   when each process consumed ~50% of a CPU).
   - watch for network bottlenecks.
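
Putting the first two bullets together, here is a minimal sketch of the
prepare-once + bounded-async-futures pattern. The contact point, keyspace,
table, column names, and sample rows are placeholders, not anything from
this thread:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])              # placeholder contact point
    session = cluster.connect('my_keyspace')      # placeholder keyspace

    # Prepare once, outside the write loop; the driver can then route each
    # bound statement to a replica (token awareness).
    insert = session.prepare(
        "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")

    rows = [(1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c')]  # stand-in for your data

    # Keep a bounded window of in-flight futures instead of writing
    # synchronously one statement at a time.
    window = 100
    futures = []
    for pk, ck, value in rows:
        futures.append(session.execute_async(insert, (pk, ck, value)))
        if len(futures) >= window:
            for f in futures:
                f.result()                        # raises if a write failed
            futures = []
    for f in futures:
        f.result()

The window size is the ~150-futures knob mentioned above; smaller values
worked fine in my [old] benchmarks.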

ml

On Thu, Dec 31, 2015 at 12:30 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Make sure the driver is configured for token aware routing, otherwise the
> coordinator node may have to redirect your write, adding a network hop.
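
For reference, a minimal sketch of that configuration with the python driver
(the contact point and keyspace name are placeholders):

    from cassandra.cluster import Cluster
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    cluster = Cluster(
        ['127.0.0.1'],                            # placeholder contact point
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()))
    session = cluster.connect('my_keyspace')      # placeholder keyspace

Recent drivers default to a token-aware policy, so this mostly amounts to
being explicit about it; as noted above, the routing is only applied when
the statement is prepared (the driver then knows the routing key).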
>
> To be absolutely clear, Cassandra uses the distributed, parallel model for
> Big Data - lots of multi-threaded clients with lots of nodes. Clusters with
> fewer than six or eight nodes, driven by a single, single-threaded client,
> are not a representative usage of Cassandra. Replication is presumed as
> well. Anything less than RF=3 is simply not a representative or recommended
> usage of Cassandra. Similarly, writes at less than QUORUM are neither
> representative nor recommended.
>
> CL=ONE has to update the memtable as well, not just the commit log.
> Flushing to sstables occurs once the memtables reach some threshold
> size. See:
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
>
> -- Jack Krupansky
>
> On Thu, Dec 31, 2015 at 11:13 AM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> The limitation is on the driver side. Try looking at
>> execute_concurrent_with_args in the cassandra.concurrent module to get
>> parallel writes with prepared statements.
>>
>> https://datastax.github.io/python-driver/api/cassandra/concurrent.html
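
For what it's worth, a rough sketch of that approach (keyspace, table, and
column names are placeholders):

    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(['127.0.0.1'])              # placeholder contact point
    session = cluster.connect('my_keyspace')      # placeholder keyspace

    insert = session.prepare(
        "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")

    # One tuple of bind values per row; the helper manages the window of
    # in-flight requests for you.
    params = [(1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c')]
    results = execute_concurrent_with_args(
        session, insert, params, concurrency=100)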
>>
>> On Wed, Dec 30, 2015 at 11:34 PM Alexandre Beaulne <
>> alexandre.beau...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> First and foremost, thanks to everyone involved in making C* available
>>> to the world; it is a great technology to have access to.
>>>
>>> I'm experimenting with C* for one of our projects and I cannot reproduce
>>> the write speeds C* is lauded for. I would appreciate some guidance as to
>>> what I'm doing wrong.
>>>
>>> *Setup*: I have a single, single-threaded python client (using
>>> DataStax's python driver), writing (no reads) to a C* cluster. All C*
>>> nodes are launched by running the official Docker container. There's a
>>> single keyspace with a replication factor of 1, and the client is set to
>>> consistency level LOCAL_ONE. In that keyspace there is a single table
>>> with ~40 columns of mixed types. Two columns form the partition key and
>>> two more are clustering columns. The partition key is close to uniformly
>>> distributed in the dataset. The writer runs in a tight loop, building
>>> CQL 3 insert statements one by one and executing them against the C*
>>> cluster.
>>>
>>> *Specs*: Cassandra v3.0.1, python-driver v3.0.0, host is CentOS 7 with
>>> 40 cores @ 3 GHz and 66 GB of RAM.
>>>
>>> In the course of my experimentation I came up with 7 scenarios trying to
>>> isolate the performance bottleneck:
>>>
>>> *Scenario 1*: the writer simply builds the insert statement strings
>>> without doing anything with them.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.00 - [95] 0.01 -
>>> [99] 0.01 [100] 0.05
>>>
>>> *Scenario 2*: the writer opens a TCP socket and sends the insert
>>> statement string to a simple reader running on the same host. The reader
>>> then appends that insert statement string to a file on disk, mimicking a
>>> commit log of some sort.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.02 -
>>> [99] 0.03 [100] 63.33
>>>
>>> *Scenario 3*: identical to scenario 2, but the reader is run inside a
>>> Docker container, to measure whether there is any overhead from running
>>> in the container.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.01 -
>>> [99] 0.01 [100] 4.45
>>>
>>> *Scenario 4*: the writer asynchronously executes the insert statements
>>> against a single-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.07 - [95] 0.15 -
>>> [99] 0.56 [100] 534.09
>>>
>>> *Scenario 5*: the writer synchronously executes the insert statements
>>> against a single-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 1.40 - [95] 1.46 -
>>> [99] 1.54 [100] 41.75
>>>
>>> *Scenario 6*: the writer asynchronously executes the insert statements
>>> against a four-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.09 - [95] 0.14 -
>>> [99] 0.16 [100] 838.83
>>>
>>> *Scenario 7*: the writer synchronously executes the insert statements
>>> against a four-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 1.73 - [95] 1.89 -
>>> [99] 2.15 [100] 50.94
>>>
>>> Looking at scenarios 3 & 5, a synchronous write to C* is about 150x
>>> slower than appending to a flat file. Now I understand a write to a DB
>>> is more involved than appending to a file, but I'm surprised by the
>>> magnitude of the difference. I thought all C* did for writes at
>>> consistency level ONE was to append the write to its commit log and
>>> return, then distribute the write across the cluster in an eventually
>>> consistent manner. More than 1 ms per write is less than 1,000 writes
>>> per second, far from big data velocity.
>>>
>>> What am I doing wrong? Are writes supposed to be batched before being
>>> inserted? Instead of appending rows to the table, would it be more
>>> efficient to append columns to the rows? Why are writes so slow?
>>>
>>> Thanks for your time,
>>> Alex
>>>
>>
>
