[
https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150253#comment-15150253
]
Stefania commented on CASSANDRA-11053:
--------------------------------------
Here are the latest results:
||MODULE CYTHONIZED||PREPARED STATEMENTS||NUM. WORKER PROCESSES||CHUNK SIZE||AVERAGE ROWS / SEC||TOTAL TIME||
|DRIVER|YES|7|5,000|97,146|3' 31"|
|DRIVER|YES|8|5,000|103,037|3' 19"|
|DRIVER|YES|9|5,000|104,070|3' 17"|
|DRIVER|YES|10|5,000|*104,498*|3' 16"|
|DRIVER COPYUTIL|YES|7|5,000|89,123|3' 48"|
|DRIVER COPYUTIL|YES|8|5,000|107,897|3' 10"|
|DRIVER COPYUTIL|YES|9|5,000|*109,871*|3' 7"|
|DRIVER COPYUTIL|YES|10|5,000|109,616|3' 8"|
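As a rough cross-check of the table, multiplying each average rate by the total time should give about the same row count for every run. A minimal Python sketch, with the figures copied from the table above (the helper name {{total_rows}} is only illustrative):

```python
def total_rows(rows_per_sec, minutes, seconds):
    """Approximate rows imported: average rate times elapsed seconds."""
    return rows_per_sec * (minutes * 60 + seconds)

# (cythonized modules, worker processes, average rows/sec, minutes, seconds)
runs = [
    ("DRIVER", 7, 97146, 3, 31),
    ("DRIVER", 8, 103037, 3, 19),
    ("DRIVER", 9, 104070, 3, 17),
    ("DRIVER", 10, 104498, 3, 16),
    ("DRIVER COPYUTIL", 7, 89123, 3, 48),
    ("DRIVER COPYUTIL", 8, 107897, 3, 10),
    ("DRIVER COPYUTIL", 9, 109871, 3, 7),
    ("DRIVER COPYUTIL", 10, 109616, 3, 8),
]

for module, workers, rate, m, s in runs:
    millions = total_rows(rate, m, s) / 1e6
    print(f"{module:16} {workers:2d} workers: ~{millions:.1f}M rows")
```

Every run comes out at roughly 20.3 to 20.6 million rows, in line with the 20M-record dataset in the issue description, so the rates and times are consistent with each other.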
In addition to using separate pipes as mentioned above, I've found one more
optimization and I've calibrated how much data the parent process sends to the
worker processes. Two default parameters have changed: the max ingest rate is
now 150k and the report frequency has changed from 4 times per second to 2.
I've run cqlsh with {{SCHED_BATCH}} CPU scheduling
({{schedtool -B -e ./bin/cqlsh}}), which helps a little (maybe 2-3k
rows/second), and I've changed the clock source from {{xen}} to {{tsc}} (I'm
unsure whether this helps, but it doesn't hurt).
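For anyone who wants to reproduce these settings without schedtool, here is a hedged Python sketch (Linux only). It is not part of the patch: the function names are made up for illustration, and the sysfs directory is the usual Linux location for clock source settings, assumed to be present on the test machine.

```python
import os

# Usual Linux sysfs location for clock source settings (assumed present).
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"

def read_clocksource(name="current_clocksource", base=CLOCKSOURCE_DIR):
    """Return the stripped content of a clocksource sysfs file, or None if unreadable."""
    try:
        with open(os.path.join(base, name)) as f:
            return f.read().strip()
    except OSError:
        return None

def run_with_sched_batch(argv):
    """Switch this process to SCHED_BATCH, then replace it with the given command.

    Roughly what `schedtool -B -e <command>` does; does not return on success.
    """
    os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    os.execvp(argv[0], argv)

# Example usage (commented out because execvp replaces the current process):
# print(read_clocksource(), "/", read_clocksource("available_clocksource"))
# run_with_sched_batch(["./bin/cqlsh"])
```

Changing the clock source itself means writing the new name to {{current_clocksource}} as root, which the sketch deliberately leaves out.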
I would like to repeat the tests on an AWS instance with twice the number of
cores, to see how much we can scale. I've already verified that if we halve the
number of cores (by fixing the affinity to only 4 cores) then the throughput
also halves. I'm thinking of testing on C4.4xlarge. So far I've used R3.2xlarge
but we don't need all that memory and so I would like to try a C4 instance
instead.
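The reduced-core check can be approximated from Python by restricting the CPU affinity before starting the import, since child processes inherit the mask on Linux. This is only a sketch under that assumption, not necessarily how the affinity was fixed for the test above, and {{pin_to_first_cores}} is a hypothetical helper:

```python
import os

def pin_to_first_cores(n):
    """Restrict this process to the first n of its currently allowed cores.

    Worker processes started afterwards inherit the mask. Linux only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:n])  # fewer than n if the machine has fewer cores
    os.sched_setaffinity(0, chosen)
    return chosen

# Example usage: limit to 4 cores, then launch the COPY FROM run from this process.
# pin_to_first_cores(4)
```

Comparing the rows per second with and without the restriction then shows how close to linear the scaling is.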
> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-11053
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
> Attachments: copy_from_large_benchmark.txt,
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt,
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed
> two issues:
> * The progress report is incorrect: it is very slow until almost the end of
> the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with
> a smaller cluster locally (approx 35,000 rows per second). As a comparison,
> cassandra-stress manages 50,000 rows per second under the same set-up,
> which makes it roughly 1.5 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.