[ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201120#comment-15201120 ]
Stefania commented on CASSANDRA-11053:
--------------------------------------
[~aholmber], the patch is ready. Are you still available to review it?
||2.1||2.2||2.2 win||3.0||3.5||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11053-2.1]|[patch|https://github.com/stef1927/cassandra/commits/11053-2.2]| |[patch|https://github.com/stef1927/cassandra/commits/11053-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11053-3.5]|[patch|https://github.com/stef1927/cassandra/commits/11053]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-dtest/]|[win dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-windows-dtest_win32/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-3.5-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-dtest/]|
The 2.1 patch merges cleanly up to 3.5; merging into trunk produces a simple
conflict.
The issue reported above by [~jjordan] occurred because the machine has only
one core: on a single-core machine, a typo caused the number of worker
processes to be zero. This was easy to fix. However, I then added a bulk copy
test that simulates a single-core machine (see
[this pull request|https://github.com/riptano/cassandra-dtest/pull/869]), and
it highlighted a more serious deadlock in COPY TO. To fix this, I had to
introduce a new thread in the COPY TO worker processes.
Incidentally, because of this bug, the performance measurements reported above
were taken with one worker process fewer than indicated.
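The comment does not show the fixed code. As a minimal sketch of the kind of guard involved, assuming the worker count is derived from the core count with one core left for the parent process (an assumption, not something stated above), clamping the result to at least 1 keeps a single-core machine from ending up with zero workers. `get_num_processes` and its `cap`/`num_cores` parameters are hypothetical names used only for illustration.

```python
import multiprocessing
from typing import Optional


def get_num_processes(cap: int, num_cores: Optional[int] = None) -> int:
    """Hypothetical helper: choose how many COPY worker processes to start.

    Assumes one core is reserved for the parent process; without the
    max(1, ...) clamp, a single-core machine would get zero workers.
    """
    if num_cores is None:
        num_cores = multiprocessing.cpu_count()
    # Leave one core for the parent, never exceed the cap, never go below 1.
    return max(1, min(cap, num_cores - 1))


if __name__ == "__main__":
    for cores in (1, 2, 8):
        print(cores, "core(s) ->", get_num_processes(cap=16, num_cores=cores))
```

With this guard a one-core machine gets 1 worker instead of 0, while an eight-core machine still gets 7.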
> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-11053
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: Stefania
> Assignee: Stefania
> Labels: doc-impacting
> Fix For: 2.1.14, 2.2.6, 3.0.5, 3.5
>
> Attachments: copy_from_large_benchmark.txt,
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt,
> worker_profiles.txt, worker_profiles_2.txt
>
>
> h5. Description
> Running COPY FROM on a large dataset (20 GB divided into 20M records) revealed
> two issues:
> * The progress report is incorrect: it advances very slowly until almost the
> end of the test, at which point it catches up extremely quickly.
> * The throughput (approx. 35,000 rows per second) is similar to that of
> smaller tests run locally against a smaller cluster. By comparison,
> cassandra-stress manages 50,000 rows per second under the same set-up, making
> it about 1.4 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.
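The ratio of the two throughput figures, and what they imply for the 20M-record load, can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes both rates are sustained for the whole run, which the description does not state.

```python
# Figures taken from the description above.
ROWS = 20_000_000         # 20M records
COPY_FROM_RATE = 35_000   # rows per second, COPY FROM
STRESS_RATE = 50_000      # rows per second, cassandra-stress

# How much faster cassandra-stress is than COPY FROM.
speedup = STRESS_RATE / COPY_FROM_RATE

# Estimated load time at each rate, assuming a constant rate throughout.
copy_from_secs = ROWS / COPY_FROM_RATE
stress_secs = ROWS / STRESS_RATE

print(f"speedup: {speedup:.2f}x")
print(f"COPY FROM: {copy_from_secs:.0f} s, cassandra-stress: {stress_secs:.0f} s")
```

This gives a ratio of about 1.43 and load times of roughly 571 s (about 9.5 minutes) for COPY FROM versus 400 s (under 7 minutes) for cassandra-stress.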
> h5. Doc-impacting changes to COPY FROM options
> * A new option was added: PREPAREDSTATEMENTS. It indicates whether prepared
> statements should be used and defaults to true.
> * The default value of CHUNKSIZE changed from 1000 to 5000.
> * The default value of MINBATCHSIZE changed from 2 to 10.
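For reference, the new defaults can be spelled out explicitly in a cqlsh COPY FROM statement. The Python helper below is a hypothetical illustration that only assembles the statement text; `copy_from_statement`, the table name `ks.tbl` and the file `data.csv` are placeholders, and lowercase `true`/`false` is assumed to be accepted for boolean options.

```python
from typing import Dict, Union

OptionValue = Union[int, bool]

# COPY FROM defaults after this change, as listed above.
COPY_FROM_DEFAULTS: Dict[str, OptionValue] = {
    "CHUNKSIZE": 5000,           # previously 1000
    "MINBATCHSIZE": 10,          # previously 2
    "PREPAREDSTATEMENTS": True,  # new option
}


def _format_value(value: OptionValue) -> str:
    # Render booleans as lowercase true/false (assumed accepted by cqlsh).
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


def copy_from_statement(table: str, path: str, **overrides: OptionValue) -> str:
    """Hypothetical helper: build a COPY FROM statement with explicit options.

    Keyword overrides are upper-cased to match the cqlsh option names.
    """
    options = dict(COPY_FROM_DEFAULTS)
    for name, value in overrides.items():
        options[name.upper()] = value
    clause = " AND ".join(f"{name}={_format_value(v)}" for name, v in options.items())
    return f"COPY {table} FROM '{path}' WITH {clause};"


if __name__ == "__main__":
    # Restore the old chunk size and disable prepared statements.
    print(copy_from_statement("ks.tbl", "data.csv", chunksize=1000, preparedstatements=False))
```

Because an override replaces an existing entry in place, the option order stays fixed, producing `WITH CHUNKSIZE=1000 AND MINBATCHSIZE=10 AND PREPAREDSTATEMENTS=false` for the call above.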
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)