[
https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165662#comment-15165662
]
Adam Holmberg commented on CASSANDRA-11053:
-------------------------------------------
Just starting on this. There are a lot of formatting changes
([96fb58|https://github.com/apache/cassandra/commit/96fb585574199b84646c1d97cc8dd689f32d4687],
[ce1504b|https://github.com/apache/cassandra/commit/ce1504b78e64597a1a2251ef3ecedf0452609694],
[8553e1|https://github.com/apache/cassandra/commit/8553e1fee4069cec6c65090e9774b68ef0de2e6c])
that refer to the Cythonized driver. Can you help me understand what the issue
was? The Cythonized driver should not change return values. I think I'm
missing something.
> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-11053
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
> Attachments: copy_from_large_benchmark.txt,
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt,
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed
> two issues:
> * The progress report is incorrect: it advances very slowly until almost the
> end of the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with
> a smaller cluster locally (approx 35,000 rows per second). As a comparison,
> cassandra-stress manages 50,000 rows per second under the same set-up,
> which makes it roughly 1.5 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.