[jira] [Comment Edited] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance

Adam Holmberg (JIRA) Mon, 29 Feb 2016 11:18:07 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172187#comment-15172187 ]


Adam Holmberg edited comment on CASSANDRA-11053 at 2/29/16 7:17 PM:
--------------------------------------------------------------------

Just a couple other minor comments:

cqlshlib.copyutil.ExportSession.\_\_init\_\_
+
cqlshlib.copyutil.ImportProcess.session:
{code}
    if LibevConnection:
        cluster.connection_class = LibevConnection
{code}
Did you find that the connection class was not defaulting properly when the 
extensions were built? It should take this value automatically if the extension 
is built.
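
For context, an automatic default like this is usually the result of an import-time fallback: try the extension-backed class first, and use a pure-Python one if the import fails. Below is a minimal sketch of that pattern, not the driver's actual code. The helper name pick_connection_class is hypothetical, and the module paths in the comment are what I believe the Python driver uses, so treat them as assumptions.

```python
import importlib


def pick_connection_class(candidates):
    """Return the first class that can be imported from `candidates`.

    `candidates` is an ordered list of (module_name, attribute_name) pairs,
    most preferred first. Returns None if nothing can be imported.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Extension not built or module missing: try the next candidate.
            continue
        cls = getattr(module, attr_name, None)
        if cls is not None:
            return cls
    return None


# Assumed driver layout (not verified here):
#   pick_connection_class([
#       ("cassandra.io.libevreactor", "LibevConnection"),
#       ("cassandra.io.asyncorereactor", "AsyncoreConnection"),
#   ])
```

If the driver already resolves its default this way, the explicit assignment in copyutil would only matter when that default resolution is not picking up the extension.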

cqlshlib.copyutil.ExportTaskError:
{quote}
An object send from child processes
{quote}
small typo (send-->sent)
--
+1 regardless of these


was (Author: aholmber):
Just a couple other minor comments:

cqlshlib.copyutil.ExportSession.\_\_init\_\_
+
cqlshlib.copyutil.ImportProcess.session:
{code}
    if LibevConnection:
        cluster.connection_class = LibevConnection
{code}
Did you find that the connection class was not defaulting properly when the 
extensions were built? It should take this value automatically if the extension 
is built.

cqlshlib.copyutil.ExportTaskError:
{quote}
An object send from child processes
{quote}
small typo (send-->sent)

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, 
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, 
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed 
> two issues:
> * The progress report is incorrect: it advances very slowly until almost the 
> end of the test, at which point it catches up extremely quickly.
> * The throughput in rows per second is similar to that of smaller tests run 
> against a smaller local cluster (approx. 35,000 rows per second). By 
> comparison, cassandra-stress manages 50,000 rows per second under the same 
> set-up, making it about 1.5 times faster. 
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
