[jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3

Stefania (JIRA) Tue, 15 Dec 2015 08:47:27 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058299#comment-15058299 ]


Stefania commented on CASSANDRA-9302:
-------------------------------------

Thanks Adam. As discussed, here are two possible follow-ups:

* The ingest rate is only enforced accurately when the chunk size is much 
smaller than the ingest rate (chunk size << ingest rate), since we still send 
at least one chunk at a time.

* The 6-second improvement noted when I reverted to batching by primary key 
rather than by replica is caused by a slow lookup in the token map (bisect 
right). The driver TAR performs only one lookup per batch, whereas batching by 
replica requires one lookup per record. To make batching by replica viable, 
which should be faster in theory, we must optimize the token map (TM) lookup, 
but this is not easy to do. Provided we have at least one local replica, this 
should not be worth it, but we may want to revisit it for non-local clusters 
if the need arises.
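
To illustrate the lookup cost in the second point, here is a minimal sketch 
rather than the actual cqlsh or driver code. The ring contents, node names and 
function names are all hypothetical, and the ring is simplified to one replica 
per token. It only counts how many bisect-right lookups each batching strategy 
performs.

```python
from bisect import bisect_right

# Hypothetical ring: sorted tokens and the single replica assumed to own each.
RING_TOKENS = [-50, 0, 50, 100]
RING_REPLICAS = ["node-a", "node-b", "node-c", "node-d"]


def lookup_replica(token, counter):
    """Return the replica for `token`, counting each token map lookup."""
    counter["lookups"] += 1
    idx = bisect_right(RING_TOKENS, token)
    # Tokens past the last ring entry wrap around to the first one.
    return RING_REPLICAS[idx % len(RING_REPLICAS)]


def batch_by_replica(record_tokens):
    """Group records by replica: one lookup per record."""
    counter = {"lookups": 0}
    batches = {}
    for token in record_tokens:
        batches.setdefault(lookup_replica(token, counter), []).append(token)
    return batches, counter["lookups"]


def batch_by_key(record_tokens):
    """Group records by key first, then route: one lookup per batch."""
    counter = {"lookups": 0}
    groups = {}
    for token in record_tokens:
        # The token stands in for the partition key in this sketch.
        groups.setdefault(token, []).append(token)
    routed = [(lookup_replica(token, counter), rows)
              for token, rows in groups.items()]
    return routed, counter["lookups"]
```

With six records spread over three keys, batch_by_replica performs six lookups 
and batch_by_key performs three, so the gap grows with the number of records 
per key.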

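The ingest rate point can be sketched the same way. This is an assumed model, 
not the cqlsh implementation: it supposes rows are sent in whole chunks within 
a one-second window, rounding down but never below one chunk. The function 
name is hypothetical.

```python
def rows_sent_per_second(ingest_rate, chunk_size):
    """Rows sent in one second when only whole chunks can be sent.

    Assumed model: round the requested rate down to whole chunks,
    but always send at least one chunk.
    """
    whole_chunks = max(1, ingest_rate // chunk_size)
    return whole_chunks * chunk_size
```

Under this model a requested rate of 100000 rows/s with 1000-row chunks is met 
exactly, while a requested rate of 1000 rows/s with 5000-row chunks still 
sends 5000 rows, which is why the chunk size needs to be much smaller than the 
ingest rate.
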
> Optimize cqlsh COPY FROM, part 3
> --------------------------------
>
>                 Key: CASSANDRA-9302
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9302
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
>
>
> We've had some discussion moving to Spark CSV import for bulk load in 3.x, 
> but people need a good bulk load tool now.  One option is to add a separate 
> Java bulk load tool (CASSANDRA-9048), but if we can match that performance 
> from cqlsh I would prefer to leave COPY FROM as the preferred option to which 
> we point people, rather than adding more tools that need to be supported 
> indefinitely.
> Previous work on COPY FROM optimization was done in CASSANDRA-7405 and 
> CASSANDRA-8225.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
