[ 
https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659067#comment-14659067
 ] 

Tyler Hobbs commented on CASSANDRA-9302:
----------------------------------------

bq. Has there been discussion anywhere about implementing a loader on the Java 
driver, now that it's bundled with the server?

Yes, there was quite a bit on CASSANDRA-8225.  To summarize, we're making cqlsh 
"good enough" for most cases, and planning on using Spark for everything else.

bq. I hope nobody is surprised that the Python implementation is much slower 
than the C implementation. [...] Hopefully we can amortize this with batching 
by partition and/or giving it more processes.

If wide partitions are used, I think this will be okay.  We could perhaps take 
a quick sample of the file to determine if that's the case, and if not, skip 
using token-aware routing (TAR) with murmur3.
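
A rough sketch of what such a sampling check might look like, reading "TAR" as token-aware routing. This is only an illustration, not the actual cqlsh code: the function names, the sample size, and the rows-per-partition threshold are all hypothetical, and it assumes rows for the same partition tend to appear near each other in the file (otherwise a sample of the first rows will understate partition width).

```python
import csv
import io
from collections import Counter
from itertools import islice


def sample_rows_per_partition(csv_file, key_columns, sample_size=1000):
    """Average number of sampled rows per distinct partition key.

    key_columns holds the CSV column indexes that make up the partition key.
    sample_size is an arbitrary illustrative default, not a tuned value.
    """
    reader = csv.reader(csv_file)
    counts = Counter(
        tuple(row[i] for i in key_columns)
        for row in islice(reader, sample_size)
        if row  # skip blank lines
    )
    total = sum(counts.values())
    return float(total) / len(counts) if counts else 0.0


def should_use_token_aware_routing(csv_file, key_columns,
                                   min_rows_per_partition=10.0):
    """Hypothetical heuristic: hash partition keys only when the sample is wide.

    With wide partitions, one murmur3 hash is amortized over many rows;
    with narrow ones, nearly every row costs a hash. The threshold is a guess.
    """
    average = sample_rows_per_partition(csv_file, key_columns)
    return average >= min_rows_per_partition


# Two partitions ("a" and "b") with three rows each.
sample = io.StringIO(u"a,1\na,2\na,3\nb,1\nb,2\nb,3\n")
print(sample_rows_per_partition(sample, key_columns=[0]))  # 3.0
```

The check only returns a decision; how the loader would then turn routing on or off is left out, since that depends on how the driver session is configured.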

> Optimize cqlsh COPY FROM, part 3
> --------------------------------
>
>                 Key: CASSANDRA-9302
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9302
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: David Kua
>             Fix For: 2.1.x
>
>
> We've had some discussion about moving to Spark CSV import for bulk load in 3.x, 
> but people need a good bulk load tool now.  One option is to add a separate 
> Java bulk load tool (CASSANDRA-9048), but if we can match that performance 
> from cqlsh I would prefer to leave COPY FROM as the preferred option to which 
> we point people, rather than adding more tools that need to be supported 
> indefinitely.
> Previous work on COPY FROM optimization was done in CASSANDRA-7405 and 
> CASSANDRA-8225.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
