[jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3

Adam Holmberg (JIRA) Mon, 14 Dec 2015 12:37:00 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056669#comment-15056669
 ]


Adam Holmberg commented on CASSANDRA-9302:
------------------------------------------

Not a huge concern, but I noticed that {{INGESTRATE}} doesn't seem to hold the 
actual rate anywhere near the requested value:
{code}
cassandra@cqlsh> COPY test.t from '2.1.csv' WITH HEADER = false AND REPORTFREQUENCY = '1' and CHUNKSIZE = '1' and INGESTRATE = '1';
...
Processed 7000 rows; Written: 4759.046292 rows/ss
7000 rows imported in 5.543 seconds.
cassandra@cqlsh> COPY test.t from '2.1.csv' WITH HEADER = false AND REPORTFREQUENCY = '1' and CHUNKSIZE = '1' and INGESTRATE = '100';
...
Processed 7000 rows; Written: 5297.179821 rows/ss
7000 rows imported in 3.972 seconds.
{code}

Am I misunderstanding its use?
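
For reference, a quick back-of-the-envelope check of the two runs above. This 
is only a sketch: it assumes {{INGESTRATE}} is meant as an upper bound in rows 
per second, and {{average_rate}} is a hypothetical helper, not part of cqlsh.

```python
def average_rate(rows, seconds):
    """Overall rows per second for a completed import."""
    return rows / float(seconds)


# (requested INGESTRATE, rows imported, elapsed seconds) from the output above
runs = [(1, 7000, 5.543), (100, 7000, 3.972)]

for requested, rows, seconds in runs:
    actual = average_rate(rows, seconds)
    # Assumption: INGESTRATE is a cap in rows/s, so actual should not exceed it.
    print("requested %d rows/s, observed about %.0f rows/s (%.0fx over)"
          % (requested, actual, actual / requested))
```

Under that assumption, both runs finish far faster than the requested rate 
would allow.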

Another minor observation: we sometimes get a double 's' in the rate message if 
the number of digits in the reported rate changes during the copy. We may want 
to use a fixed-width, fixed-precision format for that.
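
A minimal sketch of how the double 's' can arise and how a fixed format avoids 
it. It assumes the status line is redrawn in place with a carriage return, 
which I have not confirmed against the cqlsh code; {{render_overwrite}}, 
{{variable_status}} and {{fixed_status}} are hypothetical names used only for 
illustration.

```python
def render_overwrite(screen_line, new_text):
    """Simulate what a terminal shows when new_text is written over
    screen_line after a carriage return, without clearing the line."""
    # Characters past the end of new_text stay on screen untouched.
    return new_text + screen_line[len(new_text):]


def variable_status(rows, rate):
    # %f always prints six decimals, but the integer part varies in width,
    # so the message gets shorter when the rate drops a digit.
    return "Processed %d rows; Written: %f rows/s" % (rows, rate)


def fixed_status(rows, rate):
    # Fixed width and precision keep the message length constant
    # (for rates below 10**8 and row counts below 10**8).
    return "Processed %8d rows; Written: %12.3f rows/s" % (rows, rate)


# The rate drops from five integer digits to four between two reports.
before = variable_status(6000, 10234.5)
after = variable_status(7000, 9876.5)
print(render_overwrite(before, after))   # stale trailing 's' is left behind

before = fixed_status(6000, 10234.5)
after = fixed_status(7000, 9876.5)
print(render_overwrite(before, after))   # same length, nothing left over
```

Padding to a constant width is one option; clearing to the end of the line 
before each redraw would be another.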

> Optimize cqlsh COPY FROM, part 3
> --------------------------------
>
>                 Key: CASSANDRA-9302
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9302
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
>
>
> We've had some discussion moving to Spark CSV import for bulk load in 3.x, 
> but people need a good bulk load tool now.  One option is to add a separate 
> Java bulk load tool (CASSANDRA-9048), but if we can match that performance 
> from cqlsh I would prefer to leave COPY FROM as the preferred option to which 
> we point people, rather than adding more tools that need to be supported 
> indefinitely.
> Previous work on COPY FROM optimization was done in CASSANDRA-7405 and 
> CASSANDRA-8225.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
