[ https://issues.apache.org/jira/browse/CASSANDRA-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190945#comment-14190945 ]
Jonathan Ellis commented on CASSANDRA-8225:
-------------------------------------------
IMO we should
# Continue to use COPY FROM as the public face of this (no sense in having a
slow and a fast way to do the same thing) but
# Call out to a Java utility that writes sstables and loads them
# Ultimately, we really want to just write the converted data out to the
network directly; creating intermediate sstables is unnecessary. But this can
be a separate ticket.
> Production-capable COPY FROM
> ----------------------------
>
> Key: CASSANDRA-8225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8225
> Project: Cassandra
> Issue Type: New Feature
> Components: Tools
> Reporter: Jonathan Ellis
> Fix For: 2.1.2
>
>
> Via [~schumacr],
> bq. I pulled down a sourceforge data generator and created a mock file of
> 500,000 rows that had an incrementing sequence number, date, and SSN. I then
> used our COPY command and MySQL's LOAD DATA INFILE to load the file on my
> Mac. Results were:
> {noformat}
> mysql> load data infile '/Users/robin/dev/datagen3.txt' into table p_test
> fields terminated by ',';
> Query OK, 500000 rows affected (2.18 sec)
> {noformat}
> Cassandra 2.1.0 (pre-CASSANDRA-7405):
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with
> delimiter=',';
> 500000 rows imported in 16 minutes and 45.485 seconds.
> {noformat}
> Cassandra 2.1.1:
> {noformat}
> cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with
> delimiter=',';
> Processed 500000 rows; Write: 4037.46 rows/s
> 500000 rows imported in 2 minutes and 3.058 seconds.
> {noformat}
> [jbellis] CASSANDRA-7405 gets us almost an order of magnitude improvement.
> Unfortunately we're still almost two orders of magnitude slower than MySQL.
> I don't think we can continue to tell people, "use sstableloader instead."
> The number of users sophisticated enough to use the sstable writers is small
> and (relatively) decreasing as our user base expands.
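
The "order of magnitude" figures in the description can be checked against the quoted timings. The following is a minimal sketch in Python; the labels and helper names are illustrative only and are not part of cqlsh or any Cassandra tool. It assumes all three runs loaded the same 500,000-row file, as the description states.

```python
import math

ROWS = 500_000  # row count reported by all three runs

# Wall-clock times in seconds, copied from the quoted command output.
timings = {
    "mysql": 2.18,                  # LOAD DATA INFILE
    "copy_2_1_0": 16 * 60 + 45.485, # COPY FROM before CASSANDRA-7405
    "copy_2_1_1": 2 * 60 + 3.058,   # COPY FROM after CASSANDRA-7405
}


def rows_per_second(seconds, rows=ROWS):
    """Average throughput over the whole load."""
    return rows / seconds


def orders_of_magnitude(slow_seconds, fast_seconds):
    """Base-10 logarithm of the slowdown factor."""
    return math.log10(slow_seconds / fast_seconds)


# Factor gained by CASSANDRA-7405, and the remaining gap to MySQL.
speedup_7405 = timings["copy_2_1_0"] / timings["copy_2_1_1"]
gap_to_mysql = timings["copy_2_1_1"] / timings["mysql"]

if __name__ == "__main__":
    for name, secs in timings.items():
        print(f"{name}: {rows_per_second(secs):,.0f} rows/s")
    print(f"7405 speedup: {speedup_7405:.1f}x")
    print(f"gap to MySQL: {gap_to_mysql:.1f}x "
          f"({orders_of_magnitude(timings['copy_2_1_1'], timings['mysql']):.2f} orders)")
```

On these numbers the 7405 change is a speedup of about 8x, and 2.1.1 is still about 56x slower than MySQL, which is where "almost an order of magnitude" and "almost two orders" come from.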