Brian Hess created CASSANDRA-9552:
--------------------------------------
Summary: COPY FROM times out after 110000 inserts
Key: CASSANDRA-9552
URL: https://issues.apache.org/jira/browse/CASSANDRA-9552
Project: Cassandra
Issue Type: Improvement
Reporter: Brian Hess
I am trying to test out performance of COPY FROM on various schemas. I have a
100-BIGINT-column table defined as:
{{
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
    'replication_factor': '3'} AND durable_writes = true;

CREATE TABLE test.test100 (
    pkey bigint, ccol bigint,
    col0 bigint, col1 bigint, col10 bigint, col11 bigint, col12 bigint,
    col13 bigint, col14 bigint, col15 bigint, col16 bigint, col17 bigint,
    col18 bigint, col19 bigint, col2 bigint, col20 bigint, col21 bigint,
    col22 bigint, col23 bigint, col24 bigint, col25 bigint, col26 bigint,
    col27 bigint, col28 bigint, col29 bigint, col3 bigint, col30 bigint,
    col31 bigint, col32 bigint, col33 bigint, col34 bigint, col35 bigint,
    col36 bigint, col37 bigint, col38 bigint, col39 bigint, col4 bigint,
    col40 bigint, col41 bigint, col42 bigint, col43 bigint, col44 bigint,
    col45 bigint, col46 bigint, col47 bigint, col48 bigint, col49 bigint,
    col5 bigint, col50 bigint, col51 bigint, col52 bigint, col53 bigint,
    col54 bigint, col55 bigint, col56 bigint, col57 bigint, col58 bigint,
    col59 bigint, col6 bigint, col60 bigint, col61 bigint, col62 bigint,
    col63 bigint, col64 bigint, col65 bigint, col66 bigint, col67 bigint,
    col68 bigint, col69 bigint, col7 bigint, col70 bigint, col71 bigint,
    col72 bigint, col73 bigint, col74 bigint, col75 bigint, col76 bigint,
    col77 bigint, col78 bigint, col79 bigint, col8 bigint, col80 bigint,
    col81 bigint, col82 bigint, col83 bigint, col84 bigint, col85 bigint,
    col86 bigint, col87 bigint, col88 bigint, col89 bigint, col9 bigint,
    col90 bigint, col91 bigint, col92 bigint, col93 bigint, col94 bigint,
    col95 bigint, col96 bigint, col97 bigint,
    PRIMARY KEY (pkey, ccol)
) WITH CLUSTERING ORDER BY (ccol ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class':
        'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
        'max_threshold': '32'}
    AND compression = {'sstable_compression':
        'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
}}
I then try to load the linked file of 120,000 rows (100 BIGINT columns each) via:
{{
cqlsh -e "COPY
test.test100(pkey,ccol,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,col51,col52,col53,col54,col55,col56,col57,col58,col59,col60,col61,col62,col63,col64,col65,col66,col67,col68,col69,col70,col71,col72,col73,col74,col75,col76,col77,col78,col79,col80,col81,col82,col83,col84,col85,col86,col87,col88,col89,col90,col91,col92,col93,col94,col95,col96,col97)
FROM 'data120K.csv'"
}}
Data file here:
https://drive.google.com/file/d/0B87-Pevy14fuUVcxemFRcFFtRjQ/view?usp=sharing
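In case the linked file is unavailable, a minimal sketch of a generator that produces an equivalent CSV (120,000 rows, 100 BIGINT columns in the order `pkey,ccol,col0..col97`). The function name `write_rows` and the choice of one clustering row per partition key are assumptions for illustration, not taken from the report:

```python
# Hypothetical generator for a data120K.csv-equivalent file:
# 120,000 rows, each with 100 BIGINT values (pkey, ccol, col0..col97).
import csv
import random

NUM_ROWS = 120_000
NUM_DATA_COLS = 98  # col0 .. col97

def write_rows(path, num_rows=NUM_ROWS):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for pkey in range(num_rows):
            # Assumption: one clustering row (ccol=0) per partition key,
            # with random bigint payloads in the remaining 98 columns.
            row = [pkey, 0] + [random.randrange(10**9) for _ in range(NUM_DATA_COLS)]
            writer.writerow(row)
```

Calling `write_rows("data120K.csv")` then allows the COPY FROM command above to be run as shown.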
After 110,000 rows, the import errors out and then hangs:
{{
<stdin>:1:110000 rows; Write: 19848.21 rows/s
Connection heartbeat failure
<stdin>:1:Aborting import at record #1196. Previously inserted records are still present, and some records after that may be present as well.
}}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)