[
https://issues.apache.org/jira/browse/CASSANDRA-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brent Haines updated CASSANDRA-10084:
-------------------------------------
    Attachment: node3.txt
                node2.txt
                node1.txt
Stack dumps for the 3 nodes that are processing the slow streaming query.
> Very slow performance streaming a large query from a single CF
> --------------------------------------------------------------
>
> Key: CASSANDRA-10084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10084
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 2.1.8
> 12GB EC2 instance
> 12 node cluster
> 32 concurrent reads
> 32 concurrent writes
> 6GB heap space
> Reporter: Brent Haines
> Attachments: cassandra.yaml, node1.txt, node2.txt, node3.txt
>
>
> We have a relatively simple column family that we use to track event data
> from different providers, and we have been using it for some time. Here is
> what it looks like:
> {code}
> CREATE TABLE data.stories_by_text (
>     ref_id timeuuid,
>     second_type text,
>     second_value text,
>     object_type text,
>     field_name text,
>     value text,
>     story_id timeuuid,
>     data map<text, text>,
>     PRIMARY KEY ((ref_id, second_type, second_value, object_type, field_name), value, story_id)
> ) WITH CLUSTERING ORDER BY (value ASC, story_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = 'Searchable fields and actions in a story are indexed by ref id which corresponds to a brand, app, app instance, or user.'
>     AND compaction = {'min_threshold': '4', 'cold_reads_to_omit': '0.0', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {code}
> We will, on a daily basis, pull the complete data for a given index with a
> query that looks like this:
> {code}
> select * from stories_by_text
>  where ref_id = f0124740-2f5a-11e5-a113-03cdf3f3c6dc
>    and second_type = 'Day'
>    and second_value = '20150812'
>    and object_type = 'booshaka:user'
>    and field_name = 'hashedEmail';
> {code}
> In the past, we have been able to pull millions of records out of the CF in a
> few seconds. We recently added the data column so that we could filter on
> event data and provide more detailed analysis of activity for our reports.
> The data map, declared as 'data map<text, text>', is very small: only 2 or 3
> name/value pairs per row.
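> For reference, the column would have been added with a schema change along
> these lines, with the write below showing the typical shape of the map (this
> is reconstructed, not the exact statement we ran, and the key/value strings
> are made-up examples rather than our real field names):
> {code}
> -- Sketch (reconstructed): adding the map column to the existing table.
> ALTER TABLE data.stories_by_text ADD data map<text, text>;
>
> -- Typical write shape: the map carries only 2 or 3 name/value pairs.
> -- The map keys/values here are made-up examples.
> INSERT INTO data.stories_by_text
>   (ref_id, second_type, second_value, object_type, field_name,
>    value, story_id, data)
> VALUES
>   (f0124740-2f5a-11e5-a113-03cdf3f3c6dc, 'Day', '20150812',
>    'booshaka:user', 'hashedEmail', 'some-hashed-email', now(),
>    {'source': 'web', 'action': 'signup'});
> {code}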
> Since we added this column, our streaming query performance has gone
> straight to hell. I just ran the above query; it took 46 minutes to read
> 86K rows and then it timed out.
> I am uncertain what other data you need to see in order to diagnose this. We
> are using STCS and are considering a change to Leveled Compaction. The table
> is repaired nightly, and the updates, which arrive at a very fast clip, only
> touch today's partition key, while the queries are for previous days only.
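> If we do switch, the change itself would be a one-line alteration along
> these lines (sstable_size_in_mb shown at its default; we have not settled
> on a value):
> {code}
> -- Sketch of the STCS -> LCS change under consideration. Switching
> -- strategies makes Cassandra recompact existing SSTables into levels.
> ALTER TABLE data.stories_by_text
>   WITH compaction = {'class': 'LeveledCompactionStrategy',
>                      'sstable_size_in_mb': '160'};
> {code}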
> As far as I can tell, these queries never finish anymore. They time out even
> though I set a 60-second read timeout for the cluster. I can watch the
> stream pause for 30 to 50 seconds many times over.
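> If that 60-second limit is read server-side, these are the cassandra.yaml
> knobs involved (illustrative values only; the attached cassandra.yaml has
> our actual settings):
> {code}
> # Illustration only -- see the attached cassandra.yaml for actual values.
> # Applies to single-partition reads like the query above:
> read_request_timeout_in_ms: 60000
> # Applies to range scans across partitions:
> range_request_timeout_in_ms: 60000
> {code}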
> Again, this only started happening when we added the data column.
> Please let me know what else you need for this. It is having a very big
> impact on our system.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)