Re: High Bloom Filter FP Ratio
Hi Tyler, I tried what you said and false positives look much more reasonable there. Thanks for looking into this.

-Chris

----- Original Message -----
From: "Tyler Hobbs"
To: user@cassandra.apache.org
Sent: Friday, December 19, 2014 1:25:29 PM
Subject: Re: High Bloom Filter FP Ratio

I took a look at the code where the bloom filter true/false positive counters are updated and noticed that the true-positive count isn't being updated on key cache hits: https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain your ratios.

Can you try querying for a few non-existent partition keys in cqlsh with tracing enabled (just run "TRACING ON") and see if you really do get that high of a false-positive ratio?

On Fri, Dec 19, 2014 at 9:59 AM, Mark Greene wrote:
> We're seeing similar behavior, except our FP ratio is closer to 1.0 (100%).
> We're using Cassandra 2.1.2.
>
> Schema
> ---
> CREATE TABLE contacts.contact (
>     id bigint,
>     property_id int,
>     created_at bigint,
>     updated_at bigint,
>     value blob,
>     PRIMARY KEY (id, property_id)
> ) WITH CLUSTERING ORDER BY (property_id ASC)
>     *AND bloom_filter_fp_chance = 0.001*
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> CF Stats Output:
> ---------------
> Keyspace: contacts
>     Read Count: 2458375
>     Read Latency: 0.852844076675 ms.
>     Write Count: 10357
>     Write Latency: 0.1816912233272183 ms.
>     Pending Flushes: 0
>     Table: contact
>     SSTable count: 61
>     SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
>     Space used (live): 9047112471
>     Space used (total): 9047112471
>     Space used by snapshots (total): 0
>     SSTable Compression Ratio: 0.34119240020241487
>     Memtable cell count: 24570
>     Memtable data size: 1299614
>     Memtable switch count: 2
>     Local read count: 2458290
>     Local read latency: 0.853 ms
>     Local write count: 10044
>     Local write latency: 0.186 ms
>     Pending flushes: 0
>     Bloom filter false positives: 11096
>     *Bloom filter false ratio: 0.99197*
>     Bloom filter space used: 3923784
>     Compacted partition minimum bytes: 373
>     Compacted partition maximum bytes: 152321
>     Compacted partition mean bytes: 9938
>     Average live cells per slice (last five minutes): 37.57851240677983
>     Maximum live cells per slice (last five minutes): 63.0
>     Average tombstones per slice (last five minutes): 0.0
>     Maximum tombstones per slice (last five minutes): 0.0
>
> --
> about.me <http://about.me/markgreene>
>
> On Wed, Dec 17, 2014 at 1:32 PM, Chris Hart wrote:
>> Hi,
>>
>> I have created the following table with bloom_filter_fp_chance=0.01:
>>
>> CREATE TABLE logged_event (
>>     time_key bigint,
>>     partition_key_randomizer int,
>>     resource_uuid timeuuid,
>>     event_json text,
>>     event_type text,
>>     field_error_list map,
>>     javascript_timestamp timestamp,
>>     javascript_uuid uuid,
>>     page_impression_guid uuid,
>>     page_request_guid uuid,
>>     server_received_timestamp timestamp,
>>     session_id bigint,
>>     PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
>> ) WITH
>>     bloom_filter_fp_chance=0.01 AND
>>     caching='KEYS_ONLY' AND
>>     comment='' AND
>>     dclocal_read_repair_chance=0.00 AND
>>     gc_grace_seconds=864000 AND
>>     index_interval=128 AND
>>     read_repair_chance=0.00 AND
>>     replicate_on_write='true' AND
>>     populate_io_cache_on_flush='false' AND
>>     default_time_to_live=0 AND
>>     speculative_retry='99.0PERCENTILE' AN
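The accounting effect Tyler points to can be illustrated with a quick sketch (the numbers below are hypothetical, not taken from the cfstats above): the reported ratio is false positives / (false positives + true positives), so if key cache hits skip the true-positive increment (CASSANDRA-8525), the denominator shrinks and the ratio is inflated even when the filter itself is healthy.

```python
# Sketch of how CASSANDRA-8525 inflates the reported FP ratio.
# All counts below are hypothetical.

def reported_fp_ratio(false_positives, true_positives):
    """Ratio as cfstats computes it: fp / (fp + tp)."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0

fp = 1_000            # actual bloom filter false positives
tp_actual = 999_000   # actual true positives (reads that found the key)
tp_counted = 10_000   # counted true positives: key-cache hits never increment

print(reported_fp_ratio(fp, tp_actual))   # true ratio: 0.001
print(reported_fp_ratio(fp, tp_counted))  # reported ratio: ~0.09
```

The same filter looks two orders of magnitude worse once most true positives go uncounted, which is consistent with Mark's near-1.0 ratio on a heavily key-cached table.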
High Bloom Filter FP Ratio
Hi,

I have created the following table with bloom_filter_fp_chance=0.01:

CREATE TABLE logged_event (
    time_key bigint,
    partition_key_randomizer int,
    resource_uuid timeuuid,
    event_json text,
    event_type text,
    field_error_list map,
    javascript_timestamp timestamp,
    javascript_uuid uuid,
    page_impression_guid uuid,
    page_request_guid uuid,
    server_received_timestamp timestamp,
    session_id bigint,
    PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
) WITH
    bloom_filter_fp_chance=0.01 AND
    caching='KEYS_ONLY' AND
    comment='' AND
    dclocal_read_repair_chance=0.00 AND
    gc_grace_seconds=864000 AND
    index_interval=128 AND
    read_repair_chance=0.00 AND
    replicate_on_write='true' AND
    populate_io_cache_on_flush='false' AND
    default_time_to_live=0 AND
    speculative_retry='99.0PERCENTILE' AND
    memtable_flush_period_in_ms=0 AND
    compaction={'class': 'SizeTieredCompactionStrategy'} AND
    compression={'sstable_compression': 'LZ4Compressor'};

When I run cfstats, I see a much higher false positive ratio:

Table: logged_event
    SSTable count: 15
    Space used (live), bytes: 104128214227
    Space used (total), bytes: 104129482871
    SSTable Compression Ratio: 0.3295840184239226
    Number of keys (estimate): 199293952
    Memtable cell count: 56364
    Memtable data size, bytes: 20903960
    Memtable switch count: 148
    Local read count: 1396402
    Local read latency: 0.362 ms
    Local write count: 2345306
    Local write latency: 0.062 ms
    Pending tasks: 0
    Bloom filter false positives: 147705
    Bloom filter false ratio: 0.49020
    Bloom filter space used, bytes: 249129040
    Compacted partition minimum bytes: 447
    Compacted partition maximum bytes: 315852
    Compacted partition mean bytes: 1636
    Average live cells per slice (last five minutes): 0.0
    Average tombstones per slice (last five minutes): 0.0

Any idea what could be causing this? This is time-series data. Every time we read from this table, we read a single row key with 1000 partition_key_randomizer values. I'm running Cassandra 2.0.11.
I tried running upgradesstables to rewrite them, which didn't change this behavior at all. I'm using size-tiered compaction and haven't done any major compactions.

Thanks,
Chris
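As a sanity check on the filter sizing itself (a rough sketch using the standard bloom filter formula, not Cassandra's exact implementation): for a target fp_chance p, an optimal filter needs about -ln(p)/(ln 2)^2 bits per key. Plugging in the key count and filter size from the cfstats above suggests the filters are sized about right for p = 0.01, pointing at the accounting bug rather than undersized filters.

```python
import math

def bits_per_key(fp_chance):
    """Optimal bloom filter size per key: -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# Numbers from the cfstats output above.
keys = 199_293_952
filter_bytes = 249_129_040

print(round(bits_per_key(0.01), 2))        # ~9.59 bits/key required for p=0.01
print(round(filter_bytes * 8 / keys, 2))   # ~10.0 bits/key actually allocated
```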
Re: really bad select performance
Thanks for all the help, everyone. The values were meant to be binary. I ended up making the possible values between 0 and 50 instead of just 0 or 1, so that no single index row gets that wide. I now run queries for everything from 1 to 50 to get 'queued' items and set the value to 0 when I'm done (I will never query for row_loaded = 0). It's unfortunate that Cassandra doesn't delegate the query execution to a node that has the index row on it, but rather tries to move the entire index row to the node that is queried.

-Chris

----- Original Message -----
From: "David Leimbach"
To: user@cassandra.apache.org
Sent: Monday, April 2, 2012 8:51:46 AM
Subject: Re: really bad select performance

This is all very hypothetical, but I've been bitten by this before. Does row_loaded happen to be a binary or boolean value? If so, the secondary index generated by Cassandra will have at most 2 rows, and they'll be REALLY wide if you have a lot of entries. Since Cassandra doesn't distribute columns over rows, those potentially very wide index rows, and their replicas, must live in SSTables in their entirety on the nodes that own them (and their replicas). Even though you limit 1, I'm not sure what "behind the scenes" things Cassandra does. I've received advice to avoid the built-in secondary indexes in Cassandra for some of these reasons. Also, if row_loaded is meant to implement some kind of queuing behavior, it could be the wrong problem space for Cassandra as a result of all of the above.

On Sat, Mar 31, 2012 at 12:22 PM, aaron morton <aa...@thelastpickle.com> wrote:

Is there anything in the logs when you run the queries? Try turning the logging up to DEBUG on the node that fails to return and see what happens. You will see it send messages to other nodes and do work itself. One thing to note: a query that uses secondary indexes runs on a node for each token range, so it will use more than CL number of nodes.
Cheers

-----
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/03/2012, at 11:52 AM, Chris Hart wrote:

Hi,

I have the following cluster:

                                                136112946768375385385349842972707284580
MountainView  RAC1  Up  Normal  1.86 GB  20.00%  0
MountainView  RAC1  Up  Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
MountainView  RAC1  Up  Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
Rackspace     RAC1  Up  Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

The following query runs quickly on all nodes except 1 MountainView node:

select * from Access_Log where row_loaded = 0 limit 1;

There is a secondary index on row_loaded. The query usually doesn't complete (but sometimes does) on the bad node and returns very quickly on all other nodes. I've upped the rpc timeout to a full minute (rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete in a minute. It seems just as likely to complete, and takes about the same amount of time, whether the limit is 1, 100 or 1000.

Thanks for any help,
Chris
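Chris's workaround from the top of the thread, spreading the queue flag across 50 index buckets instead of a single boolean, can be sketched like this (the helper names and callbacks are hypothetical; real reads and writes would go through a Cassandra client):

```python
import random

NUM_BUCKETS = 50  # 'queued' rows get row_loaded in 1..50; 0 means done


def new_row_loaded_value():
    """Pick a random bucket so no single index row grows too wide."""
    return random.randint(1, NUM_BUCKETS)


def drain_queue(fetch_bucket, handle):
    """Scan every bucket for queued rows; bucket 0 is never queried."""
    for bucket in range(1, NUM_BUCKETS + 1):
        # e.g. SELECT * FROM Access_Log WHERE row_loaded = <bucket> ...
        for row in fetch_bucket(bucket):
            handle(row)  # process, then set row_loaded = 0 in the real table
```

The design trade-off: writers pay nothing extra (one random int per insert), while readers issue 50 narrow index queries instead of one query against a single massive index row.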
really bad select performance
Hi,

I have the following cluster:

                                                136112946768375385385349842972707284580
MountainView  RAC1  Up  Normal  1.86 GB  20.00%  0
MountainView  RAC1  Up  Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
MountainView  RAC1  Up  Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
Rackspace     RAC1  Up  Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

The following query runs quickly on all nodes except 1 MountainView node:

select * from Access_Log where row_loaded = 0 limit 1;

There is a secondary index on row_loaded. The query usually doesn't complete (but sometimes does) on the bad node and returns very quickly on all other nodes. I've upped the rpc timeout to a full minute (rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete in a minute. It seems just as likely to complete, and takes about the same amount of time, whether the limit is 1, 100 or 1000.

Thanks for any help,
Chris
Re: internode communication using multiple network interfaces
Thanks. Setting the broadcast address to the external IP address and setting the listen_address to 0.0.0.0 seems to have fixed it. Does that mean that all other nodes, even those on the same local network, will communicate with that node using its external IP address? It would be much better if nodes on the local network could use the internal IP address and only nodes not on the same network would use the external one.

----- Original Message -----
From: "aaron morton"
To: user@cassandra.apache.org
Sent: Thursday, February 9, 2012 12:42:54 AM
Subject: Re: internode communication using multiple network interfaces

> I have 3 Cassandra nodes in one data center, all on the same local network, which needs to replicate from an off-site data center. Only 1 of the 3 nodes, called dw01, is externally accessible.

If you want to run a multi-data-centre cluster, all the nodes in both data centers need to be able to connect to each other. When it comes to exposing nodes behind a firewall, broadcast_address can help; see the help in cassandra.yaml and https://issues.apache.org/jira/browse/CASSANDRA-2491

Hope that helps.

-----
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/02/2012, at 9:56 AM, Chris Hart wrote:

Hi,

I have 3 Cassandra nodes in one data center, all on the same local network, which needs to replicate from an off-site data center. Only 1 of the 3 nodes, called dw01, is externally accessible. dw01 has 2 network interfaces, one externally accessible and one internal. All 3 nodes talk to each other fine when I set dw01's listen_address to the internal IP address. As soon as I set the listen_address to the external IP address, there is no communication between dw01 and the other 2 nodes. The other nodes should be able to send to dw01's external IP address (I can telnet from them to dw01 on ports 7000 and 7001 just fine), but dw01 obviously would need to use its internal network interface to send anything to the other 2 nodes.
Is this setup possible with Cassandra? If not, any recommendations on how I could implement this?

Thanks,
Chris
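The fix described in the reply above amounts to two settings in cassandra.yaml on dw01 (sketched below; the external IP is a placeholder, and note that listen_address 0.0.0.0 binds all interfaces while broadcast_address is what gossip advertises to other nodes):

```yaml
# cassandra.yaml on dw01 (sketch; 203.0.113.10 is a placeholder external IP)
listen_address: 0.0.0.0          # bind both the internal and external interfaces
broadcast_address: 203.0.113.10  # dw01's externally accessible address, as seen by peers
```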
internode communication using multiple network interfaces
Hi,

I have 3 Cassandra nodes in one data center, all on the same local network, which needs to replicate from an off-site data center. Only 1 of the 3 nodes, called dw01, is externally accessible. dw01 has 2 network interfaces, one externally accessible and one internal. All 3 nodes talk to each other fine when I set dw01's listen_address to the internal IP address. As soon as I set the listen_address to the external IP address, there is no communication between dw01 and the other 2 nodes. The other nodes should be able to send to dw01's external IP address (I can telnet from them to dw01 on ports 7000 and 7001 just fine), but dw01 obviously would need to use its internal network interface to send anything to the other 2 nodes.

Is this setup possible with Cassandra? If not, any recommendations on how I could implement this?

Thanks,
Chris