We're seeing similar behavior, except our bloom filter false-positive ratio is closer to 1.0 (100%). We're using Cassandra 2.1.2.
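For context on how that number is derived: as far as I can tell, the cfstats false ratio is simply false positives divided by all positive bloom filter responses. A quick sketch (assumption: this mirrors Cassandra's calculation; the `false_ratio` helper and the true-positive count of ~90 are illustrative, back-solved from the stats pasted below):

```python
# Sketch (not Cassandra source): how I understand the cfstats
# "Bloom filter false ratio" to be derived -- false positives over
# all positive bloom-filter responses.
def false_ratio(false_positives: int, true_positives: int) -> float:
    total = false_positives + true_positives
    return 0.0 if total == 0 else false_positives / total

# With numbers like ours, nearly every bloom-filter hit is false:
# 11096 false positives vs. roughly 90 true positives gives ~0.992.
print(false_ratio(false_positives=11096, true_positives=90))
```

So a ratio near 1.0 means almost no read that passed the bloom filter actually found the partition in that sstable.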
Schema
-----------------------------------------------------------------------
CREATE TABLE contacts.contact (
    id bigint,
    property_id int,
    created_at bigint,
    updated_at bigint,
    value blob,
    PRIMARY KEY (id, property_id)
) WITH CLUSTERING ORDER BY (property_id ASC)
    *AND bloom_filter_fp_chance = 0.001*
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CF Stats Output:
-------------------------------------------------------------------------
Keyspace: contacts
    Read Count: 2458375
    Read Latency: 0.8528440766766665 ms.
    Write Count: 10357
    Write Latency: 0.1816912233272183 ms.
    Pending Flushes: 0
        Table: contact
        SSTable count: 61
        SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
        Space used (live): 9047112471
        Space used (total): 9047112471
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.34119240020241487
        Memtable cell count: 24570
        Memtable data size: 1299614
        Memtable switch count: 2
        Local read count: 2458290
        Local read latency: 0.853 ms
        Local write count: 10044
        Local write latency: 0.186 ms
        Pending flushes: 0
        Bloom filter false positives: 11096
        *Bloom filter false ratio: 0.99197*
        Bloom filter space used: 3923784
        Compacted partition minimum bytes: 373
        Compacted partition maximum bytes: 152321
        Compacted partition mean bytes: 9938
        Average live cells per slice (last five minutes): 37.57851240677983
        Maximum live cells per slice (last five minutes): 63.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

--
about.me <http://about.me/markgreene>

On Wed, Dec 17, 2014 at 1:32 PM, Chris Hart <ch...@remilon.com> wrote:

>
> Hi,
>
> I have created the following table with bloom_filter_fp_chance=0.01:
>
> CREATE TABLE logged_event (
>   time_key bigint,
>   partition_key_randomizer int,
>   resource_uuid timeuuid,
>   event_json text,
>   event_type text,
>   field_error_list map<text, text>,
>   javascript_timestamp timestamp,
>   javascript_uuid uuid,
>   page_impression_guid uuid,
>   page_request_guid uuid,
>   server_received_timestamp timestamp,
>   session_id bigint,
>   PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> When I run cfstats, I see a much higher false positive ratio:
>
>   Table: logged_event
>   SSTable count: 15
>   Space used (live), bytes: 104128214227
>   Space used (total), bytes: 104129482871
>   SSTable Compression Ratio: 0.3295840184239226
>   Number of keys (estimate): 199293952
>   Memtable cell count: 56364
>   Memtable data size, bytes: 20903960
>   Memtable switch count: 148
>   Local read count: 1396402
>   Local read latency: 0.362 ms
>   Local write count: 2345306
>   Local write latency: 0.062 ms
>   Pending tasks: 0
>   Bloom filter false positives: 147705
>   Bloom filter false ratio: 0.49020
>   Bloom filter space used, bytes: 249129040
>   Compacted partition minimum bytes: 447
>   Compacted partition maximum bytes: 315852
>   Compacted partition mean bytes: 1636
>   Average live cells per slice (last five minutes): 0.0
>   Average tombstones per slice (last five minutes): 0.0
>
> Any idea what could be causing this? This is time-series data. Every time
> we read from this table, we read a single row key with 1000
> partition_key_randomizer values. I'm running Cassandra 2.0.11. I tried
> running upgradesstables to rewrite them, which didn't change this
> behavior at all. I'm using size-tiered compaction and I haven't done any
> major compactions.
>
> Thanks,
> Chris
>
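One way to rule out a mis-built filter: the standard Bloom filter sizing formula says the optimal bits per key for a target false-positive chance p is -ln(p)/(ln 2)^2, so you can check whether the filter on disk is actually sized for the configured fp chance. A sketch using the numbers from the cfstats output above (assumption: "Bloom filter space used" is in bytes and "Number of keys (estimate)" approximates the partition-key count):

```python
import math

# Optimal Bloom filter sizing: for target false-positive chance p,
# bits per key m/n = -ln(p) / (ln 2)^2, with k = (m/n) * ln(2) hash functions.
def bits_per_key(p: float) -> float:
    return -math.log(p) / (math.log(2) ** 2)

# Target fp chance 0.01 -> roughly 9.6 bits per key.
print(bits_per_key(0.01))

# Sanity check against the logged_event cfstats above:
# 249129040 bytes of filter for ~199293952 keys is ~10 bits per key,
# so the filter itself appears correctly sized for fp_chance = 0.01.
space_bytes = 249129040
keys = 199293952
print(space_bytes * 8 / keys)
```

If the on-disk size matches the formula (as it seems to here), the filter was built with the right parameters, and the inflated observed ratio comes from the read pattern rather than from the filter being undersized. Note that fp_chance is a per-sstable guarantee for uniformly random absent keys; repeatedly probing many sstables for keys that live in only a few of them can push the observed ratio well above it.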