We're seeing similar behavior, except our bloom filter false-positive ratio is closer to 1.0 (100%). We're using Cassandra 2.1.2.
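For context on how that number is derived: as far as I can tell, the cfstats false ratio is simply false positives divided by all positive bloom filter responses. A quick sketch (assumption: this mirrors Cassandra's calculation; the `false_ratio` helper and the true-positive count of ~90 are illustrative, back-solved from the stats pasted below):

```python
# Sketch (not Cassandra source): how I understand the cfstats
# "Bloom filter false ratio" to be derived -- false positives over
# all positive bloom-filter responses.
def false_ratio(false_positives: int, true_positives: int) -> float:
    total = false_positives + true_positives
    return 0.0 if total == 0 else false_positives / total

# With numbers like ours, nearly every bloom-filter hit is false:
# 11096 false positives vs. roughly 90 true positives gives ~0.992.
print(false_ratio(false_positives=11096, true_positives=90))
```

So a ratio near 1.0 means almost no read that passed the bloom filter actually found the partition in that sstable.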
Schema
-----------------------------------------------------------------------
CREATE TABLE contacts.contact (
    id bigint,
    property_id int,
    created_at bigint,
    updated_at bigint,
    value blob,
    PRIMARY KEY (id, property_id)
) WITH CLUSTERING ORDER BY (property_id ASC)
    *AND bloom_filter_fp_chance = 0.001*
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CF Stats Output:
-------------------------------------------------------------------------
Keyspace: contacts
    Read Count: 2458375
    Read Latency: 0.8528440766766665 ms.
    Write Count: 10357
    Write Latency: 0.1816912233272183 ms.
    Pending Flushes: 0
        Table: contact
        SSTable count: 61
        SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
        Space used (live): 9047112471
        Space used (total): 9047112471
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.34119240020241487
        Memtable cell count: 24570
        Memtable data size: 1299614
        Memtable switch count: 2
        Local read count: 2458290
        Local read latency: 0.853 ms
        Local write count: 10044
        Local write latency: 0.186 ms
        Pending flushes: 0
        Bloom filter false positives: 11096
        *Bloom filter false ratio: 0.99197*
        Bloom filter space used: 3923784
        Compacted partition minimum bytes: 373
        Compacted partition maximum bytes: 152321
        Compacted partition mean bytes: 9938
        Average live cells per slice (last five minutes): 37.57851240677983
        Maximum live cells per slice (last five minutes): 63.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

--
about.me <http://about.me/markgreene>

On Wed, Dec 17, 2014 at 1:32 PM, Chris Hart <ch...@remilon.com> wrote:

>
> Hi,
>
> I have created the following table with bloom_filter_fp_chance=0.01:
>
> CREATE TABLE logged_event (
>   time_key bigint,
>   partition_key_randomizer int,
>   resource_uuid timeuuid,
>   event_json text,
>   event_type text,
>   field_error_list map<text, text>,
>   javascript_timestamp timestamp,
>   javascript_uuid uuid,
>   page_impression_guid uuid,
>   page_request_guid uuid,
>   server_received_timestamp timestamp,
>   session_id bigint,
>   PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> When I run cfstats, I see a much higher false positive ratio:
>
>   Table: logged_event
>   SSTable count: 15
>   Space used (live), bytes: 104128214227
>   Space used (total), bytes: 104129482871
>   SSTable Compression Ratio: 0.3295840184239226
>   Number of keys (estimate): 199293952
>   Memtable cell count: 56364
>   Memtable data size, bytes: 20903960
>   Memtable switch count: 148
>   Local read count: 1396402
>   Local read latency: 0.362 ms
>   Local write count: 2345306
>   Local write latency: 0.062 ms
>   Pending tasks: 0
>   Bloom filter false positives: 147705
>   Bloom filter false ratio: 0.49020
>   Bloom filter space used, bytes: 249129040
>   Compacted partition minimum bytes: 447
>   Compacted partition maximum bytes: 315852
>   Compacted partition mean bytes: 1636
>   Average live cells per slice (last five minutes): 0.0
>   Average tombstones per slice (last five minutes): 0.0
>
> Any idea what could be causing this? This is time-series data. Every time
> we read from this table, we read a single row key with 1000
> partition_key_randomizer values. I'm running Cassandra 2.0.11. I tried
> running upgradesstables to rewrite them, which didn't change this
> behavior at all. I'm using size-tiered compaction and I haven't done any
> major compactions.
>
> Thanks,
> Chris
>
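One way to rule out a mis-built filter: the standard Bloom filter sizing formula says the optimal bits per key for a target false-positive chance p is -ln(p)/(ln 2)^2, so you can check whether the filter on disk is actually sized for the configured fp chance. A sketch using the numbers from the cfstats output above (assumption: "Bloom filter space used" is in bytes and "Number of keys (estimate)" approximates the partition-key count):

```python
import math

# Optimal Bloom filter sizing: for target false-positive chance p,
# bits per key m/n = -ln(p) / (ln 2)^2, with k = (m/n) * ln(2) hash functions.
def bits_per_key(p: float) -> float:
    return -math.log(p) / (math.log(2) ** 2)

# Target fp chance 0.01 -> roughly 9.6 bits per key.
print(bits_per_key(0.01))

# Sanity check against the logged_event cfstats above:
# 249129040 bytes of filter for ~199293952 keys is ~10 bits per key,
# so the filter itself appears correctly sized for fp_chance = 0.01.
space_bytes = 249129040
keys = 199293952
print(space_bytes * 8 / keys)
```

If the on-disk size matches the formula (as it seems to here), the filter was built with the right parameters, and the inflated observed ratio comes from the read pattern rather than from the filter being undersized. Note that fp_chance is a per-sstable guarantee for uniformly random absent keys; repeatedly probing many sstables for keys that live in only a few of them can push the observed ratio well above it.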