Re: High Bloom filter false ratio

Chris Lohfink Fri, 19 Feb 2016 08:51:08 -0800

>
> SSTable count: 1289


That's seriously wrong, and pretty horrific if this table is using
size-tiered compaction. Is compaction not keeping up, or is it hung? That may
be what's affecting your BF FP ratio as well.
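
As a rough sketch of why the SSTable count matters for reads of keys that do
not exist: assume every one of the 1289 SSTables is a candidate for the read,
each bloom filter actually achieves the configured bloom_filter_fp_chance of
0.01, and the filters are independent. Those are simplifying assumptions, not
a description of Cassandra's internals, and the function name is made up for
illustration.

```python
def false_positive_odds(sstable_count, fp_chance):
    """For one read of an absent key, return a tuple of
    (expected number of SSTables whose filter wrongly says "maybe",
     probability that at least one filter says "maybe")."""
    # Each SSTable's filter is treated as an independent trial.
    expected = sstable_count * fp_chance
    p_at_least_one = 1.0 - (1.0 - fp_chance) ** sstable_count
    return expected, p_at_least_one


# Values taken from the cfstats output and table settings in this thread.
expected, p_any = false_positive_odds(1289, 0.01)
print(f"expected false positives per read: {expected:.2f}")
print(f"chance of at least one false positive: {p_any:.6f}")
```

Under those assumptions a read for a missing key trips roughly a dozen
filters on average, so nearly every such read still goes to disk even though
each individual filter is behaving as configured. Getting the SSTable count
down shrinks both numbers directly.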

On Thu, Feb 18, 2016 at 9:52 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey all,
>
> @Jaydeep here is the cfstats output from one node.
>
> Read Count: 1721134722
>
> Read Latency: 0.04268825050756254 ms.
>
> Write Count: 56743880
>
> Write Latency: 0.014650376727851532 ms.
>
> Pending Tasks: 0
>
> Table: user_stay_points
>
> SSTable count: 1289
>
> Space used (live), bytes: 122141272262
>
> Space used (total), bytes: 224227850870
>
> Off heap memory used (total), bytes: 653827528
>
> SSTable Compression Ratio: 0.4959736121441446
>
> Number of keys (estimate): 345137664
>
> Memtable cell count: 339034
>
> Memtable data size, bytes: 106558314
>
> Memtable switch count: 3266
>
> Local read count: 1721134803
>
> Local read latency: 0.048 ms
>
> Local write count: 56743898
>
> Local write latency: 0.018 ms
>
> Pending tasks: 0
>
> Bloom filter false positives: 40664437
>
> Bloom filter false ratio: 0.69058
>
> Bloom filter space used, bytes: 493777336
>
> Bloom filter off heap memory used, bytes: 493767024
>
> Index summary off heap memory used, bytes: 91677192
>
> Compression metadata off heap memory used, bytes: 68383312
>
> Compacted partition minimum bytes: 104
>
> Compacted partition maximum bytes: 1629722
>
> Compacted partition mean bytes: 1773
>
> Average live cells per slice (last five minutes): 0.0
>
> Average tombstones per slice (last five minutes): 0.0
>
>
> @Tyler Hobbs
>
> we are using Cassandra 2.0.15, so
> https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur.
> The other problems look like they will be fixed in 3.0; we will most likely
> try to slot in an upgrade to a 3.x version towards the second quarter of this year.
>
>
> @Daemeon
>
> Latencies seem to have higher ratios, attached is the graph.
>
>
> I am mostly looking at Bloom filters because of the way we do reads:
> we read data with non-existent partition keys, and it seems to take a
> long time to respond. For example, 720 queries take 2 seconds, with none of
> them returning anything. The 720 queries are run as a sequence of groups
> of 180, with the 180 queries in each group running in parallel.
>
>
> thanks
>
> anishek
>
>
>
> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> How many partition keys exist for the table that shows this problem (or
>> provide nodetool cfstats for that table)?
>>
>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> The bloom filter buckets the values in a small number of buckets. I have
>>> been surprised by how many cases I see with large cardinality where a few
>>> values populate a given bloom leaf, resulting in high false positives, and
>>> a surprising impact on latencies!
>>>
>>> Are you seeing 2:1 ranges between mean and worst-case latencies
>>> (allowing for gc times)?
>>>
>>> Daemeon Reiydelle
>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>>
>>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>>
>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>> partitions that always trigger a bloom filter false positive.  You could
>>>> try manually tracing a few queries on this table (for non-existent
>>>> partitions) to see if the bloom filter rejects them.
>>>>
>>>> Depending on your Cassandra version, your false positive ratio could be
>>>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>>
>>>> There are also a couple of recent improvements to bloom filters:
>>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>>
>>>>
>>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We have a table with a composite partition key with humongous
>>>>> cardinality; it's a combination of (long, long). On the table we have
>>>>> bloom_filter_fp_chance=0.010000.
>>>>>
>>>>> On running "nodetool cfstats" on the 5 nodes we have in the cluster, we
>>>>> are seeing "Bloom filter false ratio:" in the range of 0.7-0.9.
>>>>>
>>>>> I thought that over time the bloom filter would adjust to the key space
>>>>> cardinality. We have been running the cluster for a long time now but have
>>>>> added significant traffic since January this year, which does not lead to
>>>>> writes in the db but does lead to many reads that check whether any values exist.
>>>>>
>>>>> Are there any settings that can be changed to get a better ratio?
>>>>>
>>>>> Thanks
>>>>> Anishek
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> DataStax <http://datastax.com/>
>>>>
>>>
>>
>
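
On Tyler's suggestion (quoted above) of lowering bloom_filter_fp_chance, here
is a rough sketch of the memory trade-off. It uses the textbook formula for an
optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2. Cassandra's own
sizing may differ from this, so treat the results as ballpark figures only. The
key count is the "Number of keys (estimate)" value from the cfstats output; the
function names and the candidate fp_chance values are made up for illustration.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized Bloom filter (textbook formula)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_bytes(key_count, fp_chance):
    """Approximate filter size in bytes for key_count keys."""
    return key_count * bloom_bits_per_key(fp_chance) / 8


KEYS = 345137664  # "Number of keys (estimate)" from cfstats in this thread

for p in (0.01, 0.005, 0.001):
    size_mb = bloom_bytes(KEYS, p) / 1e6
    print(f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, ~{size_mb:.0f} MB")
```

The point of the sketch is only that each reduction in fp_chance costs more
bits per key, so a lower setting has to be weighed against the off-heap
memory the filters already use on each node.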
