Re: Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-10-01 Thread Matthias Pfau
delete those bloom filter files and restart cassandra, they are re-created. You can also run a user defined compaction on that sstable to rewrite the bloom filter file. This is exactly how we upgraded: determine which CFs have bigger bloom filters (cfstats) run upgradesstables individually for those

Re: Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-09-10 Thread Matthias Pfau
NVALID: > Hi there, > we just finished upgrading sstables on a single node after upgrading from  > 2.2.14 to 3.11.4. Since then, we noted a drastic increase of off heap memory > consumption. This is due to increased bloom filter size. > > According to cfstats output "Bloo

Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-09-10 Thread Matthias Pfau
Hi there, we just finished upgrading sstables on a single node after upgrading from  2.2.14 to 3.11.4. Since then, we noted a drastic increase of off heap memory consumption. This is due to increased bloom filter size. According to cfstats output "Bloom filter off heap memory used"

Re: Bloom filter false positives high

2019-05-16 Thread Martin Mačura
I've decreased bloom_filter_fp_chance from 0.01 to 0.001. The sstableupgrade took 3 days to complete. And this is a result: node1 Bloom filter false positives: 380965 Bloom filter false ratio: 0.46560 Bloom filter space used: 27.1 MiB

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
an expert in this. > > If you think about this, the whole concept of Bloom filter is to check > if some record is in particular SSTable. False positive mean that, > obviously, filter thought it was there but in fact it is not. So > Cassandra did a look unnecess

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
One thing comes to my mind but my reasoning is questionable as I am not an expert in this. If you think about this, the whole concept of Bloom filter is to check if some record is in particular SSTable. False positive mean that, obviously, filter thought it was there but in fact it is not. So

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
so...@instaclustr.com> wrote: > >> > >> What is your bloom_filter_fp_chance for either table? I guess it is > >> bigger for the first one, bigger that number is between 0 and 1, less > >> memory it will use (17 MiB against 54.9 Mib) which means more false > >>

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
table? I guess it is >> bigger for the first one, bigger that number is between 0 and 1, less >> memory it will use (17 MiB against 54.9 Mib) which means more false >> positives you will get. >> >> On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote: >> > >>

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
or the first one, bigger that number is between 0 and 1, less > memory it will use (17 MiB against 54.9 Mib) which means more false > positives you will get. > > On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote: > > > > Hi, > > I have a table with poor bloom filter fal

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
i, > I have a table with poor bloom filter false ratio: >SSTable count: 1223 >Space used (live): 726.58 GiB >Number of partitions (estimate): 8592749 > Bloom filter false positives: 35796352 >Bloom fil

Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Hi, I have a table with poor bloom filter false ratio: SSTable count: 1223 Space used (live): 726.58 GiB Number of partitions (estimate): 8592749 Bloom filter false positives: 35796352 Bloom filter false ratio: 0.68472

Re: Bloom filter memory usage disparity

2016-05-17 Thread Jeff Jirsa
Even with the same data, bloom filter is based on sstables. If your compaction behaves differently on 2 nodes than the third, your bloom filter RAM usage may be different. From: Kai Wang Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 8:02 PM

Re: Bloom filter memory usage disparity

2016-05-17 Thread Alain RODRIGUEZ
per node? In any case, we need the data size for the 3 nodes to understand. It might have been a temporary situation, but in this case you would know by now. C*heers, 2016-05-03 18:47 GMT+02:00 Kai Wang <dep...@gmail.com>: > Hi, > > I have a table on 3-node cluster. I notice bloom f

Bloom filter memory usage disparity

2016-05-03 Thread Kai Wang
Hi, I have a table on 3-node cluster. I notice bloom filter memory usage are very different on one of the node. For a given table, I checked CassandraMetricsRegistry$JmxGauge.[table]_BloomFilterOffHeapMemoryUsed.Value. 2 of 3 nodes show 1.5GB while the other shows 2.5 GB. What could

Re: High Bloom filter false ratio

2016-02-23 Thread Jeff Jirsa
"user@cassandra.apache.org" Date: Tuesday, February 23, 2016 at 12:37 AM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Looks like that sstablemetadata is available in 2.2 , we are on 2.0.x do you know anything that will work on 2.0.x On Tue, Feb 23, 2016

RE: High Bloom filter false ratio

2016-02-23 Thread SEAN_R_DURITY
I see the sstablemetadata tool as far back as 1.2.19 (in tools/bin). Sean Durity From: Anishek Agarwal [mailto:anis...@gmail.com] Sent: Tuesday, February 23, 2016 3:37 AM To: user@cassandra.apache.org Subject: Re: High Bloom filter false ratio Looks like that sstablemetadata is available in 2.2

Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
ould, >> very easily, write a script that gives you a list of sstables that you >> could feed to forceUserDefinedCompaction to join together to eliminate >> leftover waste. >> >> Your long ParNew times may be fixable by increasing the new gen size of >> your

Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
ong ParNew times may be fixable by increasing the new gen size of > your heap – the general guidance in cassandra-env.sh is out of date, you > may want to reference CASSANDRA-8150 for “newer” advice ( > http://issues.apache.org/jira/browse/CASSANDRA-8150 ) > > - Jeff > > From:

Re: High Bloom filter false ratio

2016-02-22 Thread Jeff Jirsa
-8150 ) - Jeff From: Anishek Agarwal Reply-To: "user@cassandra.apache.org" Date: Monday, February 22, 2016 at 8:33 PM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Hey Jeff, Thanks for the clarification, I did not exp

Re: High Bloom filter false ratio

2016-02-22 Thread Anishek Agarwal
ishek Agarwal > Reply-To: "user@cassandra.apache.org" > Date: Sunday, February 21, 2016 at 11:13 PM > To: "user@cassandra.apache.org" > Subject: Re: High Bloom filter false ratio > > Hey guys, > > Just did some more digging ... looks like DTCS is not rem

Re: High Bloom filter false ratio

2016-02-22 Thread Jeff Jirsa
"user@cassandra.apache.org" Date: Sunday, February 21, 2016 at 11:13 PM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Hey guys, Just did some more digging ... looks like DTCS is not removing old data completely, I used sstable2json for one such

Re: High Bloom filter false ratio

2016-02-22 Thread Christopher Bradford
>>> using STCS). Is it possible to change this to LCS? >>> >>> >>> Number of keys (estimate): 345137664 (345M partition keys) >>> >>> I don't have any suggestion about reducing this unless you partition >>> your data. >>> >>>

Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
using STCS). Is it possible to change this to LCS? >> >> >> Number of keys (estimate): 345137664 (345M partition keys) >> >> I don't have any suggestion about reducing this unless you partition your >> data. >> >> >> Bloom filter space used,

Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
uggestion about reducing this unless you partition your > data. > > > Bloom filter space used, bytes: 493777336 (400MB is huge) > > If number of keys are reduced then this will automatically reduce bloom > filter size I believe. > > > > Jaydeep > > On Thu, Feb 18,

Re: High Bloom filter false ratio

2016-02-19 Thread Jaydeep Chovatia
this unless you partition your data. Bloom filter space used, bytes: 493777336 (400MB is huge) If number of keys are reduced then this will automatically reduce bloom filter size I believe. Jaydeep On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com> wrote: > Hey all, >

Re: High Bloom filter false ratio

2016-02-18 Thread Anishek Agarwal
read latency: 0.048 ms Local write count: 56743898 Local write latency: 0.018 ms Pending tasks: 0 Bloom filter false positives: 40664437 Bloom filter false ratio: 0.69058 Bloom filter space used, bytes: 493777336 Bloom filter off heap memory used, bytes: 493767024 Index summary off heap

Re: High Bloom filter false ratio

2016-02-18 Thread daemeon reiydelle
The bloom filter buckets the values in a small number of buckets. I have been surprised by how many cases I see with large cardinality where a few values populate a given bloom leaf, resulting in high false positives, and a surprising impact on latencies! Are you seeing 2:1 ranges between mean

Re: High Bloom filter false ratio

2016-02-18 Thread Tyler Hobbs
You can try slightly lowering the bloom_filter_fp_chance on your table. Otherwise, it's possible that you're repeatedly querying one or two partitions that always trigger a bloom filter false positive. You could try manually tracing a few queries on this table (for non-existent partitions

High Bloom filter false ratio

2016-02-17 Thread Anishek Agarwal
Hello, We have a table with composite partition key with humungous cardinality, its a combination of (long,long). On the table we have bloom_filter_fp_chance=0.01. On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are seeing "Bloom filter false ratio:"

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Mark Greene
: 10044 Local write latency: 0.186 ms Pending flushes: 0 Bloom filter false positives: 11096 *Bloom filter false ratio: 0.99197* Bloom filter space used: 3923784 Compacted partition minimum bytes: 373 Compacted partition maximum bytes: 152321

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Tyler Hobbs
I took a look at the code where the bloom filter true/false positive counters are updated and notice that the true-positive count isn't being updated on key cache hits: https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain your ratios. Can you try querying for a few non-existent
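To illustrate why an undercounted true-positive counter could inflate the reported figure, here is a minimal Python sketch. It assumes the ratio is computed as false positives / (false positives + true positives); the function name and all numbers are hypothetical and are not taken from Cassandra's code or from this thread.

```python
def false_ratio(false_positives: int, true_positives: int) -> float:
    """Assumed definition: FP / (FP + TP); 0.0 when nothing was counted."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0


# Hypothetical workload: 100 false positives and 9,900 true positives.
all_counted = false_ratio(100, 9_900)

# Same workload, but 9,800 of the true positives were served through the
# key cache and (per the assumption above) never reached the counter.
undercounted = false_ratio(100, 9_900 - 9_800)

print(all_counted, undercounted)
```

Under that assumption the filter itself behaves identically in both cases; only the bookkeeping differs, which is why querying for non-existent partitions is a more direct check.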

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Chris Hart
Hi Tyler, I tried what you said and false positives look much more reasonable there. Thanks for looking into this. -Chris - Original Message - From: Tyler Hobbs ty...@datastax.com To: user@cassandra.apache.org Sent: Friday, December 19, 2014 1:25:29 PM Subject: Re: High Bloom Filter

High Bloom Filter FP Ratio

2014-12-17 Thread Chris Hart
: 148 Local read count: 1396402 Local read latency: 0.362 ms Local write count: 2345306 Local write latency: 0.062 ms Pending tasks: 0 Bloom filter false positives: 147705 Bloom filter

Re: why bloom filter is only for row key?

2014-09-17 Thread Philo Yang
that bloom filter is built on row keys, not on column key. Can anyone tell me what is considered for not building bloom filter on column key? Is it a good idea to offer a table property option between row key and primary key for what bloom filter is built on? Here's the nitty gritty of the process

Re: why bloom filter is only for row key?

2014-09-15 Thread Philo Yang
Thanks DuyHai, I think the trouble of bloom filter on all row keys column names is memory usage. However, if a CF has only hundreds of columns per row, the number of total columns will be much fewer, so the bloom filter is possible for this condition, right? Is there a good way to adjust bloom

Re: why bloom filter is only for row key?

2014-09-15 Thread Robert Coli
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang ud1...@gmail.com wrote: After reading some docs, I find that bloom filter is built on row keys, not on column key. Can anyone tell me what is considered for not building bloom filter on column key? Is it a good idea to offer a table property option

Re: why bloom filter is only for row key?

2014-09-15 Thread DuyHai Doan
Nice catch Rob On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli rc...@eventbrite.com wrote: On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang ud1...@gmail.com wrote: After reading some docs, I find that bloom filter is built on row keys, not on column key. Can anyone tell me what is considered

why bloom filter is only for row key?

2014-09-14 Thread Philo Yang
Hi all, After reading some docs, I find that bloom filter is built on row keys, not on column key. Can anyone tell me what is considered for not building bloom filter on column key? Is it a good idea to offer a table property option between row key and primary key for what bloom filter is built

Re: why bloom filter is only for row key?

2014-09-14 Thread DuyHai Doan
Hello Philo Building bloom filter for column names (what you call column key) is technically possible but very expensive in term of memory usage. The approximate formula to calculate space required by bloom filter can be found on slide 27 here: http://fr.slideshare.net/quipo/modern-algorithms
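The formula on the linked slide is not reproduced in this snippet. As a rough stand-in, the sketch below uses the textbook optimal-size estimate m = -n * ln(p) / (ln 2)^2, where n is the number of keys and p the target false positive chance. This is an assumption for illustration: the function names are hypothetical and Cassandra's own sizing may differ.

```python
import math


def bloom_filter_bits(num_keys: int, fp_chance: float) -> float:
    """Textbook optimal size in bits: m = -n * ln(p) / (ln 2)^2."""
    if num_keys < 0 or not 0.0 < fp_chance < 1.0:
        raise ValueError("need num_keys >= 0 and 0 < fp_chance < 1")
    return -num_keys * math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_mib(num_keys: int, fp_chance: float) -> float:
    """The same estimate expressed in MiB."""
    return bloom_filter_bits(num_keys, fp_chance) / 8 / (1024 ** 2)


# Bits per key grow as the target false positive chance shrinks.
for p in (0.1, 0.01, 0.001):
    print(f"fp_chance={p}: {bloom_filter_bits(1, p):.2f} bits/key, "
          f"{bloom_filter_mib(100_000_000, p):.0f} MiB for 100M keys")
```

Because the size scales linearly with the number of entries, indexing every column name instead of every row key multiplies the memory cost by roughly the average number of columns per row.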

Impact of Bloom filter false positive rate

2014-05-30 Thread Thomas GERBET
Hi, I'm currently working on some properties of Bloom filters and this is the first time I use Cassandra, so I'm sorry if my question seems dumb. Basically, I try to see the impact of the false positive rate of Bloom filter on performance. My test case is: 1. I create a table with: create table

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread DuyHai Doan
14, 2014 at 3:44 PM, William Oberman ober...@civicscience.comwrote: I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
Oberman ober...@civicscience.com wrote: I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB

bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM). Do bloom filters ever resize

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
ober...@civicscience.com wrote: I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
, William Oberman ober...@civicscience.com wrote: I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM). Do bloom filters ever resize

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM). Do bloom filters ever resize themselves when the CF suddenly gets smaller? My next test will be restarting one of the instances, though I'll have to wait

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Mark Reddy
Michalski, michal.michal...@boxever.com On 14 April 2014 14:44, William Oberman ober...@civicscience.comwrote: I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom

Fp chance for column level bloom filter

2013-07-17 Thread Takenori Sato
Hi, I thought memory consumption of column level bloom filter will become a big concern when a row becomes very wide like more than tens of millions of columns. But I read from source(1.0.7) that fp chance for column level bloom filter is hard-coded as 0.160, which is very high. So seems

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Alain RODRIGUEZ
27, 2013 1:19:06 AM Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01 Aaron, What version are you using ? 1.1.9 Have you changed the bf_ chance ? The sstables need to be rebuilt for it to take affect. I did ( several times ) and I ran upgradesstables after

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Hiller, Dean
@cassandra.apache.org <user@cassandra.apache.org> Date: Thursday, March 28, 2013 3:18 AM To: user@cassandra.apache.org <user@cassandra.apache.org> Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Andras Szerdahelyi
some light on how an FP chance of 0.01 coexist with a measured FP ratio of .. 0.98 ? Am I reading this wrong or are 98% of the requests hitting the bloom filter create a false positive while the target false ratio is 0.01? ( Also key cache hit ratio is around 0.001 and sstables read is in the skies

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Wei Zhu
is the related thread for your reference. -Wei - Original Message - From: Andras Szerdahelyi andras.szerdahe...@ignitionone.com To: user@cassandra.apache.org Sent: Wednesday, March 27, 2013 1:19:06 AM Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01 Aaron, What version

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread aaron morton
: bloom filter fp ratio of 0.98 with fp_chance of 0.01 What version are you using ? 1.2.0 allowed a null bf chance, and I think it returned .1 for LCS and .01 for STS compaction. Have you changed the bf_ chance ? The sstables need to be rebuilt for it to take affect

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-26 Thread aaron morton
ratio of .. 0.98 ? Am I reading this wrong or are 98% of the requests hitting the bloom filter create a false positive while the target false ratio is 0.01? ( Also key cache hit ratio is around 0.001 and sstables read is in the skies ( non-exponential (non-) drop off for LCS

bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-25 Thread Andras Szerdahelyi
Hello list, Could anyone shed some light on how an FP chance of 0.01 coexist with a measured FP ratio of .. 0.98 ? Am I reading this wrong or are 98% of the requests hitting the bloom filter create a false positive while the target false ratio is 0.01? ( Also key cache hit ratio is around

Re: Changing bloom filter false positive ratio

2012-09-14 Thread aaron morton
I have a hunch that the SSTable selection based on the Min and Max keys in ColumnFamilyStore.markReferenced() means that a higher false positive has less of an impact. it's just a hunch, i've not tested it. Cheers - Aaron Morton Freelance Developer @aaronmorton

Re: Changing bloom filter false positive ratio

2012-09-14 Thread Peter Schuller
I have a hunch that the SSTable selection based on the Min and Max keys in ColumnFamilyStore.markReferenced() means that a higher false positive has less of an impact. it's just a hunch, i've not tested it. For leveled compaction, yes. For non-leveled, I can't see how it would since each

Re: Changing bloom filter false positive ratio

2012-09-13 Thread Eric Czech
Thanks Peter. On Thu, Sep 13, 2012 at 12:52 PM, Peter Schuller peter.schul...@infidyne.com wrote: changing it on some of them. Can I just change that value through the cli and restart or are there any concerns I should have before trying to tweak that parameter? You can change it, you don't

Changing bloom filter false positive ratio

2012-09-12 Thread Eric Czech
Hi everyone, I'm running into heap pressure issues and I seem to have traced the problem to very large bloom filters. The bloom_filter_fp_chance is set to the default value on all my column families but I'd like to try changing it on some of them. Can I just change that value through the cli

Re: OOM opening bloom filter

2012-03-13 Thread aaron morton
Thanks for the update. How much smaller did the BF get to ? A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote: It's my understanding then for this use case that bloom filters are of little

Re: OOM opening bloom filter

2012-03-13 Thread Mick Semb Wever
How much smaller did the BF get to ? After pending compactions completed today, i'm presuming fp_ratio is applied now to all sstables in the keyspace, it has gone from 20G+ down to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G). ~mck -- A Microsoft Certified System

Re: OOM opening bloom filter

2012-03-12 Thread aaron morton
It's my understanding then for this use case that bloom filters are of little importance and that i can Yes. AFAIK there is only one position seek (that will use the bloom filter) at the start of a get_range_slice request. After that the iterators step over the rows in the -Data files

Re: OOM opening bloom filter

2012-03-12 Thread Mick Semb Wever
It's my understanding then for this use case that bloom filters are of little importance and that i can Ok. To summarise our actions to get us out of this situation, in hope that it may help others one day, we did the following actions: 1) upgrade to 1.0.7 2) set fp_ratio=0.99 3)

OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) This happens with (our normal) -Xmx12g setting. How did this this bloom filter get too big

Re: OOM opening bloom filter

2012-03-11 Thread Peter Schuller
How did this this bloom filter get too big? Bloom filters grow with the amount of row keys you have. It is natural that they grow bigger over time. The question is whether there is something wrong with this node (for example, lots of sstables and disk space used due to compaction not running

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:06 -0700, Peter Schuller wrote: If it is legitimate use of memory, you *may*, depending on your workload, want to adjust target bloom filter false positive rates: https://issues.apache.org/jira/browse/CASSANDRA-3497 This particular cf has up to ~10 billion rows

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote: Are you doing RF=1? That is correct. So are you calculations then :-) very small, 1k. Data from this cf is only read via hadoop jobs in batch reads of 16k rows at a time. [snip] It's my understanding then for this use case that

Re: reported bloom filter FP ratio

2011-12-26 Thread Radim Kolar
Dne 25.12.2011 20:58, Peter Schuller napsal(a): Read Count: 68844 [snip] why reported bloom filter FP ratio is not counted like this 10/68844.0 0.00014525594096798558 Because the read count is total amount of reads to the CF, while the bloom filter is per sstable. The number

Re: reported bloom filter FP ratio

2011-12-26 Thread Peter Schuller
but reported ratio is  Bloom Filter False Ratio: 0.00495 which is higher than my computed ratio 0.000145. If you were true than reported ratio should be lower then mine computed from CF reads because there are more reads to sstables then to CF. The ratio is the ratio of false positives

Re: reported bloom filter FP ratio

2011-12-26 Thread Radim Kolar
0.8 leading to higher memory consumption, i just checked few sstables for index to bloom filter ratio on same dataset. in 0.8 bloom filters are about 13% of index size and in 1.0, its about 16%. Key used in CF is fixed size 4byte integer. Cassandra does not measure memory used by index sampling

Re: reported bloom filter FP ratio

2011-12-26 Thread Peter Schuller
I don't understand how you reached that conclusion. On my nodes most memory is consumed by bloom filters. Also 1.0 creates The point is that just because that's the problem you have, doesn't mean the default is wrong, since it quite clearly depends on use-case. If your relative amounts of rows

reported bloom filter FP ratio

2011-12-25 Thread Radim Kolar
I have following CF Read Count: 68844 Read Latency: 9.942 ms. Write Count: 209712 Write Latency: 0.297 ms. Pending Tasks: 0 Bloom Filter False Postives: 10 Bloom Filter False Ratio

Re: reported bloom filter FP ratio

2011-12-25 Thread Peter Schuller
               Read Count: 68844 [snip] why reported bloom filter FP ratio is not counted like this 10/68844.0 0.00014525594096798558 Because the read count is total amount of reads to the CF, while the bloom filter is per sstable. The number of individual reads to sstables will be higher

Re: how to reduce disk read? (and bloom filter performance)

2011-10-17 Thread Mohit Anchlia
On Sun, Oct 16, 2011 at 2:20 AM, Radim Kolar h...@sendmail.cz wrote: Dne 10.10.2011 18:53, Mohit Anchlia napsal(a): Does it mean you are not updating a row or deleting them? yes. i have 350m rows and only about 100k of them are updated.  Can you look at JMX values of BloomFilter* ? i

Re: how to reduce disk read? (and bloom filter performance)

2011-10-17 Thread Radim Kolar
Look in jconsole - org.apache.cassandra.db - ColumnFamilies. The bloom filter false ratio on this server is 0.0018, and 0.06% of reads hit more than 1 sstable. From Cassandra's point of view, it looks good.

Re: how to reduce disk read? (and bloom filter performance)

2011-10-16 Thread Radim Kolar
Dne 10.10.2011 18:53, Mohit Anchlia napsal(a): Does it mean you are not updating a row or deleting them? yes. i have 350m rows and only about 100k of them are updated. Can you look at JMX values of BloomFilter* ? i could not find this in jconsole mbeans or in jmx over http in cassandra 1.0

factors on the effectiveness of bloom filter?

2011-10-10 Thread Yang
I noticed that 2 of my CFs are showing very different bloom filter false ratios, one is close to 1.0; the other one is only 0.3 they have roughly the same sizes in SStables and counts, the difference is key construction, the one with 0.3 false ratio has a shorter key. assuming the key can

Re: factors on the effectiveness of bloom filter?

2011-10-10 Thread Radim Kolar
Dne 10.10.2011 18:31, Yang napsal(a): I noticed that 2 of my CFs are showing very different bloom filter false ratios, one is close to 1.0; the other one is only 0.3 cassandra bloom filters are computed for 1% false positive ratio. is there any measure to increase the effectiveness of bloom

Re: how to reduce disk read? (and bloom filter performance)

2011-10-10 Thread Mohit Anchlia
Does it mean you are not updating a row or deleting them? Can you look at JMX values of BloomFilter* ? I don't believe bloom filter false positive % value is configurable. Someone else might be able to throw more light on this. I believe if you want to keep disk seeks to 1 ssTable you will need

Re: how to reduce disk read? (and bloom filter performance)

2011-10-09 Thread Radim Kolar
857 3 56 it means bloom filter failure ratio over 1%. Cassandra in unit tests expects bloom filter false positive less than 1.05%. HBase has configurable bloom filters. You can choose 1% or 0.5% - it can make difference for large cache. But result is that my poor read

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
Dne 16.9.2011 8:20, Yang napsal(a): I looked at the JMX attributes CFS.BloomFilterFalseRatio, it's 1.0 , BloomFilterFalsePositives, it's 2810, its possible to query this bloom filter false ratio from command line?

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread aaron morton
, Radim Kolar wrote: Dne 16.9.2011 8:20, Yang napsal(a): I looked at the JMX attributes CFS.BloomFilterFalseRatio, it's 1.0 , BloomFilterFalsePositives, it's 2810, its possible to query this bloom filter false ratio from command line?

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
Dne 7.10.2011 10:04, aaron morton napsal(a): Of the top of my head I it's not exposed via nodetool. You can get it via HTTP if you install mx4j or if you could try http://wiki.cyclopsgroup.org/jmxterm i have MX4J/Http but cant find that info in listing. i suspect that bloom filter

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Mohit Anchlia
of my head I it's not exposed via nodetool. You can get it via HTTP if you install mx4j or if you could try http://wiki.cyclopsgroup.org/jmxterm i have MX4J/Http but cant find that info in listing. i suspect that bloom filter performance is not so great on my 30GB CFs because one read

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
Dne 7.10.2011 15:55, Mohit Anchlia napsal(a): Check your disk utilization using iostat. Also, check if compactions are causing reads to be slow. Check GC too. You can look at cfhistograms output or post it here. i dont know how to interpret cf historgrams. can you write it to wiki?

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Mohit Anchlia
You'll see output like: Offset SSTables 1 8021 2 783 Which means 783 read operations accessed 2 SSTables On Fri, Oct 7, 2011 at 2:03 PM, Radim Kolar h...@sendmail.cz wrote: Dne 7.10.2011 15:55, Mohit Anchlia napsal(a): Check your disk utilization using
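As a small worked example of reading that output, the Python sketch below takes the two rows quoted in the message (offset 1 with 8021 reads, offset 2 with 783 reads) and computes how many reads touched more than one SSTable. The dictionary layout and function names are hypothetical; they are not a parser for real cfhistograms output.

```python
# Offset (SSTables touched by one read) -> number of reads at that offset.
sstables_per_read = {1: 8021, 2: 783}


def fraction_multi_sstable(histogram: dict) -> float:
    """Share of reads that had to consult more than one SSTable."""
    total = sum(histogram.values())
    multi = sum(count for offset, count in histogram.items() if offset > 1)
    return multi / total if total else 0.0


def mean_sstables_per_read(histogram: dict) -> float:
    """Average number of SSTables consulted per read."""
    total = sum(histogram.values())
    touched = sum(offset * count for offset, count in histogram.items())
    return touched / total if total else 0.0


print(f"{fraction_multi_sstable(sstables_per_read):.1%} of reads hit >1 SSTable")
print(f"{mean_sstables_per_read(sstables_per_read):.3f} SSTables per read on average")
```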

how to reduce disk read? (and bloom filter performance)

2011-09-16 Thread Yang
after I put my cassandra cluster on heavy load (1k/s write + 1k/s read ) for 1 day, I accumulated about 30GB of data in sstables. I think the caches have warmed up to their stable state. when I started this, I manually cat all the sstables to /dev/null , so that they are loaded into memory (the

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Sylvain Lebresne
SSTable? I had thought that it would use the bloom filter on the row key so that it would only do a seek to SSTables that have a very high probability of containing columns for that row. Yes. In the linked doc above, it seems to say that it is only used for exact column names. Am

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
/ArchitectureInternals and am hoping someone can help me understand what the io behavior of this operation would be. When I do a get_slice for a column range, will it seek to every SSTable? I had thought that it would use the bloom filter on the row key so that it would only do a seek

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Sylvain Lebresne
confused by http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping someone can help me understand what the io behavior of this operation would be. When I do a get_slice for a column range, will it seek to every SSTable? I had thought that it would use the bloom filter

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
do a get_slice for a column range, will it seek to every SSTable? I had thought that it would use the bloom filter on the row key so that it would only do a seek to SSTables that have a very high probability of containing columns for that row. Yes. In the linked doc above, it seems

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Jonathan Ellis
range, will it seek to every SSTable?  I had thought that it would use the bloom filter on the row key so that it would only do a seek to SSTables that have a very high probability of containing columns for that row. Yes. In the linked doc above, it seems to say that it is only used for exact

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Aditya Narayan
/ArchitectureInternals and am hoping someone can help me understand what the io behavior of this operation would be. When I do a get_slice for a column range, will it seek to every SSTable? I had thought that it would use the bloom filter on the row key so that it would only do a seek to SSTables

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread aaron morton
? I had thought that it would use the bloom filter on the row key so that it would only do a seek to SSTables that have a very high probability of containing columns for that row. Yes. In the linked doc above, it seems to say that it is only used for exact column names. Am I

Confused about get_slice SliceRange behavior with bloom filter

2011-02-12 Thread E S
the bloom filter on the row key so that it would only do a seek to SSTables that have a very high probability of containing columns for that row. In the linked doc above, it seems to say that it is only used for exact column names. Am I misunderstanding this? On a related note, if instead

Bloom filter

2011-01-13 Thread Carlos Sanchez
All, Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters? Thanks Carlos This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged

Re: Bloom filter

2011-01-13 Thread Chris Burroughs
On 01/13/2011 04:07 PM, Carlos Sanchez wrote: Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters? src/java/org/apache/cassandra/utils/BloomFilter.java

Re: bloom filter

2010-05-07 Thread David Strauss
On 2010-05-07 10:51, vineet daniel wrote: what is the benefit of creating bloom filter when cassandra writes data, how does it helps ? http://wiki.apache.org/cassandra/ArchitectureOverview -- David Strauss | da...@fourkitchens.com Four Kitchens | http://fourkitchens.com | +1 512 454

Re: bloom filter

2010-05-07 Thread Peter Schüller
what is the benefit of creating bloom filter when cassandra writes data, how does it helps ? It allows Cassandra to answer requests for non-existent keys without going to disk, except in cases where the bloom filter gives a false positive. See: http://spyced.blogspot.com/2009/01/all-you-ever
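The behaviour described in this reply can be sketched with a toy Bloom filter in Python. This is not Cassandra's implementation: the class, its parameters, and the use of SHA-256 with double hashing are assumptions chosen only to keep the example short and self-contained.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# One filter per SSTable, populated with the keys written to it.
sstable_filter = BloomFilter(num_bits=10_000, num_hashes=7)
for i in range(1_000):
    sstable_filter.add(f"key-{i}".encode())


def read(key: bytes) -> str:
    # Only touch the (hypothetical) on-disk file if the filter says "maybe".
    return "go to disk" if sstable_filter.might_contain(key) else "skip sstable"
```

A "skip sstable" answer is always correct, so lookups for absent keys usually avoid disk; a "go to disk" answer for an absent key is the false positive case mentioned in the reply.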
