"java.io.IOError: java.io.EOFException: EOF after 13889 bytes out of 460861" occured when I query from a table
Hi all, I have a problem. I created a table named "tblA" in C* and created a materialized view named "viewA" on tblA. I run a Spark job to process data from 'viewA'. In the beginning it worked well, but the next day the Spark job failed, and when I select data from 'viewA' and 'tblA' using CQL it throws the following exception. Query from viewA: "ServerError: " and query from tblA: "ServerError: " My system versions are: Cassandra 3.7 + Spark 1.6.2 + Spark Cassandra Connector 1.6. Does anyone know about this problem? Looking forward to your reply. Thanks
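A minimal CQL sketch of the setup being described, for reference only (the column layout is an assumption made for illustration; only the names tblA and viewA come from the message):

-- Hypothetical reconstruction of the reported schema; the real columns are unknown.
CREATE TABLE tblA (
    id      text,
    bucket  int,
    payload text,
    PRIMARY KEY (id, bucket)
);

-- Materialized view on tblA, as described in the message (Cassandra 3.x syntax).
CREATE MATERIALIZED VIEW viewA AS
    SELECT id, bucket, payload
    FROM tblA
    WHERE id IS NOT NULL AND bucket IS NOT NULL
    PRIMARY KEY (bucket, id);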
Re: question on an article
Hi Peter, Thanks for sending this over. I don't know how 100 bytes (10 bytes of data * 10 columns) can represent anything useful. These days it is better to benchmark things around 1KB. Thanks! On Mon, Oct 31, 2016 at 4:58 PM, Peter Reilly wrote: > The original article > http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html > > On Mon, Oct 31, 2016 at 5:57 PM, Peter Reilly > wrote: >> From the article: >> java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t >> 200 -p 7102 -o INSERT -c 10 -r >> >> The client is writing 10 columns per row key, row key randomly chosen >> from 27 million ids, each column has a key and 10 bytes of data. The total >> on-disk size for each write including all overhead is about 400 bytes. >> >> Not too sure about the batching - it may be one of the parameters to >> stress.jar. >> >> Peter >> >> On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali wrote: >>> Hi Guys, >>> >>> I keep reading the articles below but the biggest questions for me are >>> as follows: >>> >>> 1) What is the "data size" per request? Without data size it is hard for me >>> to see anything sensible. >>> 2) Is there batching here? >>> >>> http://www.datastax.com/1-million-writes >>> >>> http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html >>> >>> Thanks! >>> >
Cassandra reaper
Hello, Has anyone played around with Cassandra Reaper (https://github.com/spotify/cassandra-reaper)? If so, can someone please help me with the set-up? I can't get it working. I used the below steps: 1. create the jar file using Maven 2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server cassandra-reaper.yaml 3. ./bin/spreaper repair production users
Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)
Hey Jeff, Thanks a lot. The biggest change I have in mind is using TimeWindowCompactionStrategy on our time-series tables (currently we use SizeTieredCompactionStrategy). We already have data in those tables (6 nodes, each with 250GB, including timed-out data that didn't get deleted from disk). Do you think it's safe to do the migration just by changing the table property? I couldn't find a migration strategy from STCS to TWCS. BTW, thanks for the great work on TWCS. Lahiru On Mon, Oct 31, 2016 at 5:08 PM, Jeff Jirsa wrote: > Should be the same as going to 3.0, no file format version bumps between > 3.0 and 3.9 > > (There was one format change in 3.6 – CASSANDRA-11206 should have probably > bumped the version identifier, but we didn't, and there's nothing special > you'd need to do for it anyway.) > > *From: *Lahiru Gamathige > *Reply-To: *"user@cassandra.apache.org" > *Date: *Monday, October 31, 2016 at 5:04 PM > *To: *"user@cassandra.apache.org" > *Subject: *Migrate from C* 2.1.11 to 3.9 (max version I can find in > docker hub) > > Hi Users, > > I am trying to find a migration guide from 2.1.* to 3.x and figured I > should go through NEWS.txt, so I read that and found a few things that > I should be careful about during the upgrade. > > I'm curious whether there's any documentation with specific steps on how to do the > migration. > > Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any > warnings or red lights I need to consider? > > Regards > > Lahiru >
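For the STCS-to-TWCS switch discussed above, the change itself is a single table property; a minimal CQL sketch, assuming a time-series table named ks.timeseries and daily windows (both are placeholders; pick a window size that matches your data's retention):

-- Sketch only: switch an existing time-series table from STCS to TWCS (Cassandra 3.x).
-- Existing SSTables are not rewritten immediately; they are grouped into windows
-- as compaction proceeds.
ALTER TABLE ks.timeseries
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
};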
Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)
Should be the same as going to 3.0, no file format version bumps between 3.0 and 3.9 (There was one format change in 3.6 – CASSANDRA-11206 should have probably bumped the version identifier, but we didn't, and there's nothing special you'd need to do for it anyway.) From: Lahiru Gamathige Reply-To: "user@cassandra.apache.org" Date: Monday, October 31, 2016 at 5:04 PM To: "user@cassandra.apache.org" Subject: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub) Hi Users, I am trying to find a migration guide from 2.1.* to 3.x and figured I should go through NEWS.txt, so I read that and found a few things that I should be careful about during the upgrade. I'm curious whether there's any documentation with specific steps on how to do the migration. Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any warnings or red lights I need to consider? Regards Lahiru
Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)
Hi Users, I am trying to find a migration guide from 2.1.* to 3.x and figured I should go through NEWS.txt, so I read that and found a few things that I should be careful about during the upgrade. I'm curious whether there's any documentation with specific steps on how to do the migration. Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any warnings or red lights I need to consider? Regards Lahiru
Re: question on an article
The original article: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Mon, Oct 31, 2016 at 5:57 PM, Peter Reilly wrote: > From the article: > java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200 > -p 7102 -o INSERT -c 10 -r > > The client is writing 10 columns per row key, row key randomly chosen from > 27 million ids, each column has a key and 10 bytes of data. The total on-disk > size for each write including all overhead is about 400 bytes. > > Not too sure about the batching - it may be one of the parameters to > stress.jar. > > Peter > > On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali wrote: > >> Hi Guys, >> >> I keep reading the articles below but the biggest questions for me are as >> follows: >> >> 1) What is the "data size" per request? Without data size it is hard for me >> to see anything sensible. >> 2) Is there batching here? >> >> http://www.datastax.com/1-million-writes >> >> http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html >> >> Thanks! >> >
Re: question on an article
From the article: java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200 -p 7102 -o INSERT -c 10 -r The client is writing 10 columns per row key, row key randomly chosen from 27 million ids, each column has a key and 10 bytes of data. The total on-disk size for each write including all overhead is about 400 bytes. Not too sure about the batching - it may be one of the parameters to stress.jar. Peter On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali wrote: > Hi Guys, > > I keep reading the articles below but the biggest questions for me are as > follows: > > 1) What is the "data size" per request? Without data size it is hard for me > to see anything sensible. > 2) Is there batching here? > > http://www.datastax.com/1-million-writes > > http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html > > Thanks! >
Re: Incremental repairs leading to unrepaired data
Blowing out to 1k SSTables seems a bit full on. What args are you passing to repair? Kurt Greaves k...@instaclustr.com www.instaclustr.com On 31 October 2016 at 09:49, Stefano Ortolani wrote: > I've collected some more data-points, and I still see dropped > mutations with compaction_throughput_mb_per_sec set to 8. > The only notable thing regarding the current setup is that I have > another keyspace (not being repaired though) with really wide rows > (100MB per partition), but that shouldn't have any impact in theory. > Nodes do not seem that overloaded either and I don't see any GC spikes > while those mutations are dropped :/ > > Hitting a dead end here; any further ideas on where to look? > > Regards, > Stefano > > On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani > wrote: > > That's what I was thinking. Maybe GC pressure? > > Some more details: during anticompaction I have some CFs exploding to 1K > > SSTables (to be back to ~200 upon completion). > > HW specs should be quite good (12 cores/32 GB RAM) but, I admit, still > > relying on spinning disks, with ~150GB per node. > > Current version is 3.0.8. > > > > On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta > > wrote: > >> That's pretty low already, but perhaps you should lower it to see if it will > >> improve the dropped mutations during anticompaction (even if it increases > >> repair time), otherwise the problem might be somewhere else. Generally > >> dropped mutations are a signal of cluster overload, so if there's nothing > >> else wrong perhaps you need to increase your capacity. What version are you > >> on? > >> > >> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani : > >>> Not yet. Right now I have it set at 16. > >>> Would halving it more or less double the repair time? > >>> > >>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta > >>> wrote: > >>>> Anticompaction throttling can be done by setting the usual > >>>> compaction_throughput_mb_per_sec knob in cassandra.yaml or via nodetool > >>>> setcompactionthroughput. Did you try lowering that and checking if that > >>>> improves the dropped mutations? > >>>> > >>>> 2016-08-09 13:32 GMT-03:00 Stefano Ortolani : > >>>>> Hi all, > >>>>> > >>>>> I am running incremental repairs on a weekly basis (can't do it every > >>>>> day as one single run takes 36 hours), and every time I have at least one > >>>>> node dropping mutations as part of the process (this almost always during > >>>>> the anticompaction phase). Ironically this leads to a system where repairing > >>>>> makes data consistent at the cost of making some other data not consistent. > >>>>> > >>>>> Does anybody know why this is happening? > >>>>> > >>>>> My feeling is that this might be caused by anticompacting column > >>>>> families with really wide rows and with many SSTables. If that is the case, > >>>>> is there any way I can throttle that? > >>>>> > >>>>> Thanks! > >>>>> Stefano
question on an article
Hi Guys, I keep reading the articles below but the biggest questions for me are as follows: 1) What is the "data size" per request? Without data size it is hard for me to see anything sensible. 2) Is there batching here? http://www.datastax.com/1-million-writes http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html Thanks!
Re: Secondary Index on Boolean column with TTL
Technically TTL should be handled properly. However, be careful of expired data turning into tombstones. For the original table it may be a tombstone on a skinny partition, but for the 2nd index it may be a tombstone set on a wide partition, and you'll start getting into trouble when reading a partition with a lot of them. On Mon, Oct 31, 2016 at 5:08 PM, Oleg Krayushkin wrote: > Hi, DuyHai, thank you. > > I got the idea of the caveat with too-low cardinality, but I'm still wondering about > possible troubles with putting a TTL (months) on an indexed column (not > bool; say, 100 different values of int). > > 2016-10-31 16:33 GMT+03:00 DuyHai Doan : > >> http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/ >> >> See section E Caveats, which applies to your boolean use case >> >> On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin >> wrote: >> >>> Hi, >>> >>> Is it a good approach to make a boolean column with TTL and build a >>> secondary index on it? >>> (For example, I want to get rows which need to be updated after a >>> certain time, but I don't want, say, to add a field "update_date" as a >>> clustering column or to create another table.) >>> >>> What kind of trouble could it lead me into? >>> >>> Thanks in advance for any suggestions. >>> >>> -- >>> >>> Oleg Krayushkin >>> >> >> > > -- > > Oleg Krayushkin >
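A hedged CQL illustration of the wide-partition point above (the keyspace, table, and column names are made up for the example): every row with the same indexed value lands in the same partition of the hidden index table, so when the TTL'd cells expire, their tombstones pile up in that one partition.

-- Hypothetical example: a TTL'd boolean column with a secondary index.
CREATE TABLE ks.items (
    id           uuid PRIMARY KEY,
    needs_update boolean
);
CREATE INDEX items_needs_update_idx ON ks.items (needs_update);

-- After 30 days this row expires into tombstones; in the hidden index table all
-- rows with needs_update = true share one wide partition, so a query through the
-- index may have to scan many such tombstones at once.
INSERT INTO ks.items (id, needs_update) VALUES (uuid(), true) USING TTL 2592000;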
Does securing C*'s CQL native interface (running on port 9042) automatically secure its Thrift API interface (running on port 9160)?
Hi, I secured my C* cluster by setting "authenticator: org.apache.cassandra.auth.PasswordAuthenticator" in cassandra.yaml. I know it secures the CQL native interface running on port 9042 because my code uses that interface. Does this also secure the Thrift API interface running on port 9160? I searched the web for answers but could not find any. I suppose I could write a sample application using the Thrift API to confirm it, but I'm wondering if I can get a quick answer from you experts. Thanks. George.
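Not an answer to the Thrift question itself, but for completeness: a common follow-up once PasswordAuthenticator is enabled is to stop relying on the default cassandra/cassandra account. A sketch in CQL (the role name and passwords are placeholders):

-- Sketch (Cassandra 2.x CQL): create a new superuser, then demote the default account.
CREATE USER dba WITH PASSWORD 'choose-a-strong-password' SUPERUSER;

-- Log back in as the new user before running this:
ALTER USER cassandra WITH PASSWORD 'some-long-random-password' NOSUPERUSER;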
Re: Secondary Index on Boolean column with TTL
Hi, DuyHai, thank you. I got the idea of the caveat with too-low cardinality, but I'm still wondering about possible troubles with putting a TTL (months) on an indexed column (not bool; say, 100 different values of int). 2016-10-31 16:33 GMT+03:00 DuyHai Doan: > http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/ > > See section E Caveats, which applies to your boolean use case > > On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin > wrote: > >> Hi, >> >> Is it a good approach to make a boolean column with TTL and build a >> secondary index on it? >> (For example, I want to get rows which need to be updated after a certain >> time, but I don't want, say, to add a field "update_date" as a clustering >> column or to create another table.) >> >> What kind of trouble could it lead me into? >> >> Thanks in advance for any suggestions. >> >> -- >> >> Oleg Krayushkin >> > > -- Oleg Krayushkin
Re: given partition key and secondary index, still require allow_filtering?
Native Cassandra 2nd index does not perform very well with inequalities (<, >, <=, >=). In your case, even if you provide the partition key (which is a very good idea), Cassandra still needs to perform a full scan on the local node to find any score matching the inequality, and that is pretty expensive, thus requiring ALLOW FILTERING. General rule of thumb for production: ALLOW FILTERING == SURELY TIMEOUT On Mon, Oct 31, 2016 at 9:00 AM, Zao Liu wrote: > Hi, > > I created a table, schema like here: > > CREATE TABLE profile_new.user_categories_1477899735 ( > > id bigint, > > category int, > > score double, > > PRIMARY KEY (id, category) > > ) WITH CLUSTERING ORDER BY (category ASC) > > AND bloom_filter_fp_chance = 0.01 > > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > > AND comment = '' > > AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} > > AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} > > AND crc_check_chance = 1.0 > > AND dclocal_read_repair_chance = 0.1 > > AND default_time_to_live = 0 > > AND gc_grace_seconds = 864000 > > AND max_index_interval = 2048 > > AND memtable_flush_period_in_ms = 0 > > AND min_index_interval = 128 > > AND read_repair_chance = 0.0 > > AND speculative_retry = '99PERCENTILE'; > > CREATE INDEX user_categories_1477899735_score_idx ON > profile_new.user_categories_1477899735 (score); > > cqlsh:profile_new> select * from user_categories_1477899735 where id=3674; > > But somehow when I pass the partition key and the secondary index key, it still > complains: > > cqlsh:profile_new> select * from user_categories_1477899735 where id=3674 > and score > 0.5; > > InvalidRequest: Error from server: code=2200 [Invalid query] > message="Cannot execute this query as it might involve data filtering and > thus may have unpredictable performance. If you want to execute this query > despite the performance unpredictability, use ALLOW FILTERING" > > cqlsh:profile_new> >
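If the per-id score range is the main access pattern, one alternative to the secondary index is to model it directly with score as a clustering column. A sketch with a hypothetical table name (note that score then becomes part of the primary key, so changing a score means deleting the old row and writing a new one):

-- Sketch: serve "id = ? AND score > ?" as a clustering slice, with no index and no ALLOW FILTERING.
CREATE TABLE profile_new.user_categories_by_score (
    id       bigint,
    score    double,
    category int,
    PRIMARY KEY (id, score, category)
) WITH CLUSTERING ORDER BY (score DESC, category ASC);

SELECT * FROM profile_new.user_categories_by_score
WHERE id = 3674 AND score > 0.5;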
Re: Secondary Index on Boolean column with TTL
http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/ See section E Caveats, which applies to your boolean use case On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin wrote: > Hi, > > Is it a good approach to make a boolean column with TTL and build a > secondary index on it? > (For example, I want to get rows which need to be updated after a certain > time, but I don't want, say, to add a field "update_date" as a clustering > column or to create another table.) > > What kind of trouble could it lead me into? > > Thanks in advance for any suggestions. > > -- > > Oleg Krayushkin >
Secondary Index on Boolean column with TTL
Hi, Is it a good approach to make a boolean column with TTL and build a secondary index on it? (For example, I want to get rows which need to be updated after a certain time, but I don't want, say, to add a field "update_date" as a clustering column or to create another table.) What kind of trouble could it lead me into? Thanks in advance for any suggestions. -- Oleg Krayushkin
Re: Securing a Cassandra 2.2.6 Cluster
I would set rpc_address to 0.0.0.0 and broadcast_rpc_address to EACH_IP. This allows connecting both to 127.0.0.1 from inside and to the IP from outside. By the way, I see that port 7000 is bound to the external IP. Aren't both nodes on the same network? If yes, use the internal IPs. Best regards, Vladimir Yudovin, Winguzone - Hosted Cloud Cassandra Launch your cluster in minutes. On Sun, 30 Oct 2016 15:37:50 -0400 Raimund Klein chessra...@gmail.com wrote Hi guys, Thank you for your responses. Let me try to address them: I just tried cqlsh directly with the IP, no change in behaviour. (I previously tried the hostnames, didn't work either.) As for the "empty" ..._address: I meant that I leave these blank. Please let me quote from the default cassandra.yaml: # Leaving it blank leaves it up to InetAddress.getLocalHost(). This # will always do the Right Thing _if_ the node is properly configured # (hostname, name resolution, etc), and the Right Thing is to use the # address associated with the hostname (it might not be). So what should I put instead? Requested outputs: nodetool status Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN IP_1 344.56 KB 256 100.0% 6271c749-e41d-443c-89e4-46c0fbac49af rack1 UN IP_2 266.91 KB 256 100.0% e50a1076-7149-45f3-9001-26bb479f2a50 rack1 # netstat -lptn | grep java tcp 0 0 IP_1:7000 0.0.0.0:* LISTEN 17040/java tcp 0 0 127.0.0.1:36415 0.0.0.0:* LISTEN 17040/java tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 17040/java tcp6 0 0 IP_1:9042 :::* LISTEN 17040/java # netstat -lptn | grep java tcp 0 0 127.0.0.1:43569 0.0.0.0:* LISTEN 49349/java tcp 0 0 IP_2:7000 0.0.0.0:* LISTEN 49349/java tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 49349/java tcp6 0 0 :::8009 :::* LISTEN 42088/java tcp6 0 0 :::8080 :::* LISTEN 42088/java tcp6 0 0 IP_2:9042 :::* LISTEN 49349/java tcp6 0 0 127.0.0.1:8005 :::* LISTEN 42088/java Jonathan, thank you for reassuring me that I didn't misunderstand seeds completely. ;-) Any ideas? Regards Raimund 2016-10-30 18:48 GMT+00:00 Jonathan Haddad j...@jonhaddad.com: I always prefer to set the listen interface instead of the listen address. Both nodes can be seeds. In fact, there should be more than one seed. Having your first 2 nodes as seeds is usually the correct thing to do. On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin vla...@winguzone.com wrote: Empty listen_address and rpc_address. What do you mean by "Empty"? You should set either ***_address or ***_interface. Otherwise Cassandra will not listen on port 9042. Open ports 9042, 7000 and 7001 for external communication. Only port 9042 should be open to the world. Port 7000 is for internode communication, and 7001 for internode SSL communication (only one of them is used). What is the best order of steps Order doesn't really matter. Define both machines as seeds. It's wrong. Only one (started first) should be a seed. nodetool sees both of them cqlsh refuses to connect Can you please give output of nodetool status and netstat -lptn | grep java Best regards, Vladimir Yudovin, Winguzone - Hosted Cloud Cassandra Launch your cluster in minutes. On Sun, 30 Oct 2016 14:11:55 -0400 Raimund Klein chessra...@gmail.com wrote Hi everyone, We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes (nodetool sees both of them, so I'm quite certain the cluster is indeed active). My steps to create the cluster were (this applies to both machines): - Empty listen_address and rpc_address. - Define a cluster_name. - Define both machines as seeds. - Open ports 9042, 7000 and 7001 for external communication. Now I want to secure access to the cluster in all forms: - define a different database user with a new password - encrypt communication between clients and the cluster including client verification - encrypt communication between the nodes including verification What is the best order of steps and the correct way to achieve this? I wanted to start with defining a different user, but cqlsh refuses to connect after enforcing user/password authentication: cqlsh -u cassandra -p cassandra Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")}) This happens when I run the command on either of the two machines. Any help would be greatly appreciated.
Re: Securing a Cassandra 2.2.6 Cluster
Both nodes can be seeds. Probably I misunderstood Raimund as setting each node as the only seed. If he set both IPs on both nodes, it's OK. Best regards, Vladimir Yudovin, Winguzone - Hosted Cloud Cassandra Launch your cluster in minutes. On Sun, 30 Oct 2016 14:48:00 -0400 Jonathan Haddad j...@jonhaddad.com wrote I always prefer to set the listen interface instead of the listen address. Both nodes can be seeds. In fact, there should be more than one seed. Having your first 2 nodes as seeds is usually the correct thing to do. On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin vla...@winguzone.com wrote: Empty listen_address and rpc_address. What do you mean by "Empty"? You should set either ***_address or ***_interface. Otherwise Cassandra will not listen on port 9042. Open ports 9042, 7000 and 7001 for external communication. Only port 9042 should be open to the world. Port 7000 is for internode communication, and 7001 for internode SSL communication (only one of them is used). What is the best order of steps Order doesn't really matter. Define both machines as seeds. It's wrong. Only one (started first) should be a seed. nodetool sees both of them cqlsh refuses to connect Can you please give output of nodetool status and netstat -lptn | grep java Best regards, Vladimir Yudovin, Winguzone - Hosted Cloud Cassandra Launch your cluster in minutes. On Sun, 30 Oct 2016 14:11:55 -0400 Raimund Klein chessra...@gmail.com wrote Hi everyone, We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes (nodetool sees both of them, so I'm quite certain the cluster is indeed active). My steps to create the cluster were (this applies to both machines): - Empty listen_address and rpc_address. - Define a cluster_name. - Define both machines as seeds. - Open ports 9042, 7000 and 7001 for external communication. Now I want to secure access to the cluster in all forms: - define a different database user with a new password - encrypt communication between clients and the cluster including client verification - encrypt communication between the nodes including verification What is the best order of steps and the correct way to achieve this? I wanted to start with defining a different user, but cqlsh refuses to connect after enforcing user/password authentication: cqlsh -u cassandra -p cassandra Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")}) This happens when I run the command on either of the two machines. Any help would be greatly appreciated.
Re: Incremental repairs leading to unrepaired data
I've collected some more data-points, and I still see dropped mutations with compaction_throughput_mb_per_sec set to 8. The only notable thing regarding the current setup is that I have another keyspace (not being repaired though) with really wide rows (100MB per partition), but that shouldn't have any impact in theory. Nodes do not seem that overloaded either and I don't see any GC spikes while those mutations are dropped :/ Hitting a dead end here; any further ideas on where to look? Regards, Stefano On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani wrote: > That's what I was thinking. Maybe GC pressure? > Some more details: during anticompaction I have some CFs exploding to 1K > SSTables (to be back to ~200 upon completion). > HW specs should be quite good (12 cores/32 GB RAM) but, I admit, still > relying on spinning disks, with ~150GB per node. > Current version is 3.0.8. > > On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta > wrote: >> That's pretty low already, but perhaps you should lower it to see if it will >> improve the dropped mutations during anticompaction (even if it increases >> repair time), otherwise the problem might be somewhere else. Generally >> dropped mutations are a signal of cluster overload, so if there's nothing >> else wrong perhaps you need to increase your capacity. What version are you >> on? >> >> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani : >>> Not yet. Right now I have it set at 16. >>> Would halving it more or less double the repair time? >>> >>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta >>> wrote: >>>> Anticompaction throttling can be done by setting the usual >>>> compaction_throughput_mb_per_sec knob in cassandra.yaml or via nodetool >>>> setcompactionthroughput. Did you try lowering that and checking if that >>>> improves the dropped mutations? >>>> >>>> 2016-08-09 13:32 GMT-03:00 Stefano Ortolani : >>>>> Hi all, >>>>> >>>>> I am running incremental repairs on a weekly basis (can't do it every >>>>> day as one single run takes 36 hours), and every time I have at least one >>>>> node dropping mutations as part of the process (this almost always during >>>>> the anticompaction phase). Ironically this leads to a system where repairing >>>>> makes data consistent at the cost of making some other data not consistent. >>>>> >>>>> Does anybody know why this is happening? >>>>> >>>>> My feeling is that this might be caused by anticompacting column >>>>> families with really wide rows and with many SSTables. If that is the case, >>>>> is there any way I can throttle that? >>>>> >>>>> Thanks! >>>>> Stefano
given partition key and secondary index, still require allow_filtering?
Hi, I created a table, schema like here: CREATE TABLE profile_new.user_categories_1477899735 ( id bigint, category int, score double, PRIMARY KEY (id, category) ) WITH CLUSTERING ORDER BY (category ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE'; CREATE INDEX user_categories_1477899735_score_idx ON profile_new.user_categories_1477899735 (score); cqlsh:profile_new> select * from user_categories_1477899735 where id=3674; But somehow when I pass the partition key and the secondary index key, it still complains: cqlsh:profile_new> select * from user_categories_1477899735 where id=3674 and score > 0.5; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" cqlsh:profile_new>