"java.io.IOError: java.io.EOFException: EOF after 13889 bytes out of 460861" occurred when I query from a table

2016-10-31 Thread ????/??????
Hi all, I have a problem. I created a table named "tblA" in C* and created a materialized view named viewA on tblA. I run a Spark job to process data from 'viewA'. In the beginning it worked well, but the next day the Spark job failed. And when I select data from 'viewA' and

Re: question on an article

2016-10-31 Thread Kant Kodali
Hi Peter, Thanks for sending this over. I don't know how 100 bytes (10 bytes of data * 10 columns) can represent anything useful. These days it is better to benchmark things around 1KB. Thanks! On Mon, Oct 31, 2016 at 4:58 PM, Peter Reilly wrote: > The original
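
The payload arithmetic in this reply can be checked with a short sketch. The 10 columns and 10 bytes per column come from the message itself; the helper name and the reading of "1KB" as 1024 bytes are assumptions made for illustration, and the sketch ignores column names, row keys, and any storage overhead.

```python
def payload_bytes(columns: int, bytes_per_column: int) -> int:
    """Raw data bytes written per row, ignoring keys and storage overhead."""
    return columns * bytes_per_column


# Figures quoted in the thread: 10 columns, 10 bytes of data each.
benchmark_payload = payload_bytes(columns=10, bytes_per_column=10)

# Assumption: "1KB" is read as 1024 bytes.
suggested_payload = 1024

# How many times larger the suggested payload is than the benchmarked one.
ratio = suggested_payload / benchmark_payload

print(f"benchmark payload per row: {benchmark_payload} bytes")
print(f"suggested payload is about {ratio:.1f}x larger")
```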

Cassandra reaper

2016-10-31 Thread Jai Bheemsen Rao Dhanwada
Hello, Has anyone played around with the cassandra reaper (https://github.com/spotify/cassandra-reaper)? If so, can someone please help me with the set-up? I can't get it working. I used the below steps: 1. create jar file using maven 2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server

Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Lahiru Gamathige
Hey Jeff, Thanks a lot. The biggest change I have in mind is using TimeWindowCompactionStrategy in our timeseries tables (currently we use SizeTieredCompactionStrategy). We already have data in those tables (6 nodes, each with 250GB, including timed-out data that didn't get deleted from the disk) and do you

Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Jeff Jirsa
Should be the same as going to 3.0, no file format version bumps between 3.0 and 3.9 (There was one format change in 3.6 – CASSANDRA-11206 should have probably bumped the version identifier, but we didn’t, and there’s nothing special you’d need to do for it anyway.) From: Lahiru

Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Lahiru Gamathige
Hi Users, I am trying to find a migration guide from 2.1.* to 3.x and figured I should go through NEWS.txt, so I read that and found a few things that I should be careful about during the upgrade. I'm curious whether there's any documentation with specific steps on how to do the migration. Anyone

Re: question on an article

2016-10-31 Thread Peter Reilly
The original article http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Mon, Oct 31, 2016 at 5:57 PM, Peter Reilly wrote: > From the article: > java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200 > -p 7102 -o

Re: question on an article

2016-10-31 Thread Peter Reilly
From the article: java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200 -p 7102 -o INSERT -c 10 -r The client is writing 10 columns per row key, row key randomly chosen from 27 million ids, each column has a key and 10 bytes of data. The total on disk size for each write

Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread kurt Greaves
Blowing out to 1k SSTables seems a bit full on. What args are you passing to repair? Kurt Greaves k...@instaclustr.com www.instaclustr.com On 31 October 2016 at 09:49, Stefano Ortolani wrote: > I've collected some more data-points, and I still see dropped > mutations with

question on an article

2016-10-31 Thread Kant Kodali
Hi Guys, I keep reading the articles below, but the biggest questions for me are as follows: 1) What is the "data size" per request? Without the data size it is hard for me to see anything sensible. 2) Is there batching here? http://www.datastax.com/1-million-writes

Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread DuyHai Doan
Technically TTL should be handled properly. However, be careful of expired data turning into tombstones. For the original table, it may be a tombstone on a skinny partition but for the 2nd index, it may be a tombstone set on a wide partition and you'll start getting into trouble when reading a

Does securing C*'s CQL native interface (running on port 9042) automatically secure its Thrift API interface (running on port 9160)?

2016-10-31 Thread Li, Guangxing
Hi, I secured my C* cluster by having "authenticator: org.apache.cassandra.auth.PasswordAuthenticator" in cassandra.yaml. I know it secures the CQL native interface running on port 9042 because my code uses that interface. Does this also secure the Thrift API interface running on port 9160? I

Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread Oleg Krayushkin
Hi DuyHai, thank you. I got the idea of the caveat about too-low cardinality, but I am still wondering about possible trouble with putting a TTL (months) on an indexed column (not bool; say, 100 different values of int). 2016-10-31 16:33 GMT+03:00 DuyHai Doan : >

Re: given partition key and secondary index, still require allow_filtering?

2016-10-31 Thread DuyHai Doan
Native Cassandra 2nd index does not perform very well with inequalities (<, >, <=, >=). In your case, even if you provide the partition key (which is a very good idea), Cassandra still needs to perform a full scan on the local node to find any score matching the inequality and it is pretty expensive,
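
The cost difference between equality and inequality lookups can be illustrated with a toy model. This is not Cassandra's implementation: it assumes a simple hash-style index from indexed value to row keys, and the class name, method names, and sample scores are all hypothetical. It only shows why an equality predicate can go straight to one entry while an inequality has to examine every indexed value.

```python
from collections import defaultdict


class ToyHashIndex:
    """Toy equality-oriented index: indexed value -> set of row keys.

    Illustrative only; not how Cassandra stores secondary indexes.
    """

    def __init__(self):
        self._by_value = defaultdict(set)
        self.entries_examined = 0  # counts index entries touched by lookups

    def add(self, row_key, value):
        self._by_value[value].add(row_key)

    def equals(self, value):
        # Equality: one direct lookup, regardless of index size.
        self.entries_examined += 1
        return sorted(self._by_value.get(value, ()))

    def greater_than(self, bound):
        # Inequality: entries are not ordered by value, so every one
        # of them has to be examined.
        matches = []
        for value, keys in self._by_value.items():
            self.entries_examined += 1
            if value > bound:
                matches.extend(keys)
        return sorted(matches)


# Hypothetical rows of one partition: category -> score.
index = ToyHashIndex()
for category, score in {1: 0.1, 2: 0.5, 3: 0.9, 4: 0.5}.items():
    index.add(category, score)

equal_rows = index.equals(0.5)
equal_cost = index.entries_examined

index.entries_examined = 0
range_rows = index.greater_than(0.4)
range_cost = index.entries_examined

print(equal_rows, equal_cost)
print(range_rows, range_cost)
```

In this model the range cost grows with the number of distinct indexed values, while the equality cost stays constant.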

Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread DuyHai Doan
http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/ See section E Caveats which applies to your boolean use-case On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin wrote: > Hi, > > Is it a good approach to make a boolean column with TTL and build

Secondary Index on Boolean column with TTL

2016-10-31 Thread Oleg Krayushkin
Hi, Is it a good approach to make a boolean column with TTL and build a secondary index on it? (For example, I want to get rows which need to be updated after a certain time, but I don't want, say, to add a field "update_date" as clustering column or to create another table) In what kind of

Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
I would set rpc_address to 0.0.0.0 and broadcast_rpc_address to EACH_IP. This allows connecting both to 127.0.0.1 from inside and to the IP from outside. By the way, I see that port 7000 is bound to the external IP. Aren't both nodes in the same network? If yes, use internal IPs. Best regards,
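
A minimal sketch of the per-node settings suggested above, written as a small generator for the two cassandra.yaml keys named in the message. The function names and the example IPs (taken from the 192.0.2.0/24 documentation range) are hypothetical; the only behaviour assumed about Cassandra is that broadcast_rpc_address must be a concrete address rather than 0.0.0.0.

```python
def rpc_settings(node_ip: str) -> dict:
    """Return the two RPC-related settings for one node.

    rpc_address listens on all interfaces; broadcast_rpc_address is the
    node's own IP (the "EACH_IP" placeholder in the message).
    """
    if node_ip == "0.0.0.0":
        # Assumption: the broadcast address must be a concrete address.
        raise ValueError("broadcast_rpc_address must be a concrete IP")
    return {"rpc_address": "0.0.0.0", "broadcast_rpc_address": node_ip}


def render_yaml(settings: dict) -> str:
    """Render the settings as simple 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Hypothetical node IPs, one fragment per node.
for ip in ("192.0.2.10", "192.0.2.11"):
    print(f"# node {ip}")
    print(render_yaml(rpc_settings(ip)))
```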

Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
Both nodes can be seeds. Probably I misunderstood Raimund as setting each node as the only seed. If he set both IPs on both nodes it's OK. Best regards, Vladimir Yudovin, Winguzone - Hosted Cloud Cassandra Launch your cluster in minutes. On Sun, 30 Oct 2016 14:48:00 -0400 Jonathan

Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread Stefano Ortolani
I've collected some more data-points, and I still see dropped mutations with compaction_throughput_mb_per_sec set to 8. The only notable thing regarding the current setup is that I have another keyspace (not being repaired though) with really wide rows (100MB per partition), but that shouldn't

given partition key and secondary index, still require allow_filtering?

2016-10-31 Thread Zao Liu
Hi, I created a table, schema like here: CREATE TABLE profile_new.user_categories_1477899735 ( id bigint, category int, score double, PRIMARY KEY (id, category) ) WITH CLUSTERING ORDER BY (category ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'keys':