"java.io.IOError: java.io.EOFException: EOF after 13889 bytes out of 460861" occured when I query from a table

2016-10-31 Thread ????/??????
Hi, all
I have a problem. I created a table named "tblA" in C* and a materialized
view named "viewA" on tblA. I run a Spark job to process data from 'viewA'.
In the beginning it worked well, but the next day the Spark job failed, and
when I select data from 'viewA' and 'tblA' using CQL, it throws the
following exception.
query from viewA:
 "ServerError: "
and query from tblA:
 "ServerError: "


My system versions are:
Cassandra 3.7 + Spark 1.6.2 + Spark Cassandra Connector 1.6
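
A hedged first step, assuming (this is an assumption, not a diagnosis) that the EOF comes from a corrupted SSTable on one of the nodes, would be to scrub the affected table:

    # online scrub: rewrites each SSTable and skips unreadable rows
    nodetool scrub <keyspace> tblA

    # or offline, with the node stopped
    sstablescrub <keyspace> tblA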


Does anyone know about this problem? Looking forward to your reply.


Thanks

Re: question on an article

2016-10-31 Thread Kant Kodali
Hi Peter,

Thanks for sending this over. I don't know how 100 bytes (10 bytes of data *
10 columns) can represent anything useful. These days it is better to
benchmark with payloads around 1 KB.

Thanks!

On Mon, Oct 31, 2016 at 4:58 PM, Peter Reilly 
wrote:

> The original article
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
>
> On Mon, Oct 31, 2016 at 5:57 PM, Peter Reilly wrote:
>
>> From the article:
>> java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t
>> 200 -p 7102 -o INSERT -c 10 -r
>>
>> The client is writing 10 columns per row key, row key randomly chosen
>> from 27 million ids, each column has a key and 10 bytes of data. The total
>> on disk size for each write including all overhead is about 400 bytes.
>>
>> Not too sure about the batching - it may be one of the parameters to
>> stress.jar.
>>
>> Peter
>>
>> On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali  wrote:
>>
>>> Hi Guys,
>>>
>>>
>>> I keep reading the articles below but the biggest questions for me are
>>> as follows
>>>
>>> 1) what is the "data size" per request? without data size it hard for me
>>> to see anything sensible
>>> 2) is there batching here?
>>>
>>> http://www.datastax.com/1-million-writes
>>>
>>> http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>
>


Cassandra reaper

2016-10-31 Thread Jai Bheemsen Rao Dhanwada
Hello,

Has anyone played around with the cassandra reaper (
https://github.com/spotify/cassandra-reaper)?

If so, can someone please help me with the set-up? I can't get it working. I
used the steps below:

1. create jar file using maven
2. java -jar cassandra-reaper-0.2.3-SNAPSHOT.jar server
cassandra-reaper.yaml
3. ./bin/spreaper repair production users
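
For what it's worth, my understanding from the project README (unverified) is that the cluster has to be registered with reaper before a repair can be started. A sketch, assuming the reaper service is up on its default port:

    # register the cluster via one of its seed nodes, then verify
    ./bin/spreaper add-cluster <seed-node-hostname>
    ./bin/spreaper list-clusters

    # only then schedule the repair
    ./bin/spreaper repair production users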


Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Lahiru Gamathige
Hey Jeff,

Thanks a lot. The biggest change I have in mind is using
TimeWindowCompactionStrategy in our time-series tables (currently we use
SizeTieredCompactionStrategy).

We already have data in those tables (6 nodes, each with 250GB, including
timed-out data that hasn't been deleted from disk). Do you think it's safe to
do the migration just by changing the table property?

I couldn't find a migration strategy from STCS to TWCS.
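
For reference, the switch itself is a single property change. A sketch, assuming a hypothetical time-series table ks.events bucketed by day:

    ALTER TABLE ks.events
    WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                       'compaction_window_unit': 'DAYS',
                       'compaction_window_size': 1};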

BTW, thanks for the great work with TWCS.

Lahiru

On Mon, Oct 31, 2016 at 5:08 PM, Jeff Jirsa 
wrote:

> Should be the same as going to 3.0, no file format version bumps between
> 3.0 and 3.9
>
>
>
> (There was one format change in 3.6 – CASSANDRA-11206 should have probably
> bumped the version identifier, but we didn’t, and there’s nothing special
> you’d need to do for it anyway.)
>
>
>
>
>
>
>
> *From: *Lahiru Gamathige 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 31, 2016 at 5:04 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Migrate from C* 2.1.11 to 3.9 (max version I can find in
> docker hub)
>
>
>
> Hi Users,
>
>
>
> I am trying to find a migration guide from 2.1.* to 3.x and figured I
> should go through the NEWS.txt, so I read it and found a few things that
> I should be careful about during the upgrade.
>
>
>
> I'm curious whether there's any documentation with specific steps on how
> to do the migration.
>
>
>
> Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any
> warnings or red flags I need to consider?
>
>
>
> Regards
>
> Lahiru
>


Re: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Jeff Jirsa
Should be the same as going to 3.0, no file format version bumps between 3.0 
and 3.9

 

(There was one format change in 3.6 – CASSANDRA-11206 should have probably 
bumped the version identifier, but we didn’t, and there’s nothing special you’d 
need to do for it anyway.)
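
For the mechanics, the usual per-node sequence is roughly the following (a sketch; verify every step against NEWS.txt for your exact versions):

    nodetool drain            # flush memtables, stop accepting writes
    # stop cassandra, install the 3.x binaries, merge cassandra.yaml changes,
    # then start cassandra again
    nodetool upgradesstables  # rewrite SSTables into the current format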

 

 

 

From: Lahiru Gamathige 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 31, 2016 at 5:04 PM
To: "user@cassandra.apache.org" 
Subject: Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

 

Hi Users, 

 

I am trying to find a migration guide from 2.1.* to 3.x and figured I should go 
through the NEWS.txt, so I read it and found a few things that I should be 
careful about during the upgrade.

 

I'm curious whether there's any documentation with specific steps on how to do 
the migration. 

 

Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any 
warnings or red flags I need to consider?

 

Regards

Lahiru





Migrate from C* 2.1.11 to 3.9 (max version I can find in docker hub)

2016-10-31 Thread Lahiru Gamathige
Hi Users,

I am trying to find a migration guide from 2.1.* to 3.x and figured I
should go through the NEWS.txt, so I read it and found a few things that
I should be careful about during the upgrade.

I'm curious whether there's any documentation with specific steps on how
to do the migration.

Has anyone finished a successful migration from 2.1.* to 3.x (x > 8)? Any
warnings or red flags I need to consider?

Regards
Lahiru


Re: question on an article

2016-10-31 Thread Peter Reilly
The original article
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html


On Mon, Oct 31, 2016 at 5:57 PM, Peter Reilly 
wrote:

> From the article:
> java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200
> -p 7102 -o INSERT -c 10 -r
>
> The client is writing 10 columns per row key, row key randomly chosen from
> 27 million ids, each column has a key and 10 bytes of data. The total on
> disk size for each write including all overhead is about 400 bytes.
>
> Not too sure about the batching - it may be one of the parameters to
> stress.jar.
>
> Peter
>
> On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali  wrote:
>
>> Hi Guys,
>>
>>
>> I keep reading the articles below but the biggest questions for me are as
>> follows
>>
>> 1) what is the "data size" per request? without data size it hard for me
>> to see anything sensible
>> 2) is there batching here?
>>
>> http://www.datastax.com/1-million-writes
>>
>> http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html
>>
>> Thanks!
>>
>>
>>
>>
>


Re: question on an article

2016-10-31 Thread Peter Reilly
From the article:
java -jar stress.jar -d "144 node ids" -e ONE -n 2700 -l 3 -i 1 -t 200
-p 7102 -o INSERT -c 10 -r

The client is writing 10 columns per row key, row key randomly chosen from
27 million ids, each column has a key and 10 bytes of data. The total on
disk size for each write including all overhead is about 400 bytes.

Not too sure about the batching - it may be one of the parameters to
stress.jar.
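
My reading of those flags, from memory of the old stress.jar usage text (treat every annotation as an assumption, not documentation):

    java -jar stress.jar \
      -d "144 node ids" \  # comma-separated list of target nodes
      -e ONE \             # consistency level
      -n 2700 \            # number of keys to write (27 million per the text)
      -l 3 \               # replication factor
      -i 1 \               # progress-report interval in seconds
      -t 200 \             # client threads
      -p 7102 \            # thrift port
      -o INSERT \          # operation to run
      -c 10 \              # columns per row
      -r                   # pick keys randomly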

Peter

On Mon, Oct 31, 2016 at 4:07 PM, Kant Kodali  wrote:

> Hi Guys,
>
>
> I keep reading the articles below but the biggest questions for me are as
> follows
>
> 1) what is the "data size" per request? without data size it hard for me
> to see anything sensible
> 2) is there batching here?
>
> http://www.datastax.com/1-million-writes
>
> http://techblog.netflix.com/2014/07/revisiting-1-million-
> writes-per-second.html
>
> Thanks!
>
>
>
>


Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread kurt Greaves
Blowing out to 1k SSTables seems a bit full on. What args are you passing
to repair?
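
For reference, the throttling knob discussed in the quoted thread can be changed at runtime, without a restart. A sketch, with 8 MB/s as an arbitrary example value:

    nodetool setcompactionthroughput 8   # also applies to anticompaction
    nodetool getcompactionthroughput     # verify the current value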

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 31 October 2016 at 09:49, Stefano Ortolani  wrote:

> I've collected some more data-points, and I still see dropped
> mutations with compaction_throughput_mb_per_sec set to 8.
> The only notable thing regarding the current setup is that I have
> another keyspace (not being repaired though) with really wide rows
> (100MB per partition), but that shouldn't have any impact in theory.
> Nodes do not seem that overloaded either, and I don't see any GC spikes
> while those mutations are dropped :/
>
> Hitting a dead end here; any ideas on where to look next?
>
> Regards,
> Stefano
>
> On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani 
> wrote:
> > That's what I was thinking. Maybe GC pressure?
> > Some more details: during anticompaction I have some CFs exploding to 1K
> > SStables (to be back to ~200 upon completion).
> > HW specs should be quite good (12 cores/32 GB ram) but, I admit, still
> > relying on spinning disks, with ~150GB per node.
> > Current version is 3.0.8.
> >
> >
> > On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta 
> > wrote:
> >>
> >> That's pretty low already, but perhaps you should lower to see if it
> will
> >> improve the dropped mutations during anti-compaction (even if it
> increases
> >> repair time), otherwise the problem might be somewhere else. Generally
> >> dropped mutations is a signal of cluster overload, so if there's nothing
> >> else wrong perhaps you need to increase your capacity. What version are
> you
> >> in?
> >>
> >> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani :
> >>>
> >>> Not yet. Right now I have it set at 16.
> >>> Would halving it more or less double the repair time?
> >>>
> >>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta 
> >>> wrote:
> 
>  Anticompaction throttling can be done by setting the usual
>  compaction_throughput_mb_per_sec knob on cassandra.yaml or via
> nodetool
>  setcompactionthroughput. Did you try lowering that  and checking if
> that
>  improves the dropped mutations?
> 
>  2016-08-09 13:32 GMT-03:00 Stefano Ortolani :
> >
> > Hi all,
> >
> > I am running incremental repairs on a weekly basis (can't do it every
> > day as one single run takes 36 hours), and every time, I have at
> least one
> > node dropping mutations as part of the process (this almost always
> during
> > the anticompaction phase). Ironically this leads to a system where
> repairing
> > makes data consistent at the cost of making some other data not
> consistent.
> >
> > Does anybody know why this is happening?
> >
> > My feeling is that this might be caused by anticompacting column
> > families with really wide rows and with many SStables. If that is
> the case,
> > any way I can throttle that?
> >
> > Thanks!
> > Stefano
> 
> 
> >>>
> >>
> >
>


question on an article

2016-10-31 Thread Kant Kodali
Hi Guys,


I keep reading the articles below but the biggest questions for me are as
follows

1) what is the "data size" per request? without data size it hard for me to
see anything sensible
2) is there batching here?

http://www.datastax.com/1-million-writes

http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html

Thanks!


Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread DuyHai Doan
Technically, TTL should be handled properly. However, be careful of expired
data turning into tombstones. For the original table it may be a tombstone
on a skinny partition, but for the secondary index it may be a tombstone on a
wide partition, and you'll start getting into trouble when reading a
partition with a lot of them.
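
A sketch of the pattern under discussion, using a hypothetical table, to make the wide-partition concern concrete: in the hidden index table, every row sharing an indexed value lands in the same partition, so each expiring cell adds a tombstone there.

    CREATE TABLE ks.items (
        id uuid PRIMARY KEY,
        needs_update boolean
    );
    CREATE INDEX items_needs_update_idx ON ks.items (needs_update);

    -- this cell expires in the base table AND in the index partition
    -- that groups all rows where needs_update = true
    INSERT INTO ks.items (id, needs_update)
    VALUES (uuid(), true) USING TTL 2592000;  -- 30 days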

On Mon, Oct 31, 2016 at 5:08 PM, Oleg Krayushkin 
wrote:

> Hi, DuyHai, thank you.
>
> I got the idea of the caveat with too-low cardinality, but I am still
> wondering about possible trouble with putting a TTL (months) on an indexed
> column (not a boolean; say, an int with 100 distinct values).
>
> 2016-10-31 16:33 GMT+03:00 DuyHai Doan :
>
>> http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
>>
>> See section E Caveats which applies to your boolean use-case
>>
>> On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin 
>> wrote:
>>
>>> Hi,
>>>
>>> Is it a good approach to make a boolean column with TTL and build a
>>> secondary index on it?
>>> (For example, I want to get rows which need to be updated after a
>>> certain time, but I don't want, say, to add a field "update_date" as a
>>> clustering column or to create another table.)
>>>
>>> What kind of trouble could this lead me into?
>>>
>>> Thanks in advance for any suggestions.
>>>
>>> --
>>>
>>> Oleg Krayushkin
>>>
>>
>>
>
>
> --
>
> Oleg Krayushkin
>


Does securing C*'s CQL native interface (running on port 9042) automatically secure its Thrift API interface (running on port 9160)?

2016-10-31 Thread Li, Guangxing
Hi,

I secured my C* cluster by setting "authenticator:
org.apache.cassandra.auth.PasswordAuthenticator" in cassandra.yaml. I know
it secures the CQL native interface running on port 9042 because my code
uses that interface. Does this also secure the Thrift API interface running
on port 9160? I searched the web for answers but could not find any.
I suppose I could write a sample application using the Thrift API to
confirm it, but I am wondering if I can get a quick answer from you experts.
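
For reference, the setting in question, with my (unverified) understanding as a comment:

    # cassandra.yaml: the authenticator is enforced above the transport
    # layer, so it should cover the Thrift interface as well; Thrift
    # clients would then have to authenticate via login()
    authenticator: org.apache.cassandra.auth.PasswordAuthenticator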

Thanks.

George.


Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread Oleg Krayushkin
Hi, DuyHai, thank you.

I got the idea of the caveat with too-low cardinality, but I am still
wondering about possible trouble with putting a TTL (months) on an indexed
column (not a boolean; say, an int with 100 distinct values).

2016-10-31 16:33 GMT+03:00 DuyHai Doan :

> http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
>
> See section E Caveats which applies to your boolean use-case
>
> On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin 
> wrote:
>
>> Hi,
>>
>> Is it a good approach to make a boolean column with TTL and build a
>> secondary index on it?
>> (For example, I want to get rows which need to be updated after a certain
>> time, but I don't want, say, to add a field "update_date" as a clustering
>> column or to create another table.)
>>
>> What kind of trouble could this lead me into?
>>
>> Thanks in advance for any suggestions.
>>
>> --
>>
>> Oleg Krayushkin
>>
>
>


-- 

Oleg Krayushkin


Re: given partition key and secondary index, still require allow_filtering?

2016-10-31 Thread DuyHai Doan
Cassandra's native secondary index does not perform well with inequalities (<,
>, <=, >=). In your case, even if you provide the partition key (which is a
very good idea), Cassandra still needs to perform a full scan on the local
node to find any score matching the inequality. That is pretty expensive,
hence the requirement for ALLOW FILTERING.

General rule of thumb for production: ALLOW FILTERING == SURELY TIMEOUT
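
If the score-range query is a main access pattern, one alternative (a sketch, using a hypothetical second table) is to make score a clustering column, so the range predicate is served by the on-disk sort order instead of an index:

    CREATE TABLE profile_new.user_scores_by_id (
        id bigint,
        score double,
        category int,
        PRIMARY KEY (id, score, category)
    ) WITH CLUSTERING ORDER BY (score DESC, category ASC);

    -- a range on the first clustering column needs no index or filtering
    SELECT * FROM profile_new.user_scores_by_id
    WHERE id = 3674 AND score > 0.5;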

On Mon, Oct 31, 2016 at 9:00 AM, Zao Liu  wrote:

> Hi,
>
> I created a table, schema like here:
>
> CREATE TABLE profile_new.user_categories_1477899735 (
>
> id bigint,
>
> category int,
>
> score double,
>
> PRIMARY KEY (id, category)
>
> ) WITH CLUSTERING ORDER BY (category ASC)
>
> AND bloom_filter_fp_chance = 0.01
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
> AND comment = ''
>
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
> CREATE INDEX user_categories_1477899735_score_idx ON
> profile_new.user_categories_1477899735 (score);
>
>
> cqlsh:profile_new> select * from user_categories_1477899735 where id=3674;
>
>
> But somehow when I pass the partition key and the secondary-indexed
> column, it still complains:
>
> cqlsh:profile_new> select * from user_categories_1477899735 where id=3674
> and score > 0.5;
>
> *InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Cannot execute this query as it might involve data filtering and
> thus may have unpredictable performance. If you want to execute this query
> despite the performance unpredictability, use ALLOW FILTERING"*
>
> cqlsh:profile_new>
>
>
>


Re: Secondary Index on Boolean column with TTL

2016-10-31 Thread DuyHai Doan
http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/

See section E Caveats which applies to your boolean use-case

On Mon, Oct 31, 2016 at 2:19 PM, Oleg Krayushkin 
wrote:

> Hi,
>
> Is it a good approach to make a boolean column with TTL and build a
> secondary index on it?
> (For example, I want to get rows which need to be updated after a certain
> time, but I don't want, say, to add a field "update_date" as a clustering
> column or to create another table.)
>
> What kind of trouble could this lead me into?
>
> Thanks in advance for any suggestions.
>
> --
>
> Oleg Krayushkin
>


Secondary Index on Boolean column with TTL

2016-10-31 Thread Oleg Krayushkin
Hi,

Is it a good approach to make a boolean column with TTL and build a
secondary index on it?
(For example, I want to get rows which need to be updated after a certain
time, but I don't want, say, to add a field "update_date" as a clustering
column or to create another table.)

What kind of trouble could this lead me into?

Thanks in advance for any suggestions.

-- 

Oleg Krayushkin


Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
I would set rpc_address to 0.0.0.0 and broadcast_rpc_address to each node's own IP.

This allows connecting both to 127.0.0.1 from inside and to the node's IP from outside.
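
A sketch of that suggestion (NODE_IP being a placeholder for each node's own external address):

    # cassandra.yaml, set per node
    rpc_address: 0.0.0.0
    broadcast_rpc_address: NODE_IP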



By the way, I see that port 7000 is bound to the external IP. Aren't both nodes 
in the same network? If yes, use internal IPs.

 



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 15:37:50 -0400, Raimund Klein 
chessra...@gmail.com wrote 




Hi guys,



Thank you for your responses. Let me try to address them:



I just tried cqlsh directly with the IP, no change in behaviour. (I previously 
tried the hostnames, didn't work either.)

As for the "empty" ..._address: I meant that I leave these blank. Please let me 
quote from the default cassandra.yaml:

# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).

So what should I put instead?


Requested outputs:

 

nodetool status
Datacenter: datacenter1
=======
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns (effective)  Host ID                               Rack
UN  IP_1     344.56 KB  256     100.0%            6271c749-e41d-443c-89e4-46c0fbac49af  rack1
UN  IP_2     266.91 KB  256     100.0%            e50a1076-7149-45f3-9001-26bb479f2a50  rack1


# netstat -lptn | grep java
tcp    0  0 IP_1:7000        0.0.0.0:*  LISTEN  17040/java
tcp    0  0 127.0.0.1:36415  0.0.0.0:*  LISTEN  17040/java
tcp    0  0 127.0.0.1:7199   0.0.0.0:*  LISTEN  17040/java
tcp6   0  0 IP_1:9042        :::*       LISTEN  17040/java

# netstat -lptn | grep java
tcp    0  0 127.0.0.1:43569  0.0.0.0:*  LISTEN  49349/java
tcp    0  0 IP_2:7000        0.0.0.0:*  LISTEN  49349/java
tcp    0  0 127.0.0.1:7199   0.0.0.0:*  LISTEN  49349/java
tcp6   0  0 :::8009          :::*       LISTEN  42088/java
tcp6   0  0 :::8080          :::*       LISTEN  42088/java
tcp6   0  0 IP_2:9042        :::*       LISTEN  49349/java
tcp6   0  0 127.0.0.1:8005   :::*       LISTEN  42088/java

Jonathan, thank you for reassuring me that I didn't misunderstand seeds 
completely. ;-)




Any ideas?



Regards

Raimund




2016-10-30 18:48 GMT+00:00 Jonathan Haddad j...@jonhaddad.com:






I always prefer to set the listen interface instead of the listen address.



Both nodes can be seeds. In fact, there should be more than one seed. Having 
your first 2 nodes as seeds is usually the correct thing to do. 

On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin vla...@winguzone.com 
wrote:



Empty listen_address and rpc_address.

What do you mean by "Empty"? You should set either ***_address or 
***_interface. Otherwise 

Cassandra will not listen on port 9042.





Open ports 9042, 7000 and 7001 for external communication.



Only port 9042 should be open to the world. Port 7000 is for internode 
communication, and 7001 for internode SSL communication (only one of them is 
used).





What is the best order of steps



Order doesn't really matter.





Define both machines as seeds.



It's wrong. Only one (started first) should be seed.





nodetool sees both of them

cqlsh refuses to connect

Can you please give output of

nodetool status

and

netstat -lptn | grep java



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:11:55 -0400, Raimund Klein 
chessra...@gmail.com wrote 







Hi everyone,

 

We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes 
(nodetool sees both of them, so I'm quite certain the cluster is indeed 
active). My steps to create the cluster were (this applies to both machines):



 - Empty listen_address and rpc_address.

 - Define a cluster_name.

 - Define both machines as seeds.

 - Open ports 9042, 7000 and 7001 for external communication.



 



Now I want to secure access to the cluster in all forms:



 - define a different database user with a new password

 - encrypt communication between clients and the cluster including client 
verification

 - encrypt communication between the nodes including verification



What is the best order of steps and correct way to achieve this? I wanted to 
start with defining a different user, but cqlsh refuses to connect after 
enforcing user/password authentication:



cqlsh -u cassandra -p cassandra

Connection error: ('Unable to 

Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
Both nodes can be seeds.

Probably I misunderstood Raimund as setting each node as its own, single seed. 
If he sets both IPs on both nodes, it's OK.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:48:00 -0400, Jonathan Haddad 
j...@jonhaddad.com wrote 




I always prefer to set the listen interface instead of the listen address.



Both nodes can be seeds. In fact, there should be more than one seed. Having 
your first 2 nodes as seeds is usually the correct thing to do. 

On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin vla...@winguzone.com 
wrote:







Empty listen_address and rpc_address.

What do you mean by "Empty"? You should set either ***_address or 
***_interface. Otherwise 

Cassandra will not listen on port 9042.





Open ports 9042, 7000 and 7001 for external communication.



Only port 9042 should be open to the world. Port 7000 is for internode 
communication, and 7001 for internode SSL communication (only one of them is 
used).





What is the best order of steps



Order doesn't really matter.





Define both machines as seeds.



It's wrong. Only one (started first) should be seed.





nodetool sees both of them

cqlsh refuses to connect

Can you please give output of

nodetool status

and

netstat -lptn | grep java



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:11:55 -0400, Raimund Klein 
chessra...@gmail.com wrote 







Hi everyone,

 

We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes 
(nodetool sees both of them, so I'm quite certain the cluster is indeed 
active). My steps to create the cluster were (this applies to both machines):



 - Empty listen_address and rpc_address.

 - Define a cluster_name.

 - Define both machines as seeds.

 - Open ports 9042, 7000 and 7001 for external communication.



 



Now I want to secure access to the cluster in all forms:



 - define a different database user with a new password

 - encrypt communication between clients and the cluster including client 
verification

 - encrypt communication between the nodes including verification



What is the best order of steps and correct way to achieve this? I wanted to 
start with defining a different user, but cqlsh refuses to connect after 
enforcing user/password authentication:



cqlsh -u cassandra -p cassandra

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, 
"Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})



 



This happens when I run the command on either of the two machines. Any help 
would be greatly appreciated.














Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread Stefano Ortolani
I've collected some more data-points, and I still see dropped
mutations with compaction_throughput_mb_per_sec set to 8.
The only notable thing regarding the current setup is that I have
another keyspace (not being repaired though) with really wide rows
(100MB per partition), but that shouldn't have any impact in theory.
Nodes do not seem that overloaded either, and I don't see any GC spikes
while those mutations are dropped :/

Hitting a dead end here; any ideas on where to look next?

Regards,
Stefano

On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani  wrote:
> That's what I was thinking. Maybe GC pressure?
> Some more details: during anticompaction I have some CFs exploding to 1K
> SStables (to be back to ~200 upon completion).
> HW specs should be quite good (12 cores/32 GB ram) but, I admit, still
> relying on spinning disks, with ~150GB per node.
> Current version is 3.0.8.
>
>
> On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta 
> wrote:
>>
>> That's pretty low already, but perhaps you should lower to see if it will
>> improve the dropped mutations during anti-compaction (even if it increases
>> repair time), otherwise the problem might be somewhere else. Generally
>> dropped mutations is a signal of cluster overload, so if there's nothing
>> else wrong perhaps you need to increase your capacity. What version are you
>> in?
>>
>> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani :
>>>
>>> Not yet. Right now I have it set at 16.
>>> Would halving it more or less double the repair time?
>>>
>>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta 
>>> wrote:

 Anticompaction throttling can be done by setting the usual
 compaction_throughput_mb_per_sec knob on cassandra.yaml or via nodetool
 setcompactionthroughput. Did you try lowering that  and checking if that
 improves the dropped mutations?

 2016-08-09 13:32 GMT-03:00 Stefano Ortolani :
>
> Hi all,
>
> I am running incremental repairs on a weekly basis (can't do it every
> day as one single run takes 36 hours), and every time, I have at least one
> node dropping mutations as part of the process (this almost always during
> the anticompaction phase). Ironically this leads to a system where 
> repairing
> makes data consistent at the cost of making some other data not 
> consistent.
>
> Does anybody know why this is happening?
>
> My feeling is that this might be caused by anticompacting column
> families with really wide rows and with many SStables. If that is the 
> case,
> any way I can throttle that?
>
> Thanks!
> Stefano


>>>
>>
>


given partition key and secondary index, still require allow_filtering?

2016-10-31 Thread Zao Liu
Hi,

I created a table, schema like here:

CREATE TABLE profile_new.user_categories_1477899735 (

id bigint,

category int,

score double,

PRIMARY KEY (id, category)

) WITH CLUSTERING ORDER BY (category ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX user_categories_1477899735_score_idx ON
profile_new.user_categories_1477899735 (score);


cqlsh:profile_new> select * from user_categories_1477899735 where id=3674;


But somehow when I pass the partition key and the secondary-indexed column,
it still complains:

cqlsh:profile_new> select * from user_categories_1477899735 where id=3674
and score > 0.5;

*InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering and
thus may have unpredictable performance. If you want to execute this query
despite the performance unpredictability, use ALLOW FILTERING"*

cqlsh:profile_new>