Re: Problems with adding datacenter and schema version disagreement

2014-03-13 Thread olek.stas...@gmail.com
Bump: is there any way to bring my cluster back to schema consistency?
I have a 6-node cluster with exactly six schema versions. How do I deal with it?
regards
Aleksander

2014-03-11 14:36 GMT+01:00 olek.stas...@gmail.com olek.stas...@gmail.com:
 Didn't help :)
 thanks and regards
 Aleksander

 2014-03-11 14:14 GMT+01:00 Duncan Sands duncan.sa...@gmail.com:
 On 11/03/14 14:00, olek.stas...@gmail.com wrote:

 I plan to install 2.0.6 as soon as it is available in the DataStax RPM
 repo.
 But how do I deal with schema inconsistency on such a scale?


 Does it get better if you restart all the nodes?  In my case restarting just
 some of the nodes didn't help, but restarting all nodes did.

 Ciao, Duncan.


select query returns wrong value if use DESC option

2014-03-13 Thread Katsutoshi Nagaoka
Hi.

I am using Cassandra 2.0.6. There is a case where a SELECT query returns
the wrong result if the DESC option is used. My test procedure is as follows:

--
cqlsh:test> CREATE TABLE mytable (key int, range int, PRIMARY KEY (key, range));
cqlsh:test> INSERT INTO mytable (key, range) VALUES (0, 0);
cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0;

 key | range
-----+-------
   0 |     0

(1 rows)

cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range ASC;

 key | range
-----+-------
   0 |     0

(1 rows)

cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range DESC;

(0 rows)
--

Why does the query return 0 rows when the DESC option is used? I expected the
same single row that the other queries return. Has anyone seen a similar issue?

Thanks,
Katsutoshi


CQL Select Map using an IN relationship

2014-03-13 Thread David Savage
Hi there,

I'm experimenting using cassandra and have run across an error message
which I need a little more information on.

The use case I'm experimenting with is a series of document updates
(documents being an arbitrary map of key value pairs), I would like to find
the latest document updates after a specified time period. I don't want to
store many copies of the documents (one per update) as the updates are
often only to single keys in the map so that would involve a lot of
duplicated data.

The solution I've found that seems to fit best in terms of performance is
to have two tables.

One holds an event log of timeuuid -> docid, and a second stores the
documents themselves as docid -> map<string, string>. I then run two
queries, one to select ids that have changed after a certain time:

SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime)

and then a second to select the actual documents themselves

SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)

However this then explodes on query with the error message:

Cannot restrict PRIMARY KEY part id by IN relation as a collection is
selected by the query

Detective work led me to these lines in
org.apache.cassandra.cql3.statements.SelectStatement:

    // We only support IN for the last name and for compact storage so far
    // TODO: #3885 allows us to extend to non compact as well, but that remains to be done
    if (i != stmt.columnRestrictions.length - 1)
        throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
    else if (stmt.selectACollection())
        throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));

It seems like #3885 will allow support for the first IF block above, but I
don't think it will allow the second, am I correct?

Any pointers on how I can work around this would be greatly appreciated.

Kind regards,

Dave
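
One possible workaround, sketched here under assumptions rather than taken from the thread: since the check quoted above only fires for IN relations, the IN list can be split into one equality query per id, and each query can then be run (for example asynchronously) with whatever driver is in use. The helper name `split_in_query` is hypothetical, and the sketch only builds statement strings and bind parameters; it does not talk to Cassandra.

```python
def split_in_query(table, columns, key_column, key_values):
    """Expand `... WHERE key IN (...)` into one equality query per key.

    Returns a list of (cql, params) pairs; `?` marks a bind variable.
    This is an illustrative helper, not part of any Cassandra driver.
    """
    cql = "SELECT {} FROM {} WHERE {} = ?".format(
        ", ".join(columns), table, key_column
    )
    return [(cql, (value,)) for value in key_values]


if __name__ == "__main__":
    # Table and column names mirror the ones used in the post above.
    for statement, params in split_in_query(
        "documents", ["id", "data"], "id", [0, 1, 2]
    ):
        print(statement, params)
```

The trade-off is more round trips in exchange for queries that each hit a single key; whether that is acceptable depends on how many ids the event log query returns.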


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Peter Lin
It's not clear to me whether your id column is the KEY or just a regular
column with a secondary index.

Queries that use IN on non-primary-key columns aren't supported yet. Not
sure if that answers your question.





Re: select query returns wrong value if use DESC option

2014-03-13 Thread Edward Capriolo
Consider filing a JIRA ticket. CQL is the standard interface to Cassandra,
and everything is heavily tested.

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: CQL Select Map using an IN relationship

2014-03-13 Thread David Savage
Hi Peter,

Thanks for the help. Unfortunately I'm not sure that's the problem: the id
is the primary key on the documents table and the timestamp is the primary
key on the eventlog table.

Kind regards,


Dave






Re: Opscenter help?

2014-03-13 Thread Drew from Zhrodague

On 3/13/14, 12:14 AM, Jack Krupansky wrote:

Please do use Stack Overflow - that is the appropriate forum for
OpsCenter support (unless you are a DataStax customer). Use the
OpsCenter tag:

http://stackoverflow.com/tags/opscenter/info


	Unfortunately, as a new user, I cannot use the opscenter tag. I don't 
have a good enough reputation yet.


Thanks for the pointer anyway.


--

Drew from Zhrodague
post-apocalyptic ad-hoc industrialist
d...@zhrodague.net


Re:

2014-03-13 Thread Batranut Bogdan
Thanks, Edward and David.

Your contributions led me to the conclusion. Unknown to me, the partition key
had only 1 value, so of course all the data was stored on a single node. Having
a replication factor of 3 led to 99% CPU on 3 machines.
Regarding RAM: we have so much CPU power that I actually run a Tomcat server on
each node; Tomcat prepares data for queries, sends them, retrieves results,
processes them and sends responses to clients. We are running 6 Intel® Xeon®
E3-1270 v3 Quad-Core Haswell machines with Hyper-Threading, so there is CPU
power there to handle a high Java heap, IMO.

Thank you all for your prompt responses.
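
The effect described above can be illustrated with a toy placement model. This is a sketch under simplifying assumptions, not Cassandra's actual partitioner: it hashes the partition key with MD5, picks a starting node on a 6-node ring, and places the remaining replicas on the next nodes clockwise (no vnodes, no racks). The function names are made up for the example.

```python
import hashlib

NODES = 6  # cluster size from the thread
RF = 3     # replication factor from the thread


def replicas(partition_key, nodes=NODES, rf=RF):
    """Return the node indexes holding a partition key in this toy model."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % nodes
    # Replicas are the starting node plus the next rf - 1 nodes on the ring.
    return [(start + i) % nodes for i in range(rf)]


def nodes_touched(keys):
    """Return the set of nodes that receive writes for the given keys."""
    touched = set()
    for key in keys:
        touched.update(replicas(key))
    return touched


if __name__ == "__main__":
    single = ["only-key"] * 1000
    many = ["key-{}".format(i) for i in range(1000)]
    print("one partition key value:", len(nodes_touched(single)), "nodes busy")
    print("1000 partition key values:", len(nodes_touched(many)), "nodes busy")
```

With a single partition key value, every write lands on the same 3 replicas no matter which coordinator the round-robin policy picks, which matches the 3 busy and 3 idle nodes reported in the thread.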



On Thursday, March 13, 2014 1:38 AM, David McNelis dmcne...@gmail.com wrote:
 
Not knowing anything about your data structure (to expand on what Edward said),
you could be running into something where you've got some hot keys that are
getting the majority of writes during those heavy loads. More specifically,
I might look for a single key that you're writing, since your RF=3 and you
have exactly 3 nodes that are causing problems.




On Wed, Mar 12, 2014 at 7:28 PM, Russ Bradberry rbradbe...@gmail.com wrote:

I wouldn't go above 8G unless you have a very powerful machine that can keep 
the GC pauses low.

Sent from my iPhone

On Mar 12, 2014, at 7:11 PM, Edward Capriolo edlinuxg...@gmail.com wrote:


That is too much RAM for Cassandra; make that 6 GB to 10 GB.

The uneven perf could be because your requests do not shard evenly.

On Wednesday, March 12, 2014, Batranut Bogdan batra...@yahoo.com wrote:
 Hello all,

 The environment:

 I have a 6 node Cassandra cluster. On each node I have:
 - 32 G RAM
 - 24 G RAM for cassa
 - ~150 - 200 MB/s disk speed
 - tomcat 6 with axis2 webservice that uses the datastax java driver to make
 asynch reads / writes 
 - replication factor for the keyspace is 3

 All nodes in the same data center 
 The clients that read / write are in the same datacenter so network is
 Gigabit.

 Writes are performed via exposed methods from Axis2 WS . The Cassandra Java
 driver uses the round robin load balancing policy so all the nodes in the
 cluster should be hit with write requests under heavy write or read load
 from multiple clients.

 I am monitoring all nodes with JConsole from another box.

 The problem:

 When writing to a particular column family, only 3 nodes have high CPU load
 ~ 80 - 99 %. The remaining 3 are at ~2 - 10 % CPU. During writes, reads
 time out.

 I need more speed for both writes and reads. The fact that 3 nodes
 barely have any CPU activity leads me to think that the full potential of C*
 is not being used.

 I am running out of ideas...

 If further details about the environment are needed, I can provide them.


 Thank you very much.



Re: Opscenter help?

2014-03-13 Thread Jack Krupansky
You don't need any reputation points to ask a new question with an existing 
tag - just type opscenter in the Tags box under the question. Otherwise, 
how would any new user ever be able to ask a question and have it tagged?!


-- Jack Krupansky




Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have no problem doing this w 2.0.5 - what version of C* are you using? Or
maybe I don't understand your data model... attach 'creates' if you don't
mind.

ml







Re: CQL Select Map using an IN relationship

2014-03-13 Thread David Savage
Hmmm, that may be the problem. I'm currently testing with 2.0.2, which got
dragged in by the cassandra-unit library I'm using for testing [1]. I will
try to fix my build dependencies and retry, thx.

/Dave

[1] https://github.com/jsevellec/cassandra-unit








Re: Opscenter help?

2014-03-13 Thread Drew from Zhrodague

On 3/13/14, 9:49 AM, Jack Krupansky wrote:

You don't need any reputation points to ask a new question with an
existing tag - just type opscenter in the Tags box under the question.
Otherwise, how would any new user ever be able to ask a question and
have it tagged?!


	I dunno, I don't use SE often - it draws a red box and says I need 300 
points to be able to type 'opscenter' in the tags box.



--

Drew from Zhrodague
post-apocalyptic ad-hoc industrialist
d...@zhrodague.net


Re: Opscenter help?

2014-03-13 Thread Nick Bailey
I'm happy to help here as well :)

Can you give some more information? Specifically:

What exact versions of EL5 and EL6 have you tried?
What version of OpsCenter are you using?
What file/dependency is rpm/yum saying conflicts with sudo?

Also, you can find the OpsCenter documentation here
http://www.datastax.com/documentation/opscenter/4.1/index.html, although
this isn't an issue I've seen before.

-Nick


On Wed, Mar 12, 2014 at 1:51 PM, Drew from Zhrodague 
drewzhroda...@zhrodague.net wrote:

 I am having a hard time installing the Datastax Opscenter agents
 on EL6 and EL5 hosts. Where is an appropriate place to ask for help?
 Datastax has moved their forums to Stack Exchange, which seems to be a waste
 of time, as I don't have enough reputation points to properly tag my
 questions.

 The agent installation seems to be broken:
 [] agent rpm conflicts with sudo
 [] install from opscenter does not work, even if manually
 installing the rpm (requires --force, conflicts with sudo)
 [] error message re: log4j #noconf
 [] Could not find the main class: opsagent.opsagent. Program will
 exit.
 [] No other (helpful/more in-depth) documentation exists


 --

 Drew from Zhrodague
 post-apocalyptic ad-hoc industrialist
 d...@zhrodague.net



Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
Create your table like this and it will work:

CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY ((group, id)));

The extra parens catenate 'group' and 'id' into the partition key - IN will
work on the last component of a partition key.

ml
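
To make the difference between PRIMARY KEY (group, id) and PRIMARY KEY ((group, id)) concrete, here is a small model of how rows are grouped into partitions under each declaration. It is only an illustration: the function name and sample rows are invented, and it models partition grouping alone, not storage or query planning.

```python
from collections import defaultdict

# Sample (group, id) pairs; the values are made up for the example.
ROWS = [("test", 0), ("test", 1), ("test", 2), ("other", 0)]


def partitions(rows, composite):
    """Group rows by partition key.

    composite=False models PRIMARY KEY (group, id):
        partition key = group, clustering column = id.
    composite=True models PRIMARY KEY ((group, id)):
        partition key = (group, id), no clustering column.
    """
    parts = defaultdict(list)
    for group, doc_id in rows:
        key = (group, doc_id) if composite else (group,)
        parts[key].append(doc_id)
    return dict(parts)


if __name__ == "__main__":
    print("PRIMARY KEY (group, id):  ", partitions(ROWS, composite=False))
    print("PRIMARY KEY ((group, id)):", partitions(ROWS, composite=True))
```

With the single-paren form, all ids of a group share one partition and are ordered by id inside it, so range queries on id within a group are possible. With the double-paren form, every (group, id) pair is its own partition, so both columns have to be supplied to find a row, and listing a whole group is no longer a single-partition read.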


On Thu, Mar 13, 2014 at 10:40 AM, David Savage davemssav...@gmail.comwrote:

 Nope, upgraded to 2.0.5 and still get the same problem, I actually
 simplified the problem a little in my first post, there's a composite
 primary key involved as I need to partition ids into groups

 So the full CQL statements are:

 CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
 'replication_factor':3};


 CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY (group, id));


 INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'});

 INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'});

 INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'});


 SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);


 Thanks for your help.


 Kind regards,


 /Dave










Re: Opscenter help?

2014-03-13 Thread Rahul Menon
I have seen the "conflicts with sudo" error, but that was with the 3.X rpm on
the Amazon AMI. I was, however, able to install it from the tarball. As Nick
has pointed out, the versions of the OS and Opscenter will help in looking at
this.

Thanks
Rahul







Re: CQL Select Map using an IN relationship

2014-03-13 Thread Peter Lin
probably a good idea to open a jira ticket to explain this better in the
docs. the downside of moving so fast is the docs often fall behind and
users have to dig around to figure things out. not everyone wants to read
the CQL3 antlr grammar to figure things out.


On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.comwrote:

 Great that works, thx! I probably would have never found that...

 It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
 PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
 time.

 Kind regards,

 Dave



Re: 750Gb compaction task

2014-03-13 Thread Kumar Ranjan
M —
Sent from Mailbox for iPhone

On Thu, Mar 13, 2014 at 1:28 AM, Plotnik, Alexey aplot...@rhonda.ru
wrote:

 After rebalance and cleanup I have leveled CF (SSTable size = 100MB) and a 
 compaction Task that is going to process ~750GB:
 root@da1-node1:~# nodetool compactionstats
 pending tasks: 10556
   compaction type   keyspace      column family   completed     total          unit    progress
   Compaction        cafs_chunks   chunks          41015024065   808740269082   bytes   5.07%
 I have no space for this operation, I have 300 Gb only. Is it possible to 
 resolve this situation?

Re: Dead node seen as UP by replacement node

2014-03-13 Thread Rahul Menon
And the token value, as suggested, is <token value of dead node> - 1?


On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 Nope, they have different IPs. I'm using the procedure described here to
 replace a dead node:
 http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node

 Dead node token: X (IP: Y)
 Replacement node token: X-1 (IP: Z)

 So, as soon as the replacement node (Z) is started, it sees the dead node
 (Y) as UP, and tries to stream data from it during the join process. About
 10 minutes later, the failure detector of Z detects Y as down, but since it
 was trying to fetch data from him, it fails the join/bootstrap process
 altogether.




Re: CQL Select Map using an IN relationship

2014-03-13 Thread Sylvain Lebresne
On Thu, Mar 13, 2014 at 12:12 PM, David Savage davemssav...@gmail.com wrote:

 Hi there,

 I'm experimenting using cassandra and have run across an error message
 which I need a little more information on.

 The use case I'm experimenting with is a series of document updates
 (documents being an arbitrary map of key value pairs), I would like to find
 the latest document updates after a specified time period. I don't want to
 store many copies of the documents (one per update) as the updates are
 often only to single keys in the map so that would involve a lot of
 duplicated data.

 The solution I've found that seems to fit best in terms of performance is
 to have two tables.

 One that has an event log of timeuuid -> docid and a second that stores
 the documents themselves, keyed by docid -> map<string, string>. I then run
 two queries, one to select ids that have changed after a certain time:

 SELECT id FROM eventlog WHERE timestamp >= minTimeuuid($minimumTime)

 and then a second to select the actual documents themselves

 SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)

 However this then explodes on query with the error message:

 Cannot restrict PRIMARY KEY part id by IN relation as a collection is
 selected by the query

 Detective work led me to these lines in
 org.apache.cassandra.cql3.statements.SelectStatement:

 // We only support IN for the last name and for compact storage so far
 // TODO: #3885 allows us to extend to non compact as well, but that remains to be done
 if (i != stmt.columnRestrictions.length - 1)
     throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
 else if (stmt.selectACollection())
     throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));

 It seems like #3885 will allow support for the first IF block above, but I
 don't think it will allow the second, am I correct?


Right, #3885 is about the first one. Tbh, the 2nd limitation is kind of
historical and unless I'm forgetting something, we should be able to lift
that pretty easily. If you don't mind opening a JIRA ticket, I'll have a
look at removing said limitation.

--
Sylvain




 Any pointers on how I can work around this would be greatly appreciated.

 Kind regards,

 Dave



Re: Dead node seen as UP by replacement node

2014-03-13 Thread Paulo Ricardo Motta Gomes
Yes, exactly.


On Thu, Mar 13, 2014 at 1:27 PM, Rahul Menon ra...@apigee.com wrote:

 And the token value, as suggested, is <token value of dead node> - 1?


 On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Nope, they have different IPs. I'm using the procedure described here to
 replace a dead node:
 http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node

 Dead node token: X (IP: Y)
 Replacement node token: X-1 (IP: Z)

 So, as soon as the replacement node (Z) is started, it sees the dead node
 (Y) as UP, and tries to stream data from it during the join process. About
 10 minutes later, the failure detector of Z detects Y as down, but since it
 was trying to fetch data from him, it fails the join/bootstrap process
 altogether.





-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br http://www.chaordic.com.br/*
+55 48 3232.3200
+55 83 9690-1314


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as:

(partition_key0 [, partition_key1, ...])

and the optional range_key can be compounded as:

range_key0 [, range_key1 ...]

If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
and key2 is the range_key, and queries will work that hash to key1 (the
partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY KEY ((key1, key2)) - then (key1, key2) is the
compound partition key, there is no range key, and you can specify = on
key1 and = or IN on key2 (but not a range).
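
A minimal CQL sketch of the two layouts described above, for illustration only. The keyspace "demo", the table names, and the "val" column are hypothetical and not taken from this thread; the keyspace is assumed to exist already, and the restrictions shown reflect the behaviour discussed here for the 2.0.x line.

```cql
-- Hypothetical tables; the "demo" keyspace is assumed to exist.

-- Layout 1: key1 is the partition key, key2 is the clustering ("range") column.
CREATE TABLE demo.by_partition (
    key1 text,
    key2 bigint,
    val  text,
    PRIMARY KEY (key1, key2)
);

-- Rows sharing key1 live in one partition, ordered by key2,
-- so a range restriction on key2 is possible.
SELECT key2, val FROM demo.by_partition
 WHERE key1 = 'a' AND key2 >= 10 AND key2 < 20;

-- Layout 2: (key1, key2) together form the partition key; no clustering column.
CREATE TABLE demo.by_compound (
    key1 text,
    key2 bigint,
    val  text,
    PRIMARY KEY ((key1, key2))
);

-- Every (key1, key2) pair is its own partition: = on key1 and = or IN on key2,
-- but no range on key2.
SELECT key2, val FROM demo.by_compound
 WHERE key1 = 'a' AND key2 IN (1, 2, 3);
```

The trade-off is that layout 2 spreads rows across the cluster by the full (key1, key2) pair, but gives up ordered range scans over key2 within a single key1.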

Anyway that's what I remember! Hope it helps.

ml


On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:

 Great that works, thx! I probably would have never found that...

 It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
 PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
 time.

 Kind regards,

 Dave


 On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com
  wrote:

 Create your table like this and it will work:

 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY ((group, id)));

 The extra parens catenate 'group' and 'id' into the partition key - IN
 will work on the last component of a partition key.

 ml


 On Thu, Mar 13, 2014 at 10:40 AM, David Savage davemssav...@gmail.com wrote:

 Nope, upgraded to 2.0.5 and still get the same problem, I actually
 simplified the problem a little in my first post, there's a composite
 primary key involved as I need to partition ids into groups

 So the full CQL statements are:

 CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
 'replication_factor':3};


 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY (group, id));


 INSERT INTO test.documents(id,group,data) VALUES
 (0,'test',{'count':'0'});

 INSERT INTO test.documents(id,group,data) VALUES
 (1,'test',{'count':'1'});

 INSERT INTO test.documents(id,group,data) VALUES
 (2,'test',{'count':'2'});


 SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);


 Thanks for your help.


 Kind regards,


 /Dave


 On Thu, Mar 13, 2014 at 2:00 PM, David Savage davemssav...@gmail.com wrote:

 Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got
 dragged in by the cassandra unit library I'm using for testing [1] I will
 try to fix my build dependencies and retry, thx.

 /Dave

 [1] https://github.com/jsevellec/cassandra-unit


 On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 I have no problem doing this w 2.0.5 - what version of C* are you
 using? Or maybe I don't understand your data model... attach 'creates' if
 you don't mind.

 ml


 On Thu, Mar 13, 2014 at 9:24 AM, David Savage 
 davemssav...@gmail.comwrote:

 Hi Peter,

 Thanks for the help, unfortunately I'm not sure that's the problem,
 the id is the primary key on the documents table and the timestamp
 is the primary key on the eventlog table

 Kind regards,


 Dave

 On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:


 it's not clear to me if your id column is the KEY or just a
 regular column with secondary index.

 queries that have IN on non primary key columns isn't supported yet.
 not sure if that answers your question.


 On Thu, Mar 13, 2014 at 7:12 AM, David Savage 
 davemssav...@gmail.com wrote:

 Hi there,

 I'm experimenting using cassandra and have run across an error
 message which I need a little more information on.

 The use case I'm experimenting with is a series of document updates
 (documents being an arbitrary map of key value pairs), I would like to 
 find
 the latest document updates after a specified time period. I don't 
 want to
 store many copies of the documents (one per update) as the updates are
 often only to single keys in the map so that would involve a lot of
 duplicated data.

 The solution I've found that seems to fit best in terms of
 performance is to have two tables.

 One that has an event log of timeuuid - docid and a second that
 stores the documents themselves stored by docid - mapstring, 
 string. I
 then run two queries, one to select ids that have changed after a 
 certain
 time:

 SELECT id FROM eventlog WHERE timestamp=minTimeuuid($minimumTime)

 and then a second to select the actual documents themselves

 SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)

 However this then explodes on query with the error message:

 Cannot restrict PRIMARY KEY part id by IN relation as a collection
 is selected by the query

 Detective work lead me to these lines in
 org.apache.cassandra.cql3.statementsSelectStatement:

 // We only support IN for the last name and for
 compact storage so far
 // TODO: #3885 allows us to extend to non
 compact as well, but that remains to be done
 if (i != 

Re: 750Gb compaction task

2014-03-13 Thread Robert Coli
On Wed, Mar 12, 2014 at 10:27 PM, Plotnik, Alexey aplot...@rhonda.ru wrote:

 I have no space for this operation, I have 300 Gb only. Is it possible to
 resolve this situation?


compactionstats shows non-compressed size. As long as you have compression
enabled, which is the default, you should be fine.

=Rob


Re: Problems with adding datacenter and schema version disagreement

2014-03-13 Thread Robert Coli
On Thu, Mar 13, 2014 at 2:05 AM, olek.stas...@gmail.com 
olek.stas...@gmail.com wrote:

 Bump, are there any solutions to bring my cluster back to schema
 consistency?
 I've 6 node cluster with exactly six versions of schema, how to deal with
 it?


The simplest way, which is most likely to actually work, is to down all
nodes, nuke schema, and reload it from a dump.

=Rob


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Jack Krupansky
“range key” is formally known as “clustering column”. One or more clustering 
columns can be specified to identify individual rows in a partition. Without 
clustering columns, one partition is one row. So, it’s a matter of whether you 
want your rows to be in the same partition or distributed.

-- Jack Krupansky

From: Laing, Michael 
Sent: Thursday, March 13, 2014 1:39 PM
To: user@cassandra.apache.org 
Subject: Re: CQL Select Map using an IN relationship

Think of them as: 

  PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as:


  (partition_key0 [, partition_key1, ...])

and the optional range_key can be compounded as: 

  range_key0 [, range_key1 ...]

If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key and 
key2 is the range_key and queries will work that hash to key1 (the partition) 
using = or IN and specify a range on key2.

But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the 
compound partition key - there is no range key - and you can specify = on key1 
and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml



On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:

  Great that works, thx! I probably would have never found that... 

  It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or 
PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the time.

  Kind regards,

  Dave


  On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com 
wrote:

Create your table like this and it will work: 

CREATE TABLE test.documents (group text, id bigint, data
map<text,text>, PRIMARY KEY ((group, id)));



The extra parens catenate 'group' and 'id' into the partition key - IN will 
work on the last component of a partition key.


ml



On Thu, Mar 13, 2014 at 10:40 AM, David Savage davemssav...@gmail.com 
wrote:

  Nope, upgraded to 2.0.5 and still get the same problem, I actually 
simplified the problem a little in my first post, there's a composite primary 
key involved as I need to partition ids into groups 

  So the full CQL statements are:

  CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 
'replication_factor':3};



  CREATE TABLE test.documents (group text, id bigint, data
map<text,text>, PRIMARY KEY (group, id));



  INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'});

  INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'});

  INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'});



  SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);



  Thanks for your help.



  Kind regards,



  /Dave




  On Thu, Mar 13, 2014 at 2:00 PM, David Savage davemssav...@gmail.com 
wrote:

Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got 
dragged in by the cassandra unit library I'm using for testing [1] I will try 
to fix my build dependencies and retry, thx.

/Dave


[1] https://github.com/jsevellec/cassandra-unit



On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael 
michael.la...@nytimes.com wrote:

  I have no problem doing this w 2.0.5 - what version of C* are you 
using? Or maybe I don't understand your data model... attach 'creates' if you 
don't mind. 

  ml



  On Thu, Mar 13, 2014 at 9:24 AM, David Savage 
davemssav...@gmail.com wrote:

Hi Peter, 

Thanks for the help, unfortunately I'm not sure that's the problem, 
the id is the primary key on the documents table and the timestamp is the 
primary key on the eventlog table

Kind regards,



Dave


On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:


  it's not clear to me if your id column is the KEY or just a 
regular column with secondary index.


  queries that have IN on non primary key columns isn't supported 
yet. not sure if that answers your question.




  On Thu, Mar 13, 2014 at 7:12 AM, David Savage 
davemssav...@gmail.com wrote:

Hi there, 

I'm experimenting using cassandra and have run across an error 
message which I need a little more information on.

The use case I'm experimenting with is a series of document 
updates (documents being an arbitrary map of key value pairs), I would like to 
find the latest document updates after a specified time period. I don't want to 
store many copies of the documents (one per update) as the updates are often 
only to single keys in the map so that would involve a lot of duplicated data.

The solution I've found that seems to fit best in terms of 
performance is to have two tables.

One that has an event log of timeuuid - docid and a second 
that stores the documents themselves stored by docid - 

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have found that range_key communicates better what you can actually do
with them, whereas clustering is more passive.

ml


On Thu, Mar 13, 2014 at 2:08 PM, Jack Krupansky j...@basetechnology.com wrote:

   “range key” is formally known as “clustering column”. One or more
 clustering columns can be specified to identify individual rows in a
 partition. Without clustering columns, one partition is one row. So, it’s a
 matter of whether you want your rows to be in the same partition or
 distributed.

 -- Jack Krupansky

  *From:* Laing, Michael michael.la...@nytimes.com
 *Sent:* Thursday, March 13, 2014 1:39 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: CQL Select Map using an IN relationship

  Think of them as:


 PRIMARY KEY (partition_key[, range_key])


 where the partition_key can be compounded as:


 (partition_key0 [, partition_key1, ...])


 and the optional range_key can be compounded as:


 range_key0 [, range_key1 ...]


 If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
 and key2 is the range_key and queries will work that hash to key1 (the
 partition) using = or IN and specify a range on key2.

 But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the
 compound partition key - there is no range key - and you can specify = on
 key1 and = or IN on key2 (but not a range).

 Anyway that's what I remember! Hope it helps.

 ml


 On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.comwrote:

 Great that works, thx! I probably would have never found that...

 It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
 PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
 time.

 Kind regards,

 Dave


 On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 Create your table like this and it will work:

 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY ((group, id)));

 The extra parens catenate 'group' and 'id' into the partition key - IN
 will work on the last component of a partition key.

 ml


 On Thu, Mar 13, 2014 at 10:40 AM, David Savage 
 davemssav...@gmail.comwrote:

 Nope, upgraded to 2.0.5 and still get the same problem, I actually
 simplified the problem a little in my first post, there's a composite
 primary key involved as I need to partition ids into groups

 So the full CQL statements are:


 CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
 'replication_factor':3};



 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY (group, id));



 INSERT INTO test.documents(id,group,data) VALUES
 (0,'test',{'count':'0'});

 INSERT INTO test.documents(id,group,data) VALUES
 (1,'test',{'count':'1'});

 INSERT INTO test.documents(id,group,data) VALUES
 (2,'test',{'count':'2'});



 SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);



 Thanks for your help.



 Kind regards,



 /Dave


 On Thu, Mar 13, 2014 at 2:00 PM, David Savage 
 davemssav...@gmail.comwrote:

  Hmmm that maybe the problem, I'm currently testing with 2.0.2 which
 got dragged in by the cassandra unit library I'm using for testing [1] I
 will try to fix my build dependencies and retry, thx.

 /Dave

 [1] https://github.com/jsevellec/cassandra-unit


 On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 I have no problem doing this w 2.0.5 - what version of C* are you
 using? Or maybe I don't understand your data model... attach 'creates' if
 you don't mind.

 ml


 On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com
  wrote:

 Hi Peter,

 Thanks for the help, unfortunately I'm not sure that's the problem,
 the id is the primary key on the documents table and the timestamp
 is the primary key on the eventlog table


 Kind regards,



 Dave

 On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:


 it's not clear to me if your id column is the KEY or just a
 regular column with secondary index.

 queries that have IN on non primary key columns isn't supported
 yet. not sure if that answers your question.


 On Thu, Mar 13, 2014 at 7:12 AM, David Savage 
 davemssav...@gmail.com wrote:

 Hi there,

 I'm experimenting using cassandra and have run across an error
 message which I need a little more information on.

 The use case I'm experimenting with is a series of document
 updates (documents being an arbitrary map of key value pairs), I 
 would like
 to find the latest document updates after a specified time period. I 
 don't
 want to store many copies of the documents (one per update) as the 
 updates
 are often only to single keys in the map so that would involve a lot 
 of
 duplicated data.

 The solution I've found that seems to fit best in terms of
 performance is to have two tables.

 One that has an event log of timeuuid - docid and a second that
 stores the documents themselves stored by docid - mapstring, 
 string. I
 then run two queries, 

Re: CQL Select Map using an IN relationship

2014-03-13 Thread David Savage
Thanks for the explanations.

To confirm I understand, Michael's explanation seems to say that:

* the partition key supports =/IN but not <, <=, >, >=
* the range key (or clustering column) supports =, <, <=, >, >= but not IN. Is
that correct?

Jack's explanation seems to say that grouping the two columns in the
primary key as ((key1, key2)) will prevent data from being partitioned
across nodes in the cluster, is that correct?

Also, in another response thread Sylvain seemed to hint that it's historical
that IN is not supported on the range key / clustering column [1]. If I've
understood that correctly I'm happy to raise a JIRA ticket to track this so
it can be fixed.

Thanks for your help.

Kind regards,

Dave

[1] Please let me know if I should pick one of these terms for clarity...


On Thu, Mar 13, 2014 at 6:08 PM, Jack Krupansky j...@basetechnology.com wrote:

   range key is formally known as clustering column. One or more
 clustering columns can be specified to identify individual rows in a
 partition. Without clustering columns, one partition is one row. So, it's a
 matter of whether you want your rows to be in the same partition or
 distributed.

 -- Jack Krupansky

  *From:* Laing, Michael michael.la...@nytimes.com
 *Sent:* Thursday, March 13, 2014 1:39 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: CQL Select Map using an IN relationship

  Think of them as:


 PRIMARY KEY (partition_key[, range_key])


 where the partition_key can be compounded as:


 (partition_key0 [, partition_key1, ...])


 and the optional range_key can be compounded as:


 range_key0 [, range_key1 ...]


 If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
 and key2 is the range_key and queries will work that hash to key1 (the
 partition) using = or IN and specify a range on key2.

 But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the
 compound partition key - there is no range key - and you can specify = on
 key1 and = or IN on key2 (but not a range).

 Anyway that's what I remember! Hope it helps.

 ml


 On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.comwrote:

 Great that works, thx! I probably would have never found that...

 It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
 PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
 time.

 Kind regards,

 Dave


 On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 Create your table like this and it will work:

 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY ((group, id)));

 The extra parens catenate 'group' and 'id' into the partition key - IN
 will work on the last component of a partition key.

 ml


 On Thu, Mar 13, 2014 at 10:40 AM, David Savage 
 davemssav...@gmail.comwrote:

 Nope, upgraded to 2.0.5 and still get the same problem, I actually
 simplified the problem a little in my first post, there's a composite
 primary key involved as I need to partition ids into groups

 So the full CQL statements are:


 CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
 'replication_factor':3};



 CREATE TABLE test.documents (group text, id bigint, data
 map<text,text>, PRIMARY KEY (group, id));



 INSERT INTO test.documents(id,group,data) VALUES
 (0,'test',{'count':'0'});

 INSERT INTO test.documents(id,group,data) VALUES
 (1,'test',{'count':'1'});

 INSERT INTO test.documents(id,group,data) VALUES
 (2,'test',{'count':'2'});



 SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);



 Thanks for your help.



 Kind regards,



 /Dave


 On Thu, Mar 13, 2014 at 2:00 PM, David Savage 
 davemssav...@gmail.comwrote:

  Hmmm that maybe the problem, I'm currently testing with 2.0.2 which
 got dragged in by the cassandra unit library I'm using for testing [1] I
 will try to fix my build dependencies and retry, thx.

 /Dave

 [1] https://github.com/jsevellec/cassandra-unit


 On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 I have no problem doing this w 2.0.5 - what version of C* are you
 using? Or maybe I don't understand your data model... attach 'creates' if
 you don't mind.

 ml


 On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com
  wrote:

 Hi Peter,

 Thanks for the help, unfortunately I'm not sure that's the problem,
 the id is the primary key on the documents table and the timestamp
 is the primary key on the eventlog table


 Kind regards,



 Dave

 On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:


 it's not clear to me if your id column is the KEY or just a
 regular column with secondary index.

 queries that have IN on non primary key columns isn't supported
 yet. not sure if that answers your question.


 On Thu, Mar 13, 2014 at 7:12 AM, David Savage 
 davemssav...@gmail.com wrote:

 Hi there,

 I'm experimenting using cassandra and have run across an error
 message which I need a little more 

Re: Problems with adding datacenter and schema version disagreement

2014-03-13 Thread olek.stas...@gmail.com
Huh,
you mean json dump?
Regards
Aleksander

2014-03-13 18:59 GMT+01:00 Robert Coli rc...@eventbrite.com:
 On Thu, Mar 13, 2014 at 2:05 AM, olek.stas...@gmail.com
 olek.stas...@gmail.com wrote:

 Bump, are there any solutions to bring my cluster back to schema
 consistency?
 I've 6 node cluster with exactly six versions of schema, how to deal with
 it?


 The simplest way, which is most likely to actually work, is to down all
 nodes, nuke schema, and reload it from a dump.

 =Rob



Need help understanding hinted_handoff_throttle_in_kb

2014-03-13 Thread Oleg Dulin
I came across something in Cassandra that made me concerned.

Default value for hinted_handoff_throttle_in_kb is 1024, one Meg per
second. I have four nodes and rf=2. I have hints timeout set to 24, to
avoid having to do repairs if I took longer than that to reboot a node.

What got me thinking though is that if I'm generating gigabytes worth of
hints during the day, and across four nodes the throttle becomes 250k per
second, that is too slow to replay all of my hints properly. Is that right?

I need to understand this setting better. I would like to make sure that
all of my hints get replayed. What is a recommended setting?

Any input is greatly appreciated.

Regards,
Oleg



Re: Problems with adding datacenter and schema version disagreement

2014-03-13 Thread Robert Coli
On Thu, Mar 13, 2014 at 1:20 PM, olek.stas...@gmail.com 
olek.stas...@gmail.com wrote:

 Huh,
 you mean json dump?


If you're using cassandra-cli, I mean the output of show schema;

If you're using CQLsh, there is an analogous way to show all schema.

1) dump schema to a file via one of the above tools
2) stop cassandra and nuke system keyspaces everywhere
3) start cassandra, coalesce cluster
4) load schema

=Rob


1.2: Why can't I see what is in hints CF ?

2014-03-13 Thread Oleg Dulin

Check this out:

[default@system] list hints limit 10;
Using default cell limit of 100
null
TimedOutException()
	at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
	at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
	at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1495)
	at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:279)
	at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:213)
	at org.apache.cassandra.cli.CliMain.main(CliMain.java:339)


My nodes are accumulating hints and I am wondering what in the world is 
going on...


--
Regards,
Oleg Dulin
http://www.olegdulin.com




Re: How to guarantee consistency between counter and materialized view?

2014-03-13 Thread Oleg Dulin
Robert Coli rc...@eventbrite.com wrote:
 On Tue, Mar 11, 2014 at 4:30 PM, ziju feng pkdog...@gmail.com wrote:
 
 Is there any way to guarantee a counter's value
 
 no.
 
 =Rob

I wouldn't use cassandra for counters... Use something like redis if that
is what you want.



Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
These are my personal opinions, reflecting both my long experience w
database systems, and my newness to Cassandra...

[tl;dr]

The Cassandra contributors, having made its history, tend to describe it in
terms of implementation rather than action. And its implementation has a
history, all relatively recent, that many know, but which to newcomers like
me is obscure and, frankly, not particularly relevant.

Note: we are all trying to understand Crimea now, and to really understand,
you have to ingest several hundred years of history. Luckily, Cassandra has
not been around quite so long!

But Cassandra's history creeps into the nomenclature of CQL3. So what might
logically be called a 'hash key' is called a 'partition key', what is
called a 'clustering key' might be better termed a 'range key' IMHO.

The 'official' terms in the nomenclature are important to know, they are
just not descriptive of the actions one takes as a user of them. However,
they have meaning to those who have 'lived' the history of Cassandra, and
form an important bridge to the past.

As a new user I found them non-intuitive. Amazon has done a much better job
with DynamoDB - muddled, however, by bad syntax choices.

But you adjust and mentally map... I am still bumfuzzled when people talk
of slices and other C* cruft but just let it slide by like lectures from my
mother. That and thrift can just fade into history with gopher and lynx as
far as I am concerned - CQL3 is where it's at.

But another thing to remember is that performance is king - and to get
performance you fly 'close to the metal': Cassandra does that and you
should know the code paths, the physical structures, and the
characteristics of your 'metal' to understand how to build high-performing
apps.

***

The answer to both asterisks is Yes. You should use the term 'clustering
column' because that is what is in the docs - but you should think 'range
key' for how you use it. Similarly 'partition key' : 'hash key'.

Good luck,

ml