Re: Problems with adding datacenter and schema version disagreement
Bump. Are there any solutions to bring my cluster back to schema consistency? I have a 6-node cluster with exactly six versions of the schema; how do I deal with it?
Regards, Aleksander
2014-03-11 14:36 GMT+01:00 olek.stas...@gmail.com:
> Didn't help :) Thanks and regards, Aleksander
> 2014-03-11 14:14 GMT+01:00 Duncan Sands duncan.sa...@gmail.com:
>> On 11/03/14 14:00, olek.stas...@gmail.com wrote:
>>> I plan to install 2.0.6 as soon as it is available in the DataStax rpm repo. But how do I deal with schema inconsistency on such a scale?
>> Does it get better if you restart all the nodes? In my case restarting just some of the nodes didn't help, but restarting all nodes did. Ciao, Duncan.
select query returns wrong value if use DESC option
Hi. I am using Cassandra 2.0.6. There is a case where a SELECT query returns the wrong result when the DESC option is used. My test procedure is as follows:
--
cqlsh:test> CREATE TABLE mytable (key int, range int, PRIMARY KEY (key, range));
cqlsh:test> INSERT INTO mytable (key, range) VALUES (0, 0);
cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0;
 key | range
-----+-------
   0 |     0
(1 rows)
cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range ASC;
 key | range
-----+-------
   0 |     0
(1 rows)
cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range DESC;
(0 rows)
--
Why does the query return 0 rows when the DESC option is used? I expected the same single row that the other queries return. Does anyone have a similar issue?
Thanks, Katsutoshi
CQL Select Map using an IN relationship
Hi there,
I'm experimenting with Cassandra and have run across an error message that I need a little more information on. The use case I'm experimenting with is a series of document updates (a document being an arbitrary map of key/value pairs); I would like to find the latest document updates after a specified time. I don't want to store many copies of the documents (one per update), as the updates often touch only single keys in the map, so that would involve a lot of duplicated data.
The solution I've found that seems to fit best in terms of performance is to have two tables: one that holds an event log of timeuuid -> docid, and a second that stores the documents themselves as docid -> map<string, string>. I then run two queries. The first selects the ids that have changed after a certain time:
    SELECT id FROM eventlog WHERE timestamp >= minTimeuuid($minimumTime)
The second selects the actual documents themselves:
    SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)
However, the second query fails with the error message:
    Cannot restrict PRIMARY KEY part id by IN relation as a collection is selected by the query
Detective work led me to these lines in org.apache.cassandra.cql3.statements.SelectStatement:
    // We only support IN for the last name and for compact storage so far
    // TODO: #3885 allows us to extend to non compact as well, but that remains to be done
    if (i != stmt.columnRestrictions.length - 1)
        throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
    else if (stmt.selectACollection())
        throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));
It seems like #3885 will allow support for the first if block above, but I don't think it will allow the second; am I correct? Any pointers on how I can work around this would be greatly appreciated.
Kind regards, Dave
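One possible client-side workaround for the IN restriction described above, sketched under assumptions: run one equality query per id and merge the results, instead of a single IN query. The `execute` callable, the `fetch_documents` helper and the in-memory fake below are hypothetical stand-ins so that the sketch runs without a cluster; they are not part of Cassandra or of any driver API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Query text for a single id; table and column names follow the thread.
QUERY = "SELECT id, data FROM documents WHERE id = %s"

Row = Tuple[int, Dict[str, str]]


def fetch_documents(execute: Callable[[str, tuple], List[Row]],
                    ids: Iterable[int]) -> Dict[int, Dict[str, str]]:
    """Run one equality query per id instead of a single IN query."""
    documents: Dict[int, Dict[str, str]] = {}
    for doc_id in ids:
        for row_id, data in execute(QUERY, (doc_id,)):
            documents[row_id] = data
    return documents


def make_fake_execute(table):
    """Build an in-memory stand-in for a driver call (assumption, demo only)."""
    def execute(query, params):
        doc_id = params[0]
        return [(doc_id, table[doc_id])] if doc_id in table else []
    return execute


fake_table = {0: {"count": "0"}, 1: {"count": "1"}, 2: {"count": "2"}}
found = fetch_documents(make_fake_execute(fake_table), [0, 2, 7])
print(found)
```

This trades one round trip for several, so it probably only makes sense for small id lists or when the queries can be issued concurrently.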
Re: CQL Select Map using an IN relationship
It's not clear to me whether your id column is the key or just a regular column with a secondary index. Queries that use IN on non-primary-key columns aren't supported yet. Not sure if that answers your question.
On Thu, Mar 13, 2014 at 7:12 AM, David Savage davemssav...@gmail.com wrote:
> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...) However this then explodes on query with the error message: Cannot restrict PRIMARY KEY part id by IN relation as a collection is selected by the query
Re: select query returns wrong value if use DESC option
Consider filing a JIRA. CQL is the standard interface to Cassandra, and everything is heavily tested.
On Thursday, March 13, 2014, Katsutoshi Nagaoka nagapad.0...@gmail.com wrote:
> Why returns value is 0 rows if using DESC option? I expected the same 1 row as the return value of other queries.
-- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: CQL Select Map using an IN relationship
Hi Peter,
Thanks for the help. Unfortunately I'm not sure that's the problem: id is the primary key on the documents table and timestamp is the primary key on the eventlog table.
Kind regards, Dave
On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:
> it's not clear to me if your id column is the KEY or just a regular column with secondary index. queries that have IN on non primary key columns isn't supported yet.
Re: Opscenter help?
On 3/13/14, 12:14 AM, Jack Krupansky wrote:
> Please do use Stack Overflow - that is the appropriate forum for OpsCenter support (unless you are a DataStax customer). Use the OpsCenter tag: http://stackoverflow.com/tags/opscenter/info
Unfortunately, as a new user, I cannot use the opscenter tag. I don't have a good enough reputation yet. Thanks for the pointer anyway.
-- Drew from Zhrodague post-apocalyptic ad-hoc industrialist d...@zhrodague.net
Re:
Thanks, Edward and David,
Your contributions led me to the conclusion. Unknown to me, the partition key had only 1 value, so of course all the data was stored on a single node. Having replication factor 3 led to 99% CPU on 3 machines. Regarding RAM: we have so much CPU power that I actually run a Tomcat server on each node; Tomcat prepares data for queries, sends them, retrieves results, processes them and sends responses to clients. We are running 6 Intel® Xeon® E3-1270 v3 quad-core Haswell machines with Hyper-Threading, so IMO there is enough CPU power to handle a large Java heap. Thank you all for your prompt responses.
On Thursday, March 13, 2014 1:38 AM, David McNelis dmcne...@gmail.com wrote:
> Not knowing anything about your data structure (to expand on what Edward said), you could be running into something where you've got some hot keys that are getting the majority of writes during those heavy loads. More specifically, I might look for a single key that you're writing, since you're RF=3 and you have 3 nodes specifically that are causing problems.
> On Wed, Mar 12, 2014 at 7:28 PM, Russ Bradberry rbradbe...@gmail.com wrote:
>> I wouldn't go above 8G unless you have a very powerful machine that can keep the GC pauses low. Sent from my iPhone
>> On Mar 12, 2014, at 7:11 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
>>> That is too much RAM for Cassandra; make that 6g to 10g. The uneven perf could be because your requests do not shard evenly.
>>> On Wednesday, March 12, 2014, Batranut Bogdan batra...@yahoo.com wrote:
>>>> Hello all,
>>>> The environment: I have a 6 node Cassandra cluster. On each node I have:
>>>> - 32 G RAM
>>>> - 24 G RAM for cassa
>>>> - ~150 - 200 MB/s disk speed
>>>> - tomcat 6 with axis2 webservice that uses the datastax java driver to make asynch reads / writes
>>>> - replication factor for the keyspace is 3
>>>> All nodes are in the same data center. The clients that read / write are in the same datacenter, so the network is Gigabit. Writes are performed via exposed methods from the Axis2 WS. The Cassandra Java driver uses the round robin load balancing policy, so all the nodes in the cluster should be hit with write requests under heavy write or read load from multiple clients. I am monitoring all nodes with JConsole from another box.
>>>> The problem: When writing to a particular column family, only 3 nodes have high CPU load, ~80 - 99%. The remaining 3 are at ~2 - 10% CPU. During writes, reads time out. I need more speed for both writes and reads. The fact that 3 nodes barely have any CPU activity leads me to think that the whole potential of C* is not being used. I am running out of ideas... If further details about the environment are needed, I can provide them. Thank you very much.
>>> -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
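The effect described in this thread (a single partition key value keeping only RF nodes busy, however the driver balances requests) can be sketched with a toy token ring. This is a simplified model under stated assumptions, not Cassandra's implementation: md5 stands in for the real partitioner, the six nodes own equal token ranges, and replicas are simply the next nodes on the ring. The node names are made up.

```python
import hashlib

NODES = ["node%d" % i for i in range(1, 7)]  # 6 nodes, as in the thread
RF = 3                                        # replication factor from the thread
RING_SIZE = 2 ** 128                          # size of the md5 digest space


def token(partition_key: str) -> int:
    """Hash a partition key to a token; md5 is a stand-in for the partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas(partition_key: str) -> list:
    """Return the RF owners of a key: the primary plus the next RF-1 ring nodes."""
    primary = token(partition_key) * len(NODES) // RING_SIZE
    return [NODES[(primary + i) % len(NODES)] for i in range(RF)]


def nodes_hit(keys) -> set:
    """Collect every node that receives a write for any of the given keys."""
    hit = set()
    for key in keys:
        hit.update(replicas(key))
    return hit


one_key = nodes_hit(["same-key"] * 1000)
many_keys = nodes_hit("key-%d" % i for i in range(1000))
print(len(one_key), "nodes busy with a single partition key")
print(len(many_keys), "nodes busy with 1000 distinct partition keys")
```

In this model a thousand writes to one key always land on the same three nodes, which matches the 3-busy, 3-idle pattern reported above; spreading writes over many partition key values is what lets the other nodes share the load.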
Re: Opscenter help?
You don't need any reputation points to ask a new question with an existing tag - just type opscenter in the Tags box under the question. Otherwise, how would any new user ever be able to ask a question and have it tagged?!
-- Jack Krupansky
-----Original Message----- From: Drew from Zhrodague Sent: Thursday, March 13, 2014 9:29 AM To: user@cassandra.apache.org Subject: Re: Opscenter help?
> Unfortunately, as a new user, I cannot use the opscenter tag. I don't have a good enough reputation yet. Thanks for the pointer anyway.
Re: CQL Select Map using an IN relationship
I have no problem doing this with 2.0.5 - what version of C* are you using? Or maybe I don't understand your data model... Attach your 'creates' if you don't mind.
ml
On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com wrote:
> Thanks for the help, unfortunately I'm not sure that's the problem, the id is the primary key on the documents table and the timestamp is the primary key on the eventlog table
Re: CQL Select Map using an IN relationship
Hmmm, that may be the problem. I'm currently testing with 2.0.2, which got dragged in by the cassandra-unit library I'm using for testing [1]. I will try to fix my build dependencies and retry, thx.
/Dave
[1] https://github.com/jsevellec/cassandra-unit
On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael michael.la...@nytimes.com wrote:
> I have no problem doing this w 2.0.5 - what version of C* are you using?
Re: Opscenter help?
On 3/13/14, 9:49 AM, Jack Krupansky wrote:
> You don't need any reputation points to ask a new question with an existing tag - just type opscenter in the Tags box under the question. Otherwise, how would any new user ever be able to ask a question and have it tagged?!
I dunno, I don't use SE often - it draws a red box and says I need 300 points to be able to type 'opscenter' in the tags box.
-- Drew from Zhrodague post-apocalyptic ad-hoc industrialist d...@zhrodague.net
Re: Opscenter help?
I'm happy to help here as well :) Can you give some more information? Specifically: What exact versions of EL5 and EL6 have you tried? What version of OpsCenter are you using? What file/dependency is rpm/yum saying conflicts with sudo? Also, you can find the OpsCenter documentation here: http://www.datastax.com/documentation/opscenter/4.1/index.html, although this isn't an issue I've seen before.
-Nick
On Wed, Mar 12, 2014 at 1:51 PM, Drew from Zhrodague drewzhroda...@zhrodague.net wrote:
> I am having a hard time installing the Datastax Opscenter agents on EL6 and EL5 hosts. Where is an appropriate place to ask for help? Datastax has moved their forums to Stack Exchange, which seems to be a waste of time, as I don't have enough reputation points to properly tag my questions. The agent installation seems to be broken:
> [] agent rpm conflicts with sudo
> [] install from opscenter does not work, even if manually installing the rpm (requires --force, conflicts with sudo)
> [] error message re: log4j #noconf
> [] Could not find the main class: opsagent.opsagent. Program will exit.
> [] No other (helpful/more in-depth) documentation exists
> -- Drew from Zhrodague post-apocalyptic ad-hoc industrialist d...@zhrodague.net
Re: CQL Select Map using an IN relationship
Create your table like this and it will work:
    CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY ((group, id)));
The extra parens catenate 'group' and 'id' into the partition key - IN will work on the last component of a partition key.
ml
On Thu, Mar 13, 2014 at 10:40 AM, David Savage davemssav...@gmail.com wrote:
> Nope, upgraded to 2.0.5 and still get the same problem. I actually simplified the problem a little in my first post: there's a composite primary key involved, as I need to partition ids into groups. So the full CQL statements are:
>     CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
>     CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY (group, id));
>     INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'});
>     INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'});
>     INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'});
>     SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);
> Thanks for your help. Kind regards, /Dave
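The difference between PRIMARY KEY (group, id) and PRIMARY KEY ((group, id)) discussed in this thread can be illustrated with a toy model. With the single parentheses, group alone is the partition key and id is a clustering column inside that partition; with the double parentheses, group and id together form the partition key. The sketch below only groups the three rows from the thread by their partition key columns; `group_into_partitions` is a hypothetical helper and says nothing about how Cassandra stores data internally.

```python
from collections import defaultdict

# The three rows inserted in the thread: (group, id, data).
ROWS = [("test", 0, {"count": "0"}),
        ("test", 1, {"count": "1"}),
        ("test", 2, {"count": "2"})]

COLUMN_INDEX = {"group": 0, "id": 1}


def group_into_partitions(rows, partition_columns):
    """Group rows by the values of their partition key columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[COLUMN_INDEX[col]] for col in partition_columns)
        partitions[key].append(row)
    return dict(partitions)


# PRIMARY KEY (group, id): group is the partition key, id a clustering column.
clustered = group_into_partitions(ROWS, ["group"])
# PRIMARY KEY ((group, id)): group and id together form the partition key.
composite = group_into_partitions(ROWS, ["group", "id"])

print(len(clustered), "partition(s) with PRIMARY KEY (group, id)")
print(len(composite), "partition(s) with PRIMARY KEY ((group, id))")
```

The trade-off, as far as I understand it: the composite partition key spreads rows of one group across the cluster and makes the IN on id a partition key restriction, but queries then have to supply both group and id, so selecting a whole group or a range of ids in one partition is no longer possible.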
Re: Opscenter help?
I have seen the "conflicts with sudo" error, but that was with the 3.X rpm on the Amazon AMI; I was, however, able to install it from the tarball. As Nick has pointed out, the versions of the OS and OpsCenter will help in looking at this.
Thanks, Rahul
On Thu, Mar 13, 2014 at 7:56 PM, Nick Bailey n...@datastax.com wrote:
> Can you give some more information? Specifically: What exact versions of EL5 and EL6 have you tried? What version of OpsCenter are you using? What file/dependency is rpm/yum saying conflicts with sudo?
Re: CQL Select Map using an IN relationship
It's probably a good idea to open a JIRA ticket to explain this better in the docs. The downside of moving so fast is that the docs often fall behind and users have to dig around to figure things out. Not everyone wants to read the CQL3 ANTLR grammar to figure things out.
On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:
> Great, that works, thx! I probably would have never found that... It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or PRIMARY KEY ((key1, key2)); any examples would be welcome if you have the time. Kind regards, Dave
> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com wrote:
>> The extra parens catenate 'group' and 'id' into the partition key - IN will work on the last component of a partition key.
Re: 750Gb compaction task
M
-- Sent from Mailbox for iPhone
On Thu, Mar 13, 2014 at 1:28 AM, Plotnik, Alexey aplot...@rhonda.ru wrote:
> After rebalance and cleanup I have a leveled CF (SSTable size = 100MB) and a compaction task that is going to process ~750GB:
> root@da1-node1:~# nodetool compactionstats
> pending tasks: 10556
>   compaction type      keyspace   column family     completed          total   unit   progress
>        Compaction   cafs_chunks          chunks   41015024065   808740269082  bytes      5.07%
> I have no space for this operation; I have only 300 GB. Is it possible to resolve this situation?
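As a quick check of the numbers in the nodetool output above, the following sketch recomputes the progress figure and the amount of data the task still has to process. It only restates the arithmetic; reading "300 Gb" as 300 GiB is an assumption, and the remaining byte count is the size of the task, not necessarily the extra disk space the compaction would need at any one time.

```python
completed = 41015024065   # bytes, from the nodetool output
total = 808740269082      # bytes, from the nodetool output
free = 300 * 2 ** 30      # "300 Gb" read as GiB (assumption)

progress = completed / total
remaining = total - completed

print("progress:  %.2f%%" % (progress * 100))
print("remaining: %.1f GiB" % (remaining / 2 ** 30))
print("free:      %.1f GiB" % (free / 2 ** 30))
```

The recomputed progress agrees with the 5.07% that nodetool reports, and the remaining work is more than twice the free space mentioned in the message.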
Re: Dead node seen as UP by replacement node
And the token value, as suggested, is (token value of dead node) - 1?

On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:

Nope, they have different IPs. I'm using the procedure described here to replace a dead node: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node

Dead node token: X (IP: Y)
Replacement node token: X-1 (IP: Z)

So, as soon as the replacement node (Z) is started, it sees the dead node (Y) as UP and tries to stream data from it during the join process. About 10 minutes later, the failure detector of Z detects Y as down, but since Z was trying to fetch data from it, the join/bootstrap process fails altogether.
Re: CQL Select Map using an IN relationship
On Thu, Mar 13, 2014 at 12:12 PM, David Savage davemssav...@gmail.com wrote:

Hi there,

I'm experimenting with Cassandra and have run across an error message which I need a little more information on.

The use case I'm experimenting with is a series of document updates (documents being an arbitrary map of key-value pairs). I would like to find the latest document updates after a specified time period. I don't want to store many copies of the documents (one per update), as the updates are often only to single keys in the map, so that would involve a lot of duplicated data.

The solution I've found that seems to fit best in terms of performance is to have two tables: one that has an event log of timeuuid -> docid, and a second that stores the documents themselves by docid -> map<string, string>.

I then run two queries, one to select ids that have changed after a certain time:

SELECT id FROM eventlog WHERE timestamp >= minTimeuuid($minimumTime)

and then a second to select the actual documents themselves:

SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)

However, this then explodes on query with the error message:

Cannot restrict PRIMARY KEY part id by IN relation as a collection is selected by the query

Detective work led me to these lines in org.apache.cassandra.cql3.statements.SelectStatement:

// We only support IN for the last name and for compact storage so far
// TODO: #3885 allows us to extend to non compact as well, but that remains to be done
if (i != stmt.columnRestrictions.length - 1)
    throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
else if (stmt.selectACollection())
    throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));

It seems like #3885 will allow support for the first IF block above, but I don't think it will allow the second, am I correct?

Right, #3885 is about the first one. Tbh, the 2nd limitation is kind of historical and, unless I'm forgetting something, we should be able to lift that pretty easily. If you don't mind opening a JIRA ticket, I'll have a look at removing said limitation.

--
Sylvain

Any pointers on how I can work around this would be greatly appreciated.

Kind regards,
Dave
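Until that limitation is lifted, one possible workaround is to replace the single IN query with one equality query per id, since the check quoted above only fires for IN relations. Nobody in the thread proposed this, so treat it as an assumption rather than a recommendation. Below is a minimal Python sketch that only builds the statements. The helper name per_id_queries is hypothetical, and the %s placeholder style is an assumption about the client driver in use.

```python
def per_id_queries(ids):
    """Build one single-key SELECT per id instead of one IN query.

    Hypothetical helper: it only produces (statement, parameters) pairs;
    executing them is left to whatever driver you use.
    """
    cql = "SELECT id, data FROM documents WHERE id = %s"
    return [(cql, (doc_id,)) for doc_id in ids]


# Example: the ids returned by the eventlog query
for statement, params in per_id_queries([0, 1, 2]):
    print(statement, params)
```

The trade-off is more round trips (or more asynchronous requests) in exchange for avoiding the IN restriction on a query that selects a collection.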
Re: Dead node seen as UP by replacement node
Yes, exactly.

On Thu, Mar 13, 2014 at 1:27 PM, Rahul Menon ra...@apigee.com wrote:

And the token value, as suggested, is (token value of dead node) - 1?

On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:

Nope, they have different IPs. I'm using the procedure described here to replace a dead node: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node

Dead node token: X (IP: Y)
Replacement node token: X-1 (IP: Z)

So, as soon as the replacement node (Z) is started, it sees the dead node (Y) as UP and tries to stream data from it during the join process. About 10 minutes later, the failure detector of Z detects Y as down, but since Z was trying to fetch data from it, the join/bootstrap process fails altogether.

--
Paulo Motta
Chaordic | Platform
http://www.chaordic.com.br/
Re: CQL Select Map using an IN relationship
Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as (partition_key0 [, partition_key1, ...]) and the optional range_key can be compounded as range_key0 [, range_key1 ...].

If you do this: PRIMARY KEY (key1, key2), then key1 is the partition_key and key2 is the range_key, and queries will work that hash to key1 (the partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY KEY ((key1, key2)), then (key1, key2) is the compound partition key, there is no range key, and you can specify = on key1 and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml

On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:

Great, that works, thx! I probably would never have found that... It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or PRIMARY KEY ((key1, key2)); any examples would be welcome if you have the time.

Kind regards,
Dave

On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com wrote:

Create your table like this and it will work:

CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY ((group, id)));

The extra parens catenate 'group' and 'id' into the partition key - IN will work on the last component of a partition key.
ml

On Thu, Mar 13, 2014 at 10:40 AM, David Savage davemssav...@gmail.com wrote:

Nope, upgraded to 2.0.5 and still get the same problem. I actually simplified the problem a little in my first post: there's a composite primary key involved, as I need to partition ids into groups. So the full CQL statements are:

CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY (group, id));
INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'});
INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'});
INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'});
SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);

Thanks for your help.

Kind regards,
/Dave

On Thu, Mar 13, 2014 at 2:00 PM, David Savage davemssav...@gmail.com wrote:

Hmmm, that may be the problem. I'm currently testing with 2.0.2, which got dragged in by the cassandra-unit library I'm using for testing [1]. I will try to fix my build dependencies and retry, thx.

/Dave

[1] https://github.com/jsevellec/cassandra-unit

On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael michael.la...@nytimes.com wrote:

I have no problem doing this with 2.0.5 - what version of C* are you using? Or maybe I don't understand your data model... attach 'creates' if you don't mind.

ml

On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com wrote:

Hi Peter,

Thanks for the help. Unfortunately I'm not sure that's the problem: the id is the primary key on the documents table and the timestamp is the primary key on the eventlog table.

Kind regards,
Dave

On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote:

It's not clear to me if your id column is the KEY or just a regular column with a secondary index. Queries that have IN on non-primary-key columns aren't supported yet. Not sure if that answers your question.
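To make the difference between PRIMARY KEY (group, id) and PRIMARY KEY ((group, id)) concrete, here is a toy Python model. It is only a sketch of the idea described above, not how Cassandra actually stores data: rows are grouped into partitions by the partition-key columns and kept sorted by the clustering (range) columns. The function name build_partitions is hypothetical.

```python
from collections import defaultdict


def build_partitions(rows, partition_cols, clustering_cols):
    """Group rows by partition key; sort each partition by its clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        pkey = tuple(row[c] for c in partition_cols)
        partitions[pkey].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: tuple(r[c] for c in clustering_cols))
    return dict(partitions)


# The three rows inserted in the thread's test case
rows = [{"group": "test", "id": i, "data": {"count": str(i)}} for i in range(3)]

# PRIMARY KEY (group, id): one partition per group, rows ordered by id,
# so a range on id within a group is a contiguous slice of one partition.
clustered = build_partitions(rows, ["group"], ["id"])

# PRIMARY KEY ((group, id)): each (group, id) pair is its own partition,
# so there is no ordering across ids and no range query on id.
compound = build_partitions(rows, ["group", "id"], [])

print(len(clustered), "partition(s) with clustering on id")
print(len(compound), "partition(s) with a compound partition key")
```

In the first layout all ids of a group live together and sorted; in the second each document is hashed independently, which is why only = or IN apply to the last partition-key component.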
Re: 750Gb compaction task
On Wed, Mar 12, 2014 at 10:27 PM, Plotnik, Alexey aplot...@rhonda.ru wrote:

I have no space for this operation, I have 300 Gb only. Is it possible to resolve this situation?

compactionstats shows non-compressed size. As long as you have compression enabled, which is the default, you should be fine.

=Rob
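As a rough check of the numbers in this thread, the arithmetic can be sketched in Python. The completed and total byte counts come from the compactionstats output quoted in the question; the compression ratio of 0.35 is purely a hypothetical placeholder, since the thread does not say what ratio this column family achieves.

```python
def compaction_estimate(completed, total, compression_ratio):
    """Return (remaining uncompressed bytes, progress %, estimated compressed bytes).

    compression_ratio is compressed size / uncompressed size and is an
    assumption supplied by the caller, not something compactionstats reports.
    """
    remaining = total - completed
    progress_pct = 100.0 * completed / total
    est_on_disk = remaining * compression_ratio
    return remaining, progress_pct, est_on_disk


remaining, pct, est = compaction_estimate(
    completed=41015024065, total=808740269082, compression_ratio=0.35
)
print(f"{pct:.2f}% done, {remaining} uncompressed bytes remaining")
print(f"about {est / 1e9:.0f} GB compressed at the assumed ratio")
```

Substitute the real ratio for your column family before drawing conclusions about whether 300 GB is enough.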
Re: Problems with adding datacenter and schema version disagreement
On Thu, Mar 13, 2014 at 2:05 AM, olek.stas...@gmail.com olek.stas...@gmail.com wrote:

Bump, are there any solutions to bring my cluster back to schema consistency? I have a 6-node cluster with exactly six versions of the schema; how do I deal with it?

The simplest way, which is most likely to actually work, is to down all nodes, nuke schema, and reload it from a dump.

=Rob
Re: CQL Select Map using an IN relationship
“range key” is formally known as “clustering column”. One or more clustering columns can be specified to identify individual rows in a partition. Without clustering columns, one partition is one row. So, it’s a matter of whether you want your rows to be in the same partition or distributed.

-- Jack Krupansky

From: Laing, Michael
Sent: Thursday, March 13, 2014 1:39 PM
To: user@cassandra.apache.org
Subject: Re: CQL Select Map using an IN relationship

Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as (partition_key0 [, partition_key1, ...]) and the optional range_key can be compounded as range_key0 [, range_key1 ...].

If you do this: PRIMARY KEY (key1, key2), then key1 is the partition_key and key2 is the range_key, and queries will work that hash to key1 (the partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY KEY ((key1, key2)), then (key1, key2) is the compound partition key, there is no range key, and you can specify = on key1 and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml

On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:

Great, that works, thx! I probably would never have found that... It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or PRIMARY KEY ((key1, key2)); any examples would be welcome if you have the time.

Kind regards,
Dave

On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com wrote:

Create your table like this and it will work:

CREATE TABLE test.documents (group text, id bigint, data map<text,text>, PRIMARY KEY ((group, id)));

The extra parens catenate 'group' and 'id' into the partition key - IN will work on the last component of a partition key.
Re: CQL Select Map using an IN relationship
I have found that range_key communicates better what you can actually do with them, whereas clustering is more passive.

ml

On Thu, Mar 13, 2014 at 2:08 PM, Jack Krupansky j...@basetechnology.com wrote:

“range key” is formally known as “clustering column”. One or more clustering columns can be specified to identify individual rows in a partition. Without clustering columns, one partition is one row. So, it’s a matter of whether you want your rows to be in the same partition or distributed.

-- Jack Krupansky

From: Laing, Michael michael.la...@nytimes.com
Sent: Thursday, March 13, 2014 1:39 PM
To: user@cassandra.apache.org
Subject: Re: CQL Select Map using an IN relationship

Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as (partition_key0 [, partition_key1, ...]) and the optional range_key can be compounded as range_key0 [, range_key1 ...].

If you do this: PRIMARY KEY (key1, key2), then key1 is the partition_key and key2 is the range_key, and queries will work that hash to key1 (the partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY KEY ((key1, key2)), then (key1, key2) is the compound partition key, there is no range key, and you can specify = on key1 and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml

On Thu, Mar 13, 2014 at 11:27 AM, David Savage davemssav...@gmail.com wrote:

Great, that works, thx! I probably would never have found that... It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or PRIMARY KEY ((key1, key2)); any examples would be welcome if you have the time.
Re: CQL Select Map using an IN relationship
Thanks for the explanations.

To confirm I understand, Michael's explanation seems to say that:

* the partition key supports = / IN but not <, <=, >, >=
* the range key (or clustering column) supports =, <, <=, >, >= but not IN.

Is that correct?

Jack's explanation seems to say that grouping the two columns in the primary key as ((key1, key2)) will prevent data from being partitioned across nodes in the cluster. Is that correct?

Also, in another response thread Sylvain seemed to hint that it's historical that IN is not supported on the range key / clustering column [1]. If I've understood that correctly, I'm happy to raise a JIRA ticket to track this so it can be fixed.

Thanks for your help.

Kind regards,
Dave

[1] Please let me know if I should pick one of these terms for clarity...

On Thu, Mar 13, 2014 at 6:08 PM, Jack Krupansky j...@basetechnology.com wrote:

"range key" is formally known as "clustering column". One or more clustering columns can be specified to identify individual rows in a partition. Without clustering columns, one partition is one row. So, it's a matter of whether you want your rows to be in the same partition or distributed.

-- Jack Krupansky

From: Laing, Michael michael.la...@nytimes.com
Sent: Thursday, March 13, 2014 1:39 PM
To: user@cassandra.apache.org
Subject: Re: CQL Select Map using an IN relationship

Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as (partition_key0 [, partition_key1, ...]) and the optional range_key can be compounded as range_key0 [, range_key1 ...].

If you do this: PRIMARY KEY (key1, key2), then key1 is the partition_key and key2 is the range_key, and queries will work that hash to key1 (the partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY KEY ((key1, key2)), then (key1, key2) is the compound partition key, there is no range key, and you can specify = on key1 and = or IN on key2 (but not a range).

Anyway that's what I remember!
Re: Problems with adding datacenter and schema version disagreement
Huh, you mean a JSON dump?

Regards,
Aleksander

2014-03-13 18:59 GMT+01:00 Robert Coli rc...@eventbrite.com:

On Thu, Mar 13, 2014 at 2:05 AM, olek.stas...@gmail.com olek.stas...@gmail.com wrote:

Bump, are there any solutions to bring my cluster back to schema consistency? I have a 6-node cluster with exactly six versions of the schema; how do I deal with it?

The simplest way, which is most likely to actually work, is to down all nodes, nuke schema, and reload it from a dump.

=Rob
Need help understanding hinted_handoff_throttle_in_kb
I came across something in Cassandra that made me concerned. The default value for hinted_handoff_throttle_in_kb is 1024, one meg per second. I have four nodes and RF=2. I have the hints timeout set to 24, to avoid having to do repairs if I took longer than that to reboot a node.

What got me thinking, though, is that if I'm generating gigabytes' worth of hints during the day, and across four nodes the throttle becomes 250k per second, that is too slow to replay all of my hints properly. Is that right?

I need to understand this setting better. I would like to make sure that all of my hints get replayed. What is a recommended setting? Any input is greatly appreciated.

Regards,
Oleg
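The replay-time worry above is simple arithmetic, sketched here in Python. The function name and the divisor parameter are hypothetical. Oleg's 250k figure corresponds to dividing the throttle by the node count (4); whether Cassandra divides by the node count or by the number of other nodes is an assumption to verify against the comments in cassandra.yaml for your version.

```python
def replay_seconds(hint_bytes, throttle_kb_per_s=1024, divisor=1):
    """Seconds needed to deliver hint_bytes at the effective throttle rate.

    divisor models the proportional reduction of the throttle across the
    cluster; its exact value is an assumption, not taken from the thread.
    """
    rate_bytes_per_s = throttle_kb_per_s * 1024 / divisor
    return hint_bytes / rate_bytes_per_s


one_gib = 1024 ** 3
for divisor in (1, 3, 4):
    hours = replay_seconds(one_gib, divisor=divisor) / 3600
    print(f"divisor {divisor}: {hours:.2f} hours per GiB of hints")
```

Multiplying the per-GiB figure by the daily hint volume gives a rough idea of whether replay can keep up at a given throttle setting.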
Re: Problems with adding datacenter and schema version disagreement
On Thu, Mar 13, 2014 at 1:20 PM, olek.stas...@gmail.com olek.stas...@gmail.com wrote:

Huh, you mean json dump?

If you're using cassandra-cli, I mean the output of "show schema;". If you're using cqlsh, there is an analogous way to show all schema.

1) dump schema to a file via one of the above tools
2) stop cassandra and nuke system keyspaces everywhere
3) start cassandra, coalesce cluster
4) load schema

=Rob
1.2: Why can't I see what is in hints CF ?
Check this out:

[default@system] list hints limit 10;
Using default cell limit of 100
null
TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
    at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1495)
    at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:279)
    at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:213)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:339)

My nodes are accumulating hints and I am wondering what in the world is going on...

--
Regards,
Oleg Dulin
http://www.olegdulin.com
Re: How to guarantee consistency between counter and materialized view?
Robert Coli rc...@eventbrite.com wrote:

On Tue, Mar 11, 2014 at 4:30 PM, ziju feng pkdog...@gmail.com wrote:

Is there any way to guarantee a counter's value

no.

=Rob

I wouldn't use Cassandra for counters... Use something like Redis if that is what you want.
Re: CQL Select Map using an IN relationship
These are my personal opinions, reflecting both my long experience with database systems and my newness to Cassandra...

[tl;dr]

The Cassandra contributors, having made its history, tend to describe it in terms of implementation rather than action. And its implementation has a history, all relatively recent, that many know, but which to newcomers like me is obscure and, frankly, not particularly relevant.

Note: we are all trying to understand Crimea now, and to really understand, you have to ingest several hundred years of history. Luckily, Cassandra has not been around quite so long!

But Cassandra's history creeps into the nomenclature of CQL3. So what might logically be called a 'hash key' is called a 'partition key', and what is called a 'clustering key' might be better termed a 'range key', IMHO.

The 'official' terms in the nomenclature are important to know; they are just not descriptive of the actions one takes as a user of them. However, they have meaning to those who have 'lived' the history of Cassandra, and form an important bridge to the past.

As a new user I found them non-intuitive. Amazon has done a much better job with DynamoDB - muddled, however, by bad syntax choices.

But you adjust and mentally map... I am still bumfuzzled when people talk of slices and other C* cruft, but I just let it slide by like lectures from my mother. That and thrift can just fade into history with gopher and lynx as far as I am concerned - CQL3 is where it's at.

But another thing to remember is that performance is king - and to get performance you fly 'close to the metal'. Cassandra does that, and you should know the code paths, the physical structures, and the characteristics of your 'metal' to understand how to build high-performing apps.

***

The answer to both asterisks is Yes.

You should use the term 'clustering column' because that is what is in the docs - but you should think 'range key' for how you use it. Similarly 'partition key' : 'hash key'.

Good luck,
ml