Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9
On Tue, Jul 22, 2014 at 1:26 AM, Robert Coli rc...@eventbrite.com wrote: I'm pretty sure reversed comparator timestamps are a common type of schema, given that there are blog posts recommending their use, so I struggle to understand how this was not detected by unit tests.

As Karl has suggested, client driver maintainers have opted to work around the issue. At gocql, when we ran into this issue, we started a discussion thread to see whether this was likely to be a client-side or a server-side bug. Because we didn't get a response to the discussion, we thought that the most pragmatic thing to do was to implement a workaround in the client. Other driver maintainers have potentially taken a similar course of action. As for the unit tests, I think this issue was only reproducible when upgrading a schema to 2.0.x - are you suggesting that there was/is test coverage for this scenario in the server?
Error: AssertionError at firstTokenIndex(TokenMetadata.java:845)
hi all, I am trying to add a node to a Cassandra ring with only one seed node. I have the seed in EC2, and I get this error when I start Cassandra on the other node:

ERROR [Thrift:389] 2014-07-22 08:25:39,838 CassandraDaemon.java (line 191) Exception in thread Thread[Thrift:389,5,main]
java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:845)
    at org.apache.cassandra.locator.TokenMetadata.firstToken(TokenMetadata.java:859)
    at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:106)
    at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2681)
    at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:376)
    at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:191)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:866)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:849)
    at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:749)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3690)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3678)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

ERROR [Thrift:390] 2014-07-22 08:25:41,169 CassandraDaemon.java (line 191) Exception in thread Thread[Thrift:390,5,main]
java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:845)
    ... (identical stack trace)

ERROR [Thrift:391] 2014-07-22 08:25:44,578 CassandraDaemon.java (line 191) Exception in thread Thread[Thrift:391,5,main]
java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:845)
    ... (identical stack trace)

-- I made an AMI from the original seed Cassandra instance in EC2 and deleted all data and config listen
I want either all the DML statements within the batch succeed or rollback all. is it possible?
Hi all, in the user guide of Cassandra I found information about batches for atomic DML operations. I want either all the DML statements within the batch to succeed, or to roll back all of them. Is it possible? Another question: in my case, can I use joins in Cassandra, or is there any other way to achieve it? Regards, Tarkeshwar
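For context on what the batch feature actually promises: a CQL logged batch is atomic in the sense that all of its mutations eventually apply or none do, but it is not an isolated transaction and there is no client-initiated rollback. As a minimal sketch (the table and column names below are made up for illustration), the CQL shape of such a batch can be assembled like this:

```python
# Hypothetical table/column names, purely to show the BATCH syntax.
statements = [
    "INSERT INTO users (id, name) VALUES (42, 'tarkeshwar')",
    "UPDATE user_stats SET login_count = 1 WHERE id = 42",
]

# A logged batch wraps the statements in BEGIN BATCH ... APPLY BATCH.
batch_cql = "BEGIN BATCH\n"
for stmt in statements:
    batch_cql += "  " + stmt + ";\n"
batch_cql += "APPLY BATCH;"

print(batch_cql)
```

Sending this through any CQL driver gives all-or-nothing application of the two writes; what it does not give is rollback after a partial application-level failure, or isolation from concurrent readers.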
Re: Authentication exception
Verified all clocks are in sync.

On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon ra...@apigee.com wrote: Could you perhaps check your NTP?

On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com wrote: I routinely get this exception from cqlsh on one of my clusters: cql.cassandra.ttypes.AuthenticationException: AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.') The system_auth keyspace is set to replicate X times given X nodes in each datacenter, and at the time of the exception all nodes report as online and healthy. After a short period (i.e. 30 minutes), it lets me in again. What could be the cause of this?
JSON to Cassandra ?
Hi guys, I know this topic has already been discussed many times, and I have read a lot of these discussions. Yet, I have not been able to find a good way to do what I want. We are receiving messages from our app in the form of complex, dynamic, nested JSON (anywhere from a few to thousands of attributes). The JSON is variable and can contain nested arrays or sub-JSONs. Please consider this example:

{
  "struct-id": 141241321,
  "nested-1-1": {
    "value-1-1-1": "36d1f74d-1663-418d-8b1b-665bbb2d9ecb",
    "value-1-1-2": 5,
    "value-1-1-3": 0.5,
    "value-1-1-4": ["foo", "bar", "foobar"],
    "nested-2-1": {
      "test-2-1-1": "whatever",
      "test-2-1-2": 42
    }
  },
  "nested-1-2": {
    "value-1-2-1": [
      { "id": 1, "deeply-nested": { "data-1": "test", "data-2": 4023 } },
      { "id": 2, "data-3": "that's enough data" }
    ]
  }
}

We would like to store those messages in Cassandra and then run Spark jobs over them. Basically, storing each message as text (the full JSON in one column) would work but wouldn't be optimised: if I want to count how many times value-1-1-3 is greater than or equal to 1, I would have to read the whole JSON before answering. I have read a lot about people using composite columns and dynamic composite columns, but found no precise example. I am also aware of collections support, yet nested collections are not currently supported. I would like to have:
- 1 column per attribute
- typed values
- something able to parse and store any valid JSON (with nested arrays of JSON or whatever)
- the most efficient model to query anything inside, alongside Spark

What would be the possible CQL schemas to create such a data structure? What are the drawbacks of the following schema?
CREATE TABLE test-schema (
    struct-id int,
    nested-1-1#value-1-1-1 string,
    nested-1-1#value-1-1-2 int,
    nested-1-1#value-1-1-3 float,
    nested-1-1#value-1-1-4#array0 string,
    nested-1-1#value-1-1-4#array1 string,
    nested-1-1#value-1-1-4#array2 string,
    nested-1-1#nested-2-1#test-2-1-1 string,
    nested-1-1#nested-2-1#test-2-1-2 int,
    nested-1-2#value-1-2-1#array0#id int,
    nested-1-2#value-1-2-1#array0#deeply-nested#data-1 string,
    nested-1-2#value-1-2-1#array0#deeply-nested#data-2 int,
    nested-1-2#id int,
    nested-1-2#data-3 string,
    PRIMARY KEY (struct-id)
)

I could use:

    nested-1-1#value-1-1-4 list<string>,

instead of:

    nested-1-1#value-1-1-4#array0 string,
    nested-1-1#value-1-1-4#array1 string,
    nested-1-1#value-1-1-4#array2 string,

yet it wouldn't work here:

    nested-1-2#value-1-2-1#array0#deeply-nested#data-1 string,
    nested-1-2#value-1-2-1#array0#deeply-nested#data-2 int,
    nested-1-2#value-1-2-1#array1#id int,
    nested-1-2#value-1-2-1#array1#data-3 string,

since this is a nested structure inside the list. To create this schema, could we imagine that the app logging this tries to write to the corresponding column for each JSON attribute and, if the column is missing, catches the error, creates the column and retries the write? This exception would happen only once per new field and would modify the schema. Any thoughts that would help us (and probably more people)? Alain
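One way to see the shape of this "1 column per attribute" idea is to generate the '#'-separated column names mechanically from the JSON itself. The sketch below is hypothetical client-side code (not from any driver) that flattens a parsed document into column-name/value pairs matching the naming scheme in the schema above:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into '#'-separated
    column-name -> value pairs, mirroring the schema sketched above."""
    cols = {}
    if isinstance(obj, dict):
        for key, val in obj.items():
            path = f"{prefix}#{key}" if prefix else key
            cols.update(flatten(val, path))
    elif isinstance(obj, list):
        # Lists become arrayN segments, so nested structures inside
        # lists still get a unique flat column name.
        for i, val in enumerate(obj):
            cols.update(flatten(val, f"{prefix}#array{i}"))
    else:
        cols[prefix] = obj  # leaf value: int, float, str, bool, None
    return cols

doc = json.loads('''{"struct-id": 141241321,
                     "nested-1-1": {"value-1-1-2": 5,
                                    "value-1-1-4": ["foo", "bar"]}}''')
columns = flatten(doc)
# columns == {"struct-id": 141241321,
#             "nested-1-1#value-1-1-2": 5,
#             "nested-1-1#value-1-1-4#array0": "foo",
#             "nested-1-1#value-1-1-4#array1": "bar"}
```

The app would then compare `columns.keys()` against the known schema and issue an ALTER TABLE for any missing column before writing, which is the "catch the error, create the column, retry" flow described above.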
Re: I want either all the DML statements within the batch succeed or rollback all. is it possible?
No joins in Cassandra. But... with DataStax Enterprise (DSE), which integrates Solr with Cassandra, limited join support is available - in particular, an outer join between two tables, provided that they share identical partition key values, so that the joined data is guaranteed to be on the same node. For example, you could join a "customer" table to a "customer-order" table, or a "user" table to a "user-comment" table. That said, the primary focus should always be to denormalize or flatten your data, sometimes with materialized views, to the extent possible, since arbitrary, open-ended SQL-like joins can be horrendously expensive. -- Jack Krupansky

From: M.Tarkeshwar Rao
Sent: Tuesday, July 22, 2014 9:45 AM
To: user@cassandra.apache.org
Subject: I want either all the DML statements within the batch succeed or rollback all. is it possible?

Hi all, in the user guide of Cassandra I found information about batches for atomic DML operations. I want either all the DML statements within the batch to succeed, or to roll back all of them. Is it possible? Another question: in my case, can I use joins in Cassandra, or is there any other way to achieve it? Regards, Tarkeshwar
Re: JSON to Cassandra ?
DSE, with Solr integration, does provide "field input transformers" so that you can parse a column in JSON or any other format and then split it into any number of Solr fields, including dynamic fields, which would then let you query elements of that JSON. -- Jack Krupansky

From: Alain RODRIGUEZ
Sent: Tuesday, July 22, 2014 11:29 AM
To: user@cassandra.apache.org
Subject: Re: JSON to Cassandra ?

Hi, this seems to fit, even if I would need to look at how these fields can be queried and indexed. Also, I would need to see whether those UDTs can be modified once created and how they behave in this use case. Yet, 2.1 is currently in beta, and we won't switch to this version immediately (even though we could benefit from it and the improved counters too...) since we are using C* 1.2 and are giving DSE 4.5 a try. In both cases, we are far from using 2.1. How do people usually do this without UDTs? Thanks for the pointer though, it will probably help someday :-).

2014-07-22 16:30 GMT+02:00 Jack Krupansky j...@basetechnology.com: Sounds like user-defined types (UDT) in Cassandra 2.1: https://issues.apache.org/jira/browse/CASSANDRA-5590 But... be careful to make sure that you aren't using this powerful (and dangerous) feature as a crutch merely to avoid disciplined data modeling. -- Jack Krupansky

From: Alain RODRIGUEZ
Sent: Tuesday, July 22, 2014 9:56 AM
To: user@cassandra.apache.org
Subject: JSON to Cassandra ?

Hi guys, I know this topic has already been discussed many times, and I have read a lot of these discussions. Yet, I have not been able to find a good way to do what I want. We are receiving messages from our app in the form of complex, dynamic, nested JSON (anywhere from a few to thousands of attributes). The JSON is variable and can contain nested arrays or sub-JSONs. ...
Re: Which way to Cassandraville?
Correction, I meant vendor-specific. Proprietary is OK so long as there aren't any lock-in tricks, or they can be dodged easily. Jim C.

On 07/22/2014 12:12 PM, jcllings wrote: Does it have an annotation scheme or arrangement so I don't have to put proprietary stuff in my Java? Jim C.

On 07/20/2014 06:24 PM, Kevin Burton wrote:

I just finished reading Cassandra: The Definitive Guide which seems pretty out of date and while very informative as to the technology that Cassandra uses, was not very helpful from the perspective of an application developer.

Very, very out of date... Having said that,

what Java clients should I be looking at?

I'd recommend the DataStax Java Driver. It works really well for us, and if you're familiar with JDBC it will be easy to get up and running fast. They are supporting it pretty aggressively too... the custom data type stuff is already supported in 2.1.

Are there any reasonably mature PoJo mapping techs for Cassandra analogous to Hibernate?

One was just posted to the list... I would say there are 2-3. I posted the same question, and there's a thread around my email address if you want to search for it. I personally ended up writing my own that just used a Velocity code generator so I could control the byte code output easily.

I can't say that I'm looking forward to yet another *QL variant but I guess CQL is going to be a necessity.

It's very close to an abbreviated SQL-92 with a few less features. You won't have a problem.

-- Founder/CEO Spinn3r.com (http://spinn3r.com)
Re: Which way to Cassandraville?
So it seems that:
1. There are indeed a few (3-4) mapping schemes.
2. CQL isn't very hard and represents a subset of (ANSI?) SQL-92.

Both of these are validated by further research and list guidance. It appears that learning Cassandra from an application developer's perspective essentially means learning what you can't do at all, and what you can't do directly that you could do with an RDBMS. This, plus keys and maybe a thing or two about replication strategies, and you should be good to go. Does this seem accurate? What kinds of things would it be good to know for an interview? Jim C.
Re: Which way to Cassandraville?
What kinds of things would it be good to know for an interview? The underlying storage engine and how CQL3 maps to it. It's more than important, it's crucial. Knowing what you can and can't do with CQL3 is not sufficient.

On Tue, Jul 22, 2014 at 9:20 PM, jcllings jclli...@gmail.com wrote: So it seems that: 1. There are indeed a few (3-4) mapping schemes. ...
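To make "how CQL3 maps to the storage engine" concrete: in the pre-3.0 engine, all CQL rows sharing a partition key are stored as one wide internal row, with clustering-column values folded into the sorted cell names. The toy simulation below (plain Python, not Cassandra code; the table and values are invented) illustrates the idea for a table like CREATE TABLE events (user_id text, ts int, event text, PRIMARY KEY (user_id, ts)):

```python
# CQL rows as (partition key, clustering key, column name, value).
cql_rows = [
    ("user1", 1001, "event", "login"),
    ("user1", 1002, "event", "logout"),
    ("user2", 1001, "event", "login"),
]

# Internal layout: one wide row per partition key, where each cell name
# is the clustering value joined with the CQL column name.
storage = {}
for pk, ck, col, val in cql_rows:
    storage.setdefault(pk, {})[f"{ck}:{col}"] = val

# storage == {"user1": {"1001:event": "login", "1002:event": "logout"},
#             "user2": {"1001:event": "login"}}
```

This is why slicing and ordering by clustering columns is cheap while filtering on arbitrary columns is not: the clustering value is literally part of the sorted cell name inside the partition.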
Running Cassandra Server in an OSGi container
Hello - I have a use case where I need to run the Cassandra server as an OSGi bundle. I have been able to embed all of the Cassandra dependencies in an OSGi bundle and run it on a Karaf container, but I am not happy with the approach I have so far. Since CassandraDaemon has System.exit() calls in it, if these execute they will bring down my entire OSGi container rather than just the bundle Cassandra is running in. I hacked up a copy of CassandraDaemon enough to get it to run in the bundle with no System.exit() calls, but the Cassandra StorageService is not aware of it, i.e., I cannot call the StorageService.registerDaemon(...) method because my copy of CassandraDaemon does not extend Apache's. Hence I am getting exceptions when I shut down my container or restart the bundle, because the StorageService and my CassandraDaemon are not linked. I am considering trying to extend Apache's CassandraDaemon and override its setup() method with a SecurityManager that disables System.exit() calls. This too sounds hacky. Does anyone have any better suggestions? Or know of an existing open source project that has successfully embedded the Cassandra server in an OSGi bundle? I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift). Thanks - Hugh
Re: Which way to Cassandraville?
You can also try http://caffinitas.org - an open source Java object mapper for C* built on DataStax's Java driver and licensed under APL2. It is intended to be fairly close to what JPA does, although it cannot support JPA features 1:1, since there are fundamental differences between RDBMS and NoSQL/C*. But it has other features that traditional RDBMS do not have. CQL in general is relatively close to SQL (CQL is SQL minus joins and subqueries, plus collections - and with C* 2.1 you can add "plus user types").

Regarding an interview:
1. knowledge of query-driven data modeling
2. knowledge of C* cluster organization / how data is distributed
3. knowledge of consistency (levels)
4. knowledge of the C* read and write paths

Robert

On 22.07.2014 at 21:20, jcllings jclli...@gmail.com wrote: So it seems that: 1. There are indeed a few (3-4) mapping schemes. ...
Re: Which way to Cassandraville?
OK, to clarify: I don't mean as an administrator but as an application developer. If you use an ORM, how important is CQL3? The object being to eliminate any *QL from Java code. Perhaps this technology isn't as mature as I thought. Jim C.

On 07/22/2014 12:42 PM, DuyHai Doan wrote: What kinds of things would it be good to know for an interview? The underlying storage engine and how CQL3 maps to it. It's more than important, it's crucial. ...
Re: Which way to Cassandraville?
Having an ORM says nothing about the maturity of a database; it says more about the community and their willingness to create one. The database itself has nothing to do with the creation of the ORM. Atop everything else, as was stated, knowing how to model your queries is the most important thing, more important than knowing how to use the driver. Cassandra has a very specific way of storing data; if you attempt to store data the way you would in an RDBMS, there is a good chance you will have a very hard time. Also, this: http://my.safaribooksonline.com/book/databases/9780133440195 - we wrote it for 1.2, but most of the information still applies. The performance gains you get from Cassandra come at a cost, that cost being that you need to know what you are doing.

On July 22, 2014 at 4:01:21 PM, jcllings (jclli...@gmail.com) wrote: OK, to clarify: I don't mean as an administrator but as an application developer. If you use an ORM, how important is CQL3? ...
Re: Which way to Cassandraville?
Let me respond with another question: how important is SQL for a JPA developer? Mappers eliminate the boring and error-prone stuff like executing a SELECT, reading fields, calling setters, etc. They can automatically perform conversions, apply optimizations, and so on. Mappers do not remove the need for a developer to think about what (s)he's coding. IMO mappers help and make life easier. Period. That means you should always know what the thing does to read/write your data - practically, not down to the details, but the concepts and pitfalls should be known. If you don't, you will get into trouble - sooner or later. Robert

PS: I avoid the abbreviation ORM - it includes the term "relational" ;)

On 22.07.2014 at 22:00, jcllings jclli...@gmail.com wrote: OK, to clarify: I don't mean as an administrator but as an application developer. If you use an ORM, how important is CQL3? ...
Re: Running Cassandra Server in an OSGi container
What's your intention in doing this? There are unit test integrations using the C* daemon, and a related bug that prevented proper shutdown has been closed for C* 2.1-rc1: https://issues.apache.org/jira/browse/CASSANDRA-5635 It's perfectly fine to embed C* for unit tests. But I'd definitely not recommend using C* within a container in a real production environment - not just because of the few System.exit calls in CassandraDaemon, but also because of the other places where System.exit is called for very good reasons. These reasons include system/node failure scenarios (for example, disk failures). C* is designed to run in its own JVM process using dedicated hardware resources, on multiple servers of commodity hardware, without any virtualization or shared storage - and it just works great that way. There are good reasons to move computation near the data, but that's always a separate OS process on the C* nodes. Examples are Hadoop and Spark.

On 22.07.2014 at 21:45, Rodgers, Hugh hugh.rodg...@lmco.com wrote: Hello - I have a use case where I need to run the Cassandra server as an OSGi bundle. ...
Re: Which way to Cassandraville?
On 07/22/2014 01:11 PM, Robert Stupp wrote: Let me respond with another question: How important is SQL for a JPA developer? ... IMO mappers help and make life easier. Period. Means: you should always know what the thing does to read/write your data. Practically not down to the details - but the concepts and pitfalls should be known. If you don't you will get into trouble - sooner or later. Robert PS: I avoid the abbreviation ORM - it includes the term relational ;)

Agreed. That is why in previous posts I've been calling it PoJo Mapping. When someone suggests I try on yet another hat, though, I get a little excited. ;-) In this case I've been wearing the ORM / RDBMS hat for long enough that I actually don't think about it much. So your point is made: I've already been wearing the hat in question. I surmise that if you are using a mapper, it should be more a matter of knowing how the annotations map to the back end rather than the CQL. This may make the transition easier because, as you say, it eliminates the cruft. Jim C.
Re: Which way to Cassandraville?
"I surmise if you are using a mapper, it should be more a matter of knowing how the annotations map to the back-end rather than the CQL" - that would be too easy. You should also know how CQL3 maps to the underlying data storage.

On Tue, Jul 22, 2014 at 10:33 PM, jcllings jclli...@gmail.com wrote: Agreed. That is why in previous posts I've been calling it PoJo Mapping. ...
Re: Which way to Cassandraville?
Yep - too easy. It does not matter what you use (CQL3, Pojo Mapper ;) or whatever). And I guess it's easier for a pure Java coder who knows nothing about C* to start with a mapper. But in the end you should know what's going on - since you will be the one fixing bugs and performance issues. And I think there's no opposition when I say that it's better to prevent bugs ;) The easiest way to learn is just to start using it - play with it - write tests - dig around - build a prototype - run benchmarks and performance tests - again and again. But throw away your prototype - start from scratch - with the lessons learned in mind :) On 22.07.2014 at 22:37, DuyHai Doan doanduy...@gmail.com wrote: ...
Re: Which way to Cassandraville?
On 07/22/2014 01:37 PM, DuyHai Doan wrote: I surmise if you are using a mapper, it should be more a matter of knowing how the annotations map to the back-end rather than the CQL It would be too easy. You should also know how the CQL3 maps to underlying data storage. It would be if I intended to stop there. I was just picking a familiar starting point. The best employee at any interview, of course, is both omniscient and omnipotent. Ah... but then we would merely be leasing in HIS universe. ;-) Jim C.
Re: Which way to Cassandraville?
Check out DataStax DevCenter, which is a GUI data modelling tool for CQL3: http://www.datastax.com/what-we-offer/products-services/devcenter On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote: So I'm a Java application developer and I'm trying to find entry points for learning to work with Cassandra. I just finished reading Cassandra: The Definitive Guide, which seems pretty out of date; while very informative as to the technology that Cassandra uses, it was not very helpful from the perspective of an application developer. Having said that, what Java clients should I be looking at? Are there any reasonably mature PoJo mapping techs for Cassandra analogous to Hibernate? I can't say that I'm looking forward to yet another *QL variant, but I guess CQL is going to be a necessity. What, if any, GUI tools are available for working with Cassandra, for data modelling? Jim C. -- http://twitter.com/tjake
Re: Which way to Cassandraville?
Removing *QL from application code is not really an indicator of the maturity of a technology. ORMs and automatic type mapping in general tend to be very easy things for a developer to work with, allowing for rapid prototypes, but those applications are often ill-suited to being deployed in high-volume environments. I have used a wide variety of ORMs over the last 15 years, Hibernate being a favourite and one in which I am held to have some expertise, but when I am creating an app for the real world, in situations where I can expect several million requests/day, I do not touch them. On Tue, Jul 22, 2014 at 5:10 PM, Jake Luciani jak...@gmail.com wrote: ... -- - michael dykman - mdyk...@gmail.com May the Source be with you.
Re: Running Cassandra Server in an OSGi container
I can give you some tips. Figure out what Cassandra does when it starts up. The best way to do that is to read the startup script. Then all you have to do is convince the OSGi container to do whatever prep is done by the script. The trick is usually figuring out where to do it. For example, if there are environment variables set in the script for Cassandra, you should add them to the script for your OSGi container. If there are any -D options, you would have to use whatever mechanism your OSGi container uses to pass them. There might be a properties file, for example, or there might be actual -D settings, depending. You should probably make your best guess as to where to put the configuration files, but watch the logs for errors to this effect, e.g. ERROR: Doh! Can't find the config dir / file / etc. Of course, if the Cassandra libs aren't OSGi-ified you would have to do that also. Jim C. On 07/22/2014 01:19 PM, Robert Stupp wrote: What's your intention in doing this? There are unit test integrations using the C* daemon. A related bug that prevented proper shutdown has been closed for C* 2.1-rc1: https://issues.apache.org/jira/browse/CASSANDRA-5635 It's perfectly fine to embed C* for unit tests. But I'd definitely not recommend using C* within a container in a real production environment - not just because of the few System.exit calls in CassandraDaemon, but also because of the other places where System.exit is called for very good reasons. These reasons include system/node failure scenarios (for example disk failures). C* is designed to run in its own JVM process using dedicated hardware resources on multiple servers of commodity hardware, without any virtualization or shared storage. And it just works great with that. There are good reasons to move computation near to the data - but that's always a separate OS process on C* nodes. Examples are Hadoop and Spark.
On 22.07.2014 at 21:45, Rodgers, Hugh hugh.rodg...@lmco.com wrote: Hello - I have a use case where I need to run the Cassandra server as an OSGi bundle. I have been able to embed all of the Cassandra dependencies in an OSGi bundle and run it on the Karaf container, but I am not happy with the approach I have thus far. Since CassandraDaemon has System.exit() calls in it, if these execute they will bring down my entire OSGi container rather than just the bundle Cassandra is running in. I hacked up a copy of CassandraDaemon enough to get it to run in the bundle with no System.exit() calls, but the Cassandra StorageService is not "aware" of it, i.e., I cannot call the StorageService.registerDaemon(...) method because my copy of CassandraDaemon does not extend Apache's. Hence I am getting exceptions when I shut down my container or restart the bundle, because the StorageService and my CassandraDaemon are not "linked". I am considering trying to extend Apache's CassandraDaemon and override its setup() method with a SecurityManager that disables System.exit() calls. This too sounds "hacky". Does anyone have any better suggestions? Or know of an existing open source project that has successfully embedded the Cassandra server in an OSGi bundle? I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift). Thanks - Hugh
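The SecurityManager idea mentioned above can be sketched roughly like this. The class name is hypothetical, and note that SecurityManager was later deprecated for removal in recent JDKs, so treat this as an illustration of the era's approach rather than a recommendation:

```java
// Sketch (names hypothetical): a SecurityManager whose checkExit turns
// System.exit() into a catchable SecurityException, so an embedded daemon
// cannot take down the whole OSGi container.
import java.security.Permission;

public class NoExitSecurityManager extends SecurityManager {
    @Override
    public void checkExit(int status) {
        // Called by the JVM before System.exit() terminates the process.
        throw new SecurityException("System.exit(" + status + ") blocked");
    }

    @Override
    public void checkPermission(Permission perm) {
        // Permit everything else; we only care about intercepting exit.
    }

    public static void main(String[] args) {
        // Installing it via System.setSecurityManager(new NoExitSecurityManager())
        // would make any System.exit() inside the bundle throw instead of exiting.
        NoExitSecurityManager mgr = new NoExitSecurityManager();
        try {
            mgr.checkExit(1);
        } catch (SecurityException e) {
            System.out.println("intercepted: " + e.getMessage());
        }
    }
}
```

Even with this in place, as Robert notes, the daemon may call System.exit for good reasons (disk failure, etc.), so swallowing the exit leaves the node in an undefined state.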
Re: Running Cassandra Server in an OSGi container
BTW, I agree with other posters that it seems like an awfully weird thing to do. Perhaps you just want to run a client in an OSGi environment? Jim C. On 07/22/2014 02:39 PM, jcllings wrote: ...
Cassandra Scaling Alerts
We have been going through and setting up alerts on our Cassandra clusters. We have catastrophic alerts setup to let us know when things are super broken, but we are now looking at setting up alerts for letting us know when we need to start scaling vertically or horizontally. We have alerts on our system metrics. What are the recommended metrics from the JMX that are strong indicators of needing to scale?
Re: Which way to Cassandraville?
True - Hibernate, EclipseLink and others add plenty of synchronization overhead, owed to the fact that an entity instance does not need to be explicitly persisted to get persisted (just change the loaded instance and flush the session). That's very expensive (CPU and heap). On top of that, transaction synchronization adds another cost. Pure mapping in itself is not really expensive compared to what one would otherwise do to return or persist a Pojo. Take a look at https://bitbucket.org/snazy/caffinitas/ - PersistenceSessionImpl.loadOne()/insert() add not much overhead at runtime - but you get the object ready to use. PS We are doing several million requests per day with Hibernate - but I spent a lot of work optimizing the framework between the business logic and JPA. It would not work out of the box. On 22.07.2014 at 23:32, Michael Dykman mdyk...@gmail.com wrote: ...
Re: Cassandra Scaling Alerts
I would look at load (disk space used) and system.compactions_in_progress. On Tue, Jul 22, 2014 at 3:49 PM, Arup Chakrabarti a...@pagerduty.com wrote: ...
Re: Which way to Cassandraville?
The problem with Hibernate and its kind is that they try to do many things at once, and support for JOINs brings a damned lot of complexity: you need to manage object graphs and circular references, and the stateful session is not thread-safe - not a good fit for async multi-threaded environments. On Tue, Jul 22, 2014 at 11:56 PM, Robert Stupp sn...@snazy.de wrote: ...
Re: Cassandra Scaling Alerts
also pending read / write operations (nodetool tpstats) and I/O On Tue, Jul 22, 2014 at 11:59 PM, Shane Hansen shanemhan...@gmail.com wrote: ...
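The metrics suggested in this thread could be wired into alerting along these lines. A minimal sketch - the metric names echo nodetool tpstats / compaction stats, but every threshold here is an invented placeholder that would need tuning to your own hardware and workload:

```java
// Illustrative sketch only: threshold checks one might hang off JMX-sourced
// metrics when deciding it is time to scale. Thresholds are hypothetical.
import java.util.Map;

public class ScalingAlerts {
    // Sustained breaches of these (invented) limits suggest scaling is due.
    private static final Map<String, Double> THRESHOLDS = Map.of(
        "PendingCompactions", 100.0,    // compaction falling behind writes
        "MutationStagePending", 1000.0, // writes queueing (nodetool tpstats)
        "ReadStagePending", 1000.0,     // reads queueing
        "DiskPercentUsed", 70.0         // leave headroom for compaction
    );

    public static boolean shouldAlert(String metric, double value) {
        Double limit = THRESHOLDS.get(metric);
        return limit != null && value > limit;
    }
}
```

In practice you would poll these values from Cassandra's JMX endpoint (or scrape nodetool output) on an interval and alert only on sustained breaches, not single spikes.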
RE: EXTERNAL: Re: Running Cassandra Server in an OSGi container
What got our team on the path of trying to embed C* was the wiki page http://wiki.apache.org/cassandra/Embedding which implies this can be done. Also, WSO2 Carbon and Achilles have both embedded C* (not in an OSGi container, though, and Carbon is on an older C* version). We want an unzip-and-run system and do not expect the user to have to do much, if any, C* configuration. From: Robert Stupp [mailto:sn...@snazy.de] Sent: Tuesday, July 22, 2014 1:19 PM To: user@cassandra.apache.org Subject: EXTERNAL: Re: Running Cassandra Server in an OSGi container What's your intention to do this? ...
Re: Which way to Cassandraville?
On Tue, Jul 22, 2014 at 1:10 PM, Russell Bradberry rbradbe...@gmail.com wrote: Having an ORM says nothing about the maturity of a database, it says more about the community and their willingness to create one. The database itself has nothing to do with the creation of the ORM. Except, as in this case, when one has baked what looks an awful lot like an ORM into the Database... ;D =Rob
Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9
On Tue, Jul 22, 2014 at 1:53 AM, Ben Hood 0x6e6...@gmail.com wrote: As Karl has suggested, client driver maintainers have opted to workaround the issue. Indeed, reading up on the issue (and discussing it with folks), there are a number of mitigating factors - most significantly the driver workarounds and the prevalence of TimeUUIDs, which made this issue less common than the popularity of reversed comparators would suggest. I still consider it a serious issue due to the nature of the regression, but it is fair to say not as serious as my initial reaction. As for the unit tests, I think this issue was only reproducible when upgrading a schema to 2.0.x - are you suggesting that there was/is test coverage for this scenario in the server? No, I was wondering why such a test - which tests for regression in very basic table access and appears to require no distribution - does not currently exist. In this particular case, the answer involves the fact that one needs a driver in order to expose the bug, and currently (as I understand it) only distributed tests use a driver. I believe that operators expect there to be a robust, representative test schema that can be created on version X.Y.Z and accessed on version X+1.y.0, which would exercise this core code and increase confidence that tables created in major version X will always be usable without exception in X+1. =Rob
Case Study from Migrating from RDBMS to Cassandra
Hi, Does anybody have a case study of migrating from an RDBMS to Cassandra? Thanks
Why is the cassandra documentation such poor quality?
This document, for example: https://wiki.apache.org/cassandra/Operations It is extremely outdated, certainly does not reflect the 2.x releases, and mentions commands that have long since been removed or deprecated. Instead of offering bad documentation, maybe remove this page and mark it as obsolete. The DataStax documentation is... acceptable, I guess. My main criticism there is that a lot of it is in their blog. Kevin -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
cluster rebalancing…
So, shouldn't it be easy to rebalance a cluster? I'm not super excited to type out 200 commands to move around individual tokens. I realize that this isn't a problem with a super easy solution, and that there are probably 2-3 different algorithms to pick from here… but having this be the only option doesn't seem scalable. -- Kevin
Re: cluster rebalancing…
You don't need to specify tokens. The new node gets them automatically. On Jul 22, 2014, at 7:03 PM, Kevin Burton bur...@spinn3r.com wrote: ...
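The automatic token assignment described here comes from vnodes. A minimal cassandra.yaml sketch (256 was the common default value in the 1.2/2.0 era; check your own config before relying on it):

```yaml
# cassandra.yaml sketch: with vnodes, each node picks num_tokens random
# tokens when it bootstraps, so no manual "nodetool move" commands are needed.
num_tokens: 256
# initial_token:   # leave unset when vnodes are enabled
```

If a cluster was built with single tokens per node, switching to vnodes requires a migration (e.g. via a new datacenter), not just flipping this setting.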
ONE consistency required 2 writes? huh?
Perhaps it's me, but it seems this exception is wrong: Cassandra timeout during write query at consistency ONE (2 replica were required but only 1 acknowledged the write) .. but the documentation for ONE says: A write must be written to the commit log and memory table of at least one replica node. … so… in my situation… 1 replica DID ack the write… so why am I getting an exception? Maybe I'm just not interpreting the exception correctly? -- Kevin
Re: cluster rebalancing…
Ok.. I think I get what's happening. This node is still joining the cluster. It wasn't totally clear that it was still joining, as the only indicator is the little J ... On Tue, Jul 22, 2014 at 7:09 PM, Jonathan Haddad jonathan.had...@gmail.com wrote: You don't need to specify tokens. The new node gets them automatically. ... -- Kevin
All writes fail with ONE consistency level when adding second node to cluster?
I'm super confused by this.. and disturbed that this was my failure scenario :-( I had one cassandra node for the alpha of my app… and now we're moving into beta… which means three replicas. So I added the second node… but my app immediately broke with: Cassandra timeout during write query at consistency ONE (2 replica were required but only 1 acknowledged the write) … but that makes no sense… if I'm at ONE and I have one acknowledged write, why does it matter that the second one hasn't ack'd yet… ? -- Kevin
Re: All writes fail with ONE consistency level when adding second node to cluster?
ONE means write to one replica (in addition to the original). If you want to write to any of them, use ANY. Is that the right understanding? http://www.datastax.com/docs/1.0/dml/data_consistency Andrew On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote: ...
Re: Case Study from Migrating from RDBMS to Cassandra
There's lots of info on migrating from a relational database to Cassandra here: http://www.datastax.com/relational-database-to-nosql On Tue, Jul 22, 2014 at 7:45 PM, Surbhi Gupta surbhi.gupt...@gmail.com wrote: Hi, Does anybody has the case study for Migrating from RDBMS to Cassandra ? Thanks
Re: All writes fail with ONE consistency level when adding second node to cluster?
WEIRD that it was working before… with one node. Granted, this is a rare config (one cassandra node), but by that reading it shouldn't have worked at all: if you attempt to write at ONE to a single cassandra node, there is no additional node to write to… so that should have failed. Bug? … and I know why this is failing… my cassandra node is joining the cluster now, but none of the ports are open. So all writes to it will fail… I have NO idea why the ports aren't open yet .. but it's not a firewall issue. On Tue, Jul 22, 2014 at 7:46 PM, Andrew redmu...@gmail.com wrote: ... -- Kevin
Re: All writes fail with ONE consistency level when adding second node to cluster?
Yeah.. that's fascinating … so now I get something that's even worse: Cassandra timeout during write query at consistency ANY (2 replica were required but only 1 acknowledged the write) … the issue is that the new cassandra node has all its ports closed. Only the storage port is open. So obviously writes to it are going to fail. … is this by design? Perhaps it's not going to open the ports until the node joins the ring? It's currently joining … so… basically, my entire cluster is offline during this join? I assume this is either a bug or some weird state from growing from 1 to 2 nodes? frustrating :-( On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson gra...@vast.com wrote: Incorrect - ONE does not refer to the number of "other" nodes, it just refers to the number of nodes, so ONE under normal circumstances would only require one node to acknowledge the write. The confusing error message you are getting is related to https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin, you are correct in that normally that error message would make no sense. I don't have much experience adding/removing nodes, but I think what is happening is that your new node is in the middle of taking over ownership of a token range. While that happens, C* is trying to write to both the old owner (your original node) AND (hence the 2, not 1, in the error message) the new owner (the new node), so that once the bootstrapping of the new node is complete, it is immediately safe to delete the no-longer-owned data from the old node. For whatever reason the write to the new node is timing out, causing the exception, and the error message is exposing the "2", which happens to be how many acks C* thinks it is waiting for at the time (i.e. how many it should be waiting for based on the consistency level (1) plus this extra node). On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote: ... -- Kevin
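Graham's explanation of the "2 replica were required" message can be sketched as a small calculation. This is a simplified model, not Cassandra's actual code; QUORUM and ALL are included only for comparison:

```java
// Simplified model of how many acks a write blocks for, per the explanation
// above: a base count from the consistency level, plus one for each pending
// (bootstrapping) replica of the token range. Not Cassandra's actual code.
public class BlockFor {
    enum ConsistencyLevel { ANY, ONE, QUORUM, ALL }

    static int blockFor(ConsistencyLevel cl, int replicationFactor, int pendingReplicas) {
        int base;
        switch (cl) {
            case ANY:
            case ONE:
                base = 1;
                break;
            case QUORUM:
                base = replicationFactor / 2 + 1;
                break;
            default: // ALL
                base = replicationFactor;
        }
        // A node that is joining (taking over the range) raises the requirement,
        // which is how CL ONE on a 1-node cluster can demand 2 acks mid-bootstrap.
        return base + pendingReplicas;
    }
}
```

Under this model, ONE with RF=1 and no pending replicas needs 1 ack, but the same write during a bootstrap (one pending replica) needs 2 - matching the error message Kevin saw.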
Re: All writes fail with ONE consistency level when adding second node to cluster?
and there are literally zero Google hits on the query: Cassandra timeout during write query at consistency ANY (2 replica were required but only 1 acknowledged the write) .. so I imagine I'm the first to find this bug! Aren't I lucky!

-- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
Re: All writes fail with ONE consistency level when adding second node to cluster?
I assumed you must have now switched to ANY, which you probably didn’t want to do, and likely won’t help (and very few people use ANY, which may explain the lack of Google hits; plus, this particular “Cassandra timeout during write query at consistency” error message comes from the DataStax CQL Java driver, not C* itself). In any case… my original response was just to explain to you that your understanding of what ONE means in general was correct, and this incorrect-looking error message was a weird case during adding a node. I have no idea what is going on with your bootstrapping node; others may be able to help, but in the meanwhile I’d look for errors in the server log and google those, and/or google for instructions on how to add nodes to a Cassandra cluster on whatever version you are running.

On Jul 22, 2014, at 10:47 PM, Kevin Burton bur...@spinn3r.com wrote: and there are literally zero google hits on the query: Cassandra timeout during write query at consistency ANY (2 replica were required but only 1 acknowledged the write) .. so I imagine I'm the first to find this bug! Aren't I lucky!
Re: All writes fail with ONE consistency level when adding second node to cluster?
Thanks for the feedback… In hindsight.. I think what happened was that the new node started up… and the driver wanted to write records to it… but the ports weren't up. So I wonder if this is a bug in the DataStax driver. On bootstrap, and when joining, does Cassandra always keep the client ports offline and only open them up once the node has joined?

-- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
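One common client-side way to ride out transient write timeouts during topology changes is simply to retry. A minimal, driver-agnostic sketch of that idea (all names here are hypothetical, not the DataStax driver's API, and retrying is only safe for idempotent writes):

```python
import time

class WriteTimeout(Exception):
    """Stand-in for a driver's write-timeout exception."""

def write_with_retry(do_write, attempts=3, backoff_s=0.1):
    # A write timeout does not mean the write failed (it may still
    # complete server-side), so retrying an idempotent write with
    # exponential backoff is a reasonable workaround.
    for attempt in range(attempts):
        try:
            return do_write()
        except WriteTimeout:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))

calls = []
def flaky_write():
    # Simulate a write that times out once, then succeeds.
    calls.append(1)
    if len(calls) < 2:
        raise WriteTimeout()
    return "ok"

print(write_with_retry(flaky_write))  # -> ok
```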
Re: Case Study from Migrating from RDBMS to Cassandra
Thanks Shane. However, I am looking for a proof-of-concept kind of document. Does anybody have a complete end-to-end document that covers the application overview, how they migrated from RDBMS to Cassandra, what things to consider, how they converted the data model and what the new data model looked like, how they loaded the data into Cassandra, and performance tests before and after the migration? Thanks Surbhi

On 23 July 2014 08:51, Shane Hansen shanemhan...@gmail.com wrote: There's lots of info on migrating from a relational database to Cassandra here: http://www.datastax.com/relational-database-to-nosql

On Tue, Jul 22, 2014 at 7:45 PM, Surbhi Gupta surbhi.gupt...@gmail.com wrote: Hi, Does anybody have a case study for migrating from RDBMS to Cassandra? Thanks
Re: All writes fail with ONE consistency level when adding second node to cluster?
I looked into this; ONE means it must be written to one replica, i.e., a node the data is supposed to be written to. ANY means a hinted handoff will “count”: as long as the write lands on any node in the cluster, even one the data is not supposed to live on, it will be a success. Good to know. Andrew

On July 22, 2014 at 8:13:57 PM, graham sanderson (gra...@vast.com) wrote: Incorrect, ONE does not refer to the number of “other” nodes, it just refers to the number of nodes. So ONE under normal circumstances would only require one node to acknowledge the write.
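Andrew's ONE-vs-ANY distinction can be captured in a toy model (hypothetical names; a deliberate simplification of Cassandra's actual consistency logic):

```python
def write_succeeds(consistency, natural_acks, hint_stored):
    # ONE requires an ack from a replica that actually owns the data.
    if consistency == "ONE":
        return natural_acks >= 1
    # ANY also counts a hinted handoff: the coordinator durably stores
    # a hint for later delivery to a real replica, and that suffices.
    if consistency == "ANY":
        return natural_acks >= 1 or hint_stored
    raise ValueError(f"unmodeled consistency level: {consistency}")

print(write_succeeds("ONE", 0, True))   # -> False: a hint is not a replica ack
print(write_succeeds("ANY", 0, True))   # -> True: the hint counts
```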