Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-22 Thread Ben Hood
On Tue, Jul 22, 2014 at 1:26 AM, Robert Coli rc...@eventbrite.com wrote:
 I'm pretty sure reversed comparator timestamps are a common type of schema,
 given that there are blog posts recommending their use, so I struggle to
 understand how this was not detected by unit tests.

As Karl has suggested, client driver maintainers have opted to work
around the issue. At gocql, when we ran into it, we started a
discussion thread to see whether it was likely to be a client-side or a
server-side bug. Because we didn't get a response from the discussion,
we thought the most pragmatic thing to do was to implement a
workaround in the client. Other driver maintainers have potentially
taken a similar course of action.

As for the unit tests, I think this issue was only reproducible when
upgrading a schema to 2.0.x - are you suggesting that there was/is
test coverage for this scenario in the server?


Error: AssertionError = firstTokenIndex(TokenMetadata.java:845)

2014-07-22 Thread Miguel Angel Martin junquera
Hi all,

I'm trying to add a node to a Cassandra ring with only one seed node. The
seed is in EC2, and I get this error when I start Cassandra on the other node:

ERROR [Thrift:389] 2014-07-22 08:25:39,838 CassandraDaemon.java (line 191)
Exception in thread Thread[Thrift:389,5,main]
java.lang.AssertionError
	at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:845)
	at org.apache.cassandra.locator.TokenMetadata.firstToken(TokenMetadata.java:859)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:106)
	at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2681)
	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:376)
	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:191)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:866)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:849)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:749)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3690)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3678)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)



--


I made an AMI from the original seed Cassandra instance in EC2, deleted all
the data, and configured listen 
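For anyone hitting the same assertion: TokenMetadata.firstTokenIndex performs a binary search over the node's sorted token ring and, in this era of the code, asserts that the ring is non-empty, so the error usually means the node is already serving client writes before it has learned any tokens. A rough, illustrative Python sketch of the lookup (the names mirror the Java, but this is not the actual implementation):

```python
from bisect import bisect_left

def first_token_index(ring, token):
    # Illustrative only: the real Java method asserts the sorted token ring
    # is non-empty before searching; an empty ring raises AssertionError,
    # as seen in the trace above.
    assert len(ring) > 0, "token ring is empty"
    # Index of the first ring token >= the sought token, wrapping around.
    return bisect_left(ring, token) % len(ring)

ring = [10, 20, 30]                 # hypothetical tokens, kept sorted
print(first_token_index(ring, 25))  # 2 -> token 30 covers 25
print(first_token_index(ring, 35))  # 0 -> wraps around to token 10
```

If the ring really is empty, the fix lives on the topology side (seeds, gossip, bootstrap order), not in this lookup.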

I want either all the DML statements within the batch succeed or rollback all. is it possible?

2014-07-22 Thread M.Tarkeshwar Rao
Hi all,

The Cassandra user guide mentions batches for atomic DML operations.
I want either all the DML statements within the batch to succeed, or all of
them to roll back.

Is that possible?

Another question: can I use joins in Cassandra, or is there any other way to
achieve the same result?


Regards
Tarkeshwar
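For context while answers come in: a CQL logged batch is atomic in the sense that, once the batch is accepted, every mutation in it will eventually be applied (via the batchlog), but there is no rollback and no isolation as in an RDBMS transaction. A toy Python model of that "replay until everything lands" semantics (purely illustrative; the names and failure simulation are made up):

```python
import random

def apply_logged_batch(mutations, applied, fail_rate=0.5, rng=None):
    # Toy model: record the whole batch durably first (the "batchlog"),
    # then apply mutations, replaying any that fail until all succeed.
    # Atomicity here means "eventually all of them", never "undo the
    # ones that already ran".
    rng = rng or random.Random(42)
    batchlog = list(mutations)           # durable copy of the whole batch
    while batchlog:                      # replay loop
        still_pending = []
        for m in batchlog:
            if rng.random() < fail_rate:  # simulated transient write failure
                still_pending.append(m)
            else:
                applied.append(m)
        batchlog = still_pending
    return applied

store = apply_logged_batch(["insert-1", "insert-2", "update-3"], [])
print(sorted(store))  # all three mutations applied, possibly after retries
```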


Re: Authentication exception

2014-07-22 Thread Jeremy Jongsma
Verified all clocks are in sync.


On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon ra...@apigee.com wrote:

 Could you perhaps check your NTP?


 On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I routinely get this exception from cqlsh on one of my clusters:

 cql.cassandra.ttypes.AuthenticationException:
 AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
 Operation timed out - received only 2 responses.')

 The system_auth keyspace is set to replicate X times given X nodes in
 each datacenter, and at the time of the exception all nodes are reporting
 as online and healthy. After a short period (e.g., 30 minutes), it will let
 me in again.

 What could be the cause of this?





JSON to Cassandra ?

2014-07-22 Thread Alain RODRIGUEZ
Hi guys, I know this topic has already been discussed many times, and I have
read a lot of those discussions.

Yet, I have not been able to find a good way to do what I want.

We are receiving messages from our app as complex, dynamic, nested
JSON (anywhere from a few to thousands of attributes). The JSON is variable
and can contain nested arrays or sub-JSONs.

Please, consider this example:

JSON

{
struct-id: 141241321,
nested-1-1: {
value-1-1-1: 36d1f74d-1663-418d-8b1b-665bbb2d9ecb,
value-1-1-2: 5,
value-1-1-3: 0.5,
value-1-1-4: [foo, bar, foobar],
nested-2-1: {
test-2-1-1: whatever,
test-2-1-2: 42
}
},
nested-1-2: {
value-1-2-1: [{
id: 1,
deeply-nested: {
data-1: test,
data-2: 4023
}
},
{
id: 2,
data-3: that's enough data
}]
}
}

We would like to store those messages in Cassandra and then run Spark jobs
over them. Storing each message as text (the full JSON in one column) would
work but wouldn't be efficient: to count, say, how many rows have a
value-1-1-3 greater than or equal to 1, I would have to parse every JSON
document first. I have read a lot about people using composite columns and
dynamic composite columns, but found no precise example. I am also
aware of collections support, yet nested collections are not supported
currently.

I would like to have:

- 1 column per attribute
- typed values
- something that would be able to parse and store any valid JSON (with
nested arrays of JSON or whatever).
- The most efficient model to use alongside Spark to query anything
inside.

What would be the possible CQL schemas to create such a data structure?

What are the drawbacks of the following schema?

Cassandra

CREATE TABLE test-schema (
struct-id int,
nested-1-1#value-1-1-1 string,
nested-1-1#value-1-1-2 int,
nested-1-1#value-1-1-3 float,
nested-1-1#value-1-1-4#array0 string,
nested-1-1#value-1-1-4#array1 string,
nested-1-1#value-1-1-4#array2 string,
nested-1-1#nested-2-1#test-2-1-1 string,
nested-1-1#nested-2-1#test-2-1-2 int,
nested-1-2#value-1-2-1#array0#id int,
nested-1-2#value-1-2-1#array0#deeply-nested#data-1 string,
nested-1-2#value-1-2-1#array0#deeply-nested#data-2 int,
nested-1-2#id int,
nested-1-2#data-3 string,
PRIMARY KEY (struct-id)
)

I could use:

nested-1-1#value-1-1-4 list<string>,

instead of:

nested-1-1#value-1-1-4#array0 string,
nested-1-1#value-1-1-4#array1 string,
nested-1-1#value-1-1-4#array2 string,

yet it wouldn't work here:

nested-1-2#value-1-2-1#array0#deeply-nested#data-1 string,
nested-1-2#value-1-2-1#array0#deeply-nested#data-2 int,
nested-1-2#value-1-2-1#array1#id int,
nested-1-2#value-1-2-1#array1#data-3 string,

since this is a nested structure inside the list.



To create this schema, could we imagine that the app, for each JSON
attribute, tries to write to the corresponding column and, if the column is
missing, catches the error, creates the column, and retries the write?

This exception would happen only once per new field and would evolve the
schema.
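One mechanical way to get from the JSON to the '#'-named columns sketched above is to flatten the document client-side; every path the flattener emits is a candidate column, which is also where the catch-the-error-and-add-the-column idea would plug in. A small illustrative sketch (the naming convention follows the schema above; nothing here is Cassandra-specific):

```python
import json

def flatten(value, prefix=""):
    # Flatten arbitrarily nested JSON into '#'-separated column paths;
    # list elements become #array0, #array1, ... as in the schema above.
    if isinstance(value, dict):
        cols = {}
        for key, sub in value.items():
            cols.update(flatten(sub, f"{prefix}#{key}" if prefix else key))
        return cols
    if isinstance(value, list):
        cols = {}
        for i, sub in enumerate(value):
            cols.update(flatten(sub, f"{prefix}#array{i}"))
        return cols
    return {prefix: value}  # leaf: one typed column

doc = json.loads('{"struct-id": 141241321,'
                 ' "nested-1-1": {"value-1-1-3": 0.5,'
                 '                "value-1-1-4": ["foo", "bar"]}}')
for col, val in sorted(flatten(doc).items()):
    print(col, "=", val)
# nested-1-1#value-1-1-3 = 0.5
# nested-1-1#value-1-1-4#array0 = foo
# nested-1-1#value-1-1-4#array1 = bar
# struct-id = 141241321
```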

Any thoughts that would help us (and probably others)?

Alain


Re: I want either all the DML statements within the batch succeed or rollback all. is it possible?

2014-07-22 Thread Jack Krupansky
No joins in Cassandra. But... with DataStax Enterprise (DSE), which integrates 
Solr with Cassandra, limited join support is available. In particular, an outer 
join between two tables is supported provided that they share identical 
partition key values, so that the joined data is guaranteed to be on the same 
node. For example, you could join a “customer” table to a “customer-order” 
table, or a “user” table to a “user-comment” table.

That said, the primary focus should always be to denormalize or flatten your 
data, sometimes with materialized views, to the extent possible since 
arbitrary, open-ended SQL-like joins can be horrendously expensive.

-- Jack Krupansky


Re: JSON to Cassandra ?

2014-07-22 Thread Jack Krupansky
DSE, with Solr integration, does provide “field input transformers” so that you 
can parse a column in JSON or any other format and then split it into any 
number of Solr fields, including dynamic fields, which would then let you query 
elements of that JSON.

-- Jack Krupansky

From: Alain RODRIGUEZ 
Sent: Tuesday, July 22, 2014 11:29 AM
To: user@cassandra.apache.org 
Subject: Re: JSON to Cassandra ?

Hi, 

This seems to fit, even if I would need to look at how these fields can be 
queried and indexed. Also, I would need to see whether those UDTs can be 
modified once created and how they behave in this use case.

Yet 2.1 is currently in beta, and we won't switch to that version immediately 
(even though we could benefit from it, and from the improved counters too) 
since we are using C* 1.2 and are giving DSE 4.5 a try. Either way, we are far 
from using 2.1. How do people usually do this without UDTs?

Thanks for the pointer though, it will probably help someday :-).



2014-07-22 16:30 GMT+02:00 Jack Krupansky j...@basetechnology.com:

  Sounds like user-defined types (UDT) in Cassandra 2.1:
  https://issues.apache.org/jira/browse/CASSANDRA-5590

  But... be careful to make sure that you aren’t using this powerful (and 
dangerous) feature as a crutch merely to avoid disciplined data modeling.

  -- Jack Krupansky



Re: Which way to Cassandraville?

2014-07-22 Thread jcllings
Correction: I mean vendor-specific. Proprietary is OK so long as there
aren't any lock-in tricks, or they can be dodged easily.

Jim C.

On 07/22/2014 12:12 PM, jcllings wrote:
 Does it have an annotation scheme or arrangement so I don't have to
 put proprietary stuff in my Java?

 Jim C.

 On 07/20/2014 06:24 PM, Kevin Burton wrote:


 I just finished reading Cassandra: The Definitive Guide which seems
 pretty out of date and while very informative as to the
 technology that
 Cassandra uses, was not very helpful from the perspective of an
 application developer.

 Very very out of date… 
  

 Having said that, what Java clients should I be looking at? 


 I'd recommend the Datastax Java Driver.  Works really well for us and
 if you're familiar with JDBC it will be easy to get up and running fast.

 They are supporting it pretty aggressively too… the custom data type
 stuff is already supported in 2.1.
  

  Are there
 any reasonably mature PoJo mapping techs for Cassandra analogous to
 Hibernate?


 One was just posted to the list… I would say there are 2-3 … I posted
 on the same question and there's a thread around my email address if
 you want to search for it.

 I personally ended up writing my own that just used a velocity code
 generator so I could control the byte code output easily.

  

 I can't say that I'm looking forward to yet another *QL
 variant but I guess CQL is going to be a necessity.  


 It's very close to an abbreviated SQL92 with slightly fewer features.

 You won't have a problem. 


 -- 
 Founder/CEO Spinn3r.com (http://Spinn3r.com)
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile:
 https://plus.google.com/102718274791889610666/posts






Re: Which way to Cassandraville?

2014-07-22 Thread jcllings
So it seems that:

1. There are indeed a few (3-4) mapping schemes.
2. CQL isn't very hard and represents a subset of (ANSI?) SQL92.

Both of these are validated based on further research and list guidance.

It appears that learning Cassandra from an application developer's
perspective essentially means learning what you can't do at all, and
learning what you can't do directly that you could do with an RDBMS.
This, plus keys and maybe a thing or two about replication strategies, and
you should be good to go. Does this seem accurate?

What kinds of things would it be good to know for an interview?

Jim C.





Re: Which way to Cassandraville?

2014-07-22 Thread DuyHai Doan
What kinds of things would it be good to know for an interview?

 The underlying storage engine and how CQL3 maps to it. It's more than
important, it's crucial. Knowing what you can and can't do with CQL3
is not sufficient.
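To make the point concrete: under the storage engine of that era, each CQL row in a partition is stored as one cell per regular column, named by the clustering values plus the column name and sorted within the partition. A toy illustration (the table and column names are invented; this is a mental model, not Cassandra code):

```python
def cql_rows_to_cells(rows, clustering, regular):
    # Toy model of the pre-3.0 storage engine: every regular column of every
    # CQL row becomes one storage cell named (clustering values..., column),
    # and cells are kept sorted within the partition.
    cells = []
    for row in rows:
        ck = tuple(row[c] for c in clustering)
        for col in regular:
            cells.append((ck + (col,), row[col]))
    return sorted(cells)

# Imagined table: PRIMARY KEY ((user), day) with regular columns kind, payload
rows = [{"day": 2, "kind": "click", "payload": "b"},
        {"day": 1, "kind": "view",  "payload": "a"}]
for name, val in cql_rows_to_cells(rows, ["day"], ["kind", "payload"]):
    print(name, "->", val)
# (1, 'kind') -> view
# (1, 'payload') -> a
# (2, 'kind') -> click
# (2, 'payload') -> b
```

This is why, for example, rows within a partition come back ordered by clustering column: the storage layer keeps the cells physically sorted that way.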








Running Cassandra Server in an OSGi container

2014-07-22 Thread Rodgers, Hugh
Hello -

I have a use case where I need to run the Cassandra server as an OSGi bundle. I 
have been able to embed all of the Cassandra dependencies in an OSGi bundle and 
run it in a Karaf container, but I am not happy with the approach I have so far.

Since CassandraDaemon has System.exit() calls in it, if these execute they will 
bring down my entire OSGi container rather than just the bundle Cassandra is 
running in. I hacked up a copy of CassandraDaemon enough to get it to run in 
the bundle with no System.exit() calls, but the Cassandra StorageService is not 
aware of it, i.e., I cannot call the StorageService.registerDaemon(...) 
method because my copy of CassandraDaemon does not extend Apache's; hence I am 
getting exceptions when I shut down my container or restart the bundle, 
because the StorageService and my CassandraDaemon are not linked.

I am considering trying to extend Apache's CassandraDaemon and override its 
setup() method to install a SecurityManager that blocks System.exit() calls. 
This too sounds hacky.

Does anyone have any better suggestions? Or know of an existing open source 
project that has successfully embedded CassandraServer in an OSGi bundle?

I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift).

Thanks -

Hugh



Re: Which way to Cassandraville?

2014-07-22 Thread Robert Stupp
You can also try http://caffinitas.org - an open source Java object mapper for 
C* using DataStax's Java Driver, licensed under APL2. It is intended to be 
fairly close to what JPA does.
That said, it cannot support JPA features 1:1, since there are fundamental 
differences between an RDBMS and NoSQL/C*.
But it has other features that traditional RDBMSs do not have.

CQL in general is relatively close to SQL (CQL is SQL minus joins and 
subqueries, plus collections) - and with C* 2.1 you can add "plus user-defined 
types".

Regarding an interview:
1. knowledge of query-driven data model
2. knowledge of C* cluster organization / how data is distributed
3. knowledge of consistency (levels)
4. knowledge of C* read and write path

Robert






Re: Which way to Cassandraville?

2014-07-22 Thread jcllings
OK, to clarify: I don't mean as an administrator but as an application
developer. If you use an ORM, how important is CQL3? The objective is
to eliminate any *QL from Java code.
Perhaps this technology isn't as mature as I thought.

Jim C.

On 07/22/2014 12:42 PM, DuyHai Doan wrote:
 What kinds of things would it be good to know for an interview?

  The underlying storage engine and how CQL3 maps to it. It's more than
 important, it's crucial. Knowing what you do and what you can't with
 CQL3 is not sufficient.











Re: Which way to Cassandraville?

2014-07-22 Thread Russell Bradberry
Having an ORM says nothing about the maturity of a database; it says more about 
the community and their willingness to create one. The database itself has 
nothing to do with the creation of the ORM. On top of everything else, as was 
stated, knowing how to model your queries is the most important thing, more 
important than knowing how to use the driver. Cassandra has a very specific way 
of storing data; if you attempt to store data the way you would in any other 
RDBMS, there is a good chance you will have a very hard time.

Also, this: http://my.safaribooksonline.com/book/databases/9780133440195

We wrote it for 1.2, but most of the information still applies.

The performance gains you get from Cassandra come at a cost: that cost is 
that you need to know what you are doing.

On July 22, 2014 at 4:01:21 PM, jcllings (jclli...@gmail.com) wrote:

OK to clarify, I don't mean as an Administrator but an application developer.  
If you use an ORM how important is CQL3?  The object being to eliminate any *QL 
from Java code.
Perhaps this technology isn't as mature as I thought.

Jim C.

On 07/22/2014 12:42 PM, DuyHai Doan wrote:
What kinds of things would it be good to know for an interview?

 The underlying storage engine and how CQL3 maps to it. It's more than 
important, it's crucial. Knowing what you do and what you can't with CQL3 is 
not sufficient.









Re: Which way to Cassandraville?

2014-07-22 Thread Robert Stupp
Let me respond with another question: How important is SQL for a JPA developer?

Mappers eliminate the boring and error-prone stuff like executing SELECTs, 
reading fields, calling setters, etc. They can automatically perform 
conversions, apply optimizations, and so on.
Mappers do not remove the need for a developer to think about what (s)he's 
coding.
IMO mappers help and make life easier. Period.

Meaning: you should always know what the thing does to read/write your data. 
Not necessarily down to the details - but the concepts and pitfalls should be 
known. If you don't, you will get into trouble - sooner or later.

Robert

PS: I avoid the abbreviation ORM - it includes the term relational ;)







Re: Running Cassandra Server in an OSGi container

2014-07-22 Thread Robert Stupp
What's your intention to do this?

There are unit test integrations using C* daemon. A related bug that prevented 
proper shutdown has been closed for C* 2.1-rc1: 
https://issues.apache.org/jira/browse/CASSANDRA-5635
It's perfectly fine to embed C* for unit tests.

But I'd definitely not recommend running C* within a container in a real 
production environment.
Not just because of the few System.exit calls in CassandraDaemon, but also 
because of the other places where System.exit is called for very good reasons. 
These reasons include system/node failure scenarios (for example, disk 
failures).

C* is designed to run in its own JVM process using dedicated hardware resources 
on multiple servers using commodity hardware without any virtualization or any 
shared storage. And it just works great with that.

There are good reasons to move computation near to the data - but that's always 
a separate OS process on C* nodes. Examples are Hadoop and Spark.






Re: Which way to Cassandraville?

2014-07-22 Thread jcllings

On 07/22/2014 01:11 PM, Robert Stupp wrote:
 Let me respond with another question: How important is SQL for a JPA
 developer?
 ...
 IMO mappers help and make life easier. Period.

 Means: you should always know what the thing does to read/write your
 data. Practically not down to the details - but the concepts and
 pitfalls should be known.
 If you don't you will get into trouble - sooner or later.

 Robert

 PS: I avoid the abbreviation ORM - it includes the term relational ;)


Agreed. That is why in previous posts I've been calling it POJO
mapping. When someone suggests I try on yet another hat, though, I get
a little excited. ;-) 

In this case I've been wearing the ORM / RDBMS hat for long enough that
I actually don't think about it much, so your point is made: I've
already been wearing the hat in question. I surmise that if you are using a
mapper, it should be more a matter of knowing how the annotations map to
the back end rather than the CQL. This may make the transition easier,
because, as you say, it eliminates the cruft.


Jim C.
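As a concrete illustration of "how the annotations map to the back-end": a Pojo mapper is essentially reflection plus CQL generation. The @Table/@Column annotations below are hypothetical stand-ins, not the DataStax object-mapper API, purely to show the mechanics.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Toy sketch of what a Pojo mapper does under the hood: reflect over
// annotated fields and emit CQL. The annotations here are hypothetical.
class ToyMapper {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Table { String name(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface Column { String name(); }

    @Table(name = "users")
    static class User {
        @Column(name = "user_id") long id;
        @Column(name = "email")   String email;
    }

    // Build the parameterized INSERT a mapper would prepare for this class.
    static String insertCql(Class<?> pojo) {
        String table = pojo.getAnnotation(Table.class).name();
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner binds = new StringJoiner(", ");
        for (Field f : pojo.getDeclaredFields()) {
            Column c = f.getAnnotation(Column.class);
            if (c != null) { cols.add(c.name()); binds.add("?"); }
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + binds + ")";
    }

    public static void main(String[] args) {
        // prints e.g.: INSERT INTO users (user_id, email) VALUES (?, ?)
        System.out.println(insertCql(User.class));
    }
}
```

Real mappers add prepared-statement caching, type codecs, and partition/clustering key awareness on top of this, which is exactly the part worth understanding before trusting one.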




Re: Which way to Cassandraville?

2014-07-22 Thread DuyHai Doan
I surmise if you are using a mapper, it should be more a matter of knowing
how the annotations map to the back-end rather than the CQL

 It would be too easy. You should also know how the CQL3 maps to underlying
data storage.




On Tue, Jul 22, 2014 at 10:33 PM, jcllings jclli...@gmail.com wrote:


 On 07/22/2014 01:11 PM, Robert Stupp wrote:

 Let me respond with another question: How important is SQL for a JPA
 developer?
 ...

  IMO mappers help and make life easier. Period.


  Means: you should always know what the thing does to read/write your
 data. Practically not down to the details - but the concepts and pitfalls
 should be known.
 If you don't you will get into trouble - sooner or later.

  Robert

  PS: I avoid the abbreviation ORM - it includes the term relational ;)


 Agreed. That is why in previous posts I've been calling it PoJo
 Mapping.  When someone suggests I try on yet another hat, though, I get a
 little excited. ;-)

 In this case I've been wearing the ORM / RDBMS hat for long enough that I
 actually don't think about it much. So your point is made. I've already
 been wearing the hat in question.  I surmise if you are using a mapper, it
 should be more a matter of knowing how the annotations map to the back-end
 rather than the CQL. This may make the transition easier, because as you
 say, it eliminates the cruft.


 Jim C.
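To make the "how CQL3 maps to underlying data storage" point concrete: in the pre-3.0 storage engine, each CQL row is flattened into one internal cell per non-key column, with the clustering values forming a composite prefix of every cell name. A rough sketch, using strings in place of the real binary composites, timestamps, and markers:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the pre-3.0 CQL3 storage layout: one wide partition
// per partition key, one internal cell per (clustering values, column) pair.
// Strings stand in for the engine's binary composite cell names.
class CqlStorageSketch {

    // Cell names one CQL row produces: clustering values prefix each cell.
    static List<String> cellNames(String[] clusteringValues, String[] valueColumns) {
        String prefix = String.join(":", clusteringValues);
        List<String> cells = new ArrayList<>();
        for (String col : valueColumns) {
            cells.add(prefix + ":" + col);
        }
        return cells;
    }

    public static void main(String[] args) {
        // CREATE TABLE events (day text, ts timestamp, seq int, payload text,
        //                      PRIMARY KEY (day, ts, seq))
        // One CQL row stores one cell per non-key column under partition "day":
        System.out.println(cellNames(new String[]{"2014-07-22T10:00", "1"},
                                     new String[]{"payload"}));
        // [2014-07-22T10:00:1:payload]
    }
}
```

This is why heavy clustering produces wide partitions, and why SELECTs restricted by a clustering prefix are cheap: they read a contiguous slice of one partition.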



Re: Which way to Cassandraville?

2014-07-22 Thread Robert Stupp
Yep - too easy. It does not matter what you use (CQL3, Pojo Mapper ;) or 
whatever). And I guess it's easier for a pure Java coder knowing nothing about 
C* to start with a mapper. But in the end you should know what's going on - 
since you will be in the position to fix bugs and performance issues. And I 
think there's no opposition when I say that it's better to prevent bugs ;)
The easiest way to learn things is just to start using it - play with it - make 
tests - dig around - build a prototype - benchmarks - performance tests - again 
and again. But throw away your prototype - start from scratch - with the 
lessons learned in mind :)

Am 22.07.2014 um 22:37 schrieb DuyHai Doan doanduy...@gmail.com:

 I surmise if you are using a mapper, it should be more a matter of knowing 
 how the annotations map to the back-end rather than the CQL
 
  It would be too easy. You should also know how the CQL3 maps to underlying 
 data storage.
 
  
 
 
 On Tue, Jul 22, 2014 at 10:33 PM, jcllings jclli...@gmail.com wrote:
 
 On 07/22/2014 01:11 PM, Robert Stupp wrote:
 Let me respond with another question: How important is SQL for a JPA 
 developer?
 ...
 
 IMO mappers help and make life easier. Period.
 
 Means: you should always know what the thing does to read/write your data. 
 Practically not down to the details - but the concepts and pitfalls should 
 be known.
 If you don't you will get into trouble - sooner or later.
 
 Robert
 
 PS: I avoid the abbreviation ORM - it includes the term relational ;)
 
 
 Agreed. That is why in previous posts I've been calling it PoJo Mapping.  
 When someone suggests I try on yet another hat, though, I get a little 
 excited. ;-)  
 
 In this case I've been wearing the ORM / RDBMS hat for long enough that I 
 actually don't think about it much. So your point is made. I've already been 
 wearing the hat in question.  I surmise if you are using a mapper, it should 
 be more a matter of knowing how the annotations map to the back-end rather 
 than the CQL. This may make the transition easier, because as you say, it 
 eliminates the cruft. 
 
 
 Jim C. 
 





Re: Which way to Cassandraville?

2014-07-22 Thread jcllings

On 07/22/2014 01:37 PM, DuyHai Doan wrote:
 I surmise if you are using a mapper, it should be more a matter of
 knowing how the annotations map to the back-end rather than the CQL

  It would be too easy. You should also know how the CQL3 maps to
 underlying data storage.

It would be if I intended to stop there. I was just picking a familiar
starting point. The best employee at any interview, of course, is both
omniscient and omnipotent.
Ah...but then we would be merely leasing in HIS universe. ;-)

Jim C.





Re: Which way to Cassandraville?

2014-07-22 Thread Jake Luciani
Checkout datastax devcenter which is a GUI datamodelling tool for cql3

http://www.datastax.com/what-we-offer/products-services/devcenter


On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote:

 So I'm a Java application developer and I'm trying to find entry points
 for learning to work with Cassandra.
 I just finished reading Cassandra: The Definitive Guide which seems
 pretty out of date and while very informative as to the technology that
 Cassandra uses, was not very helpful from the perspective of an
 application developer.

 Having said that, what Java clients should I be looking at?  Are there
 any reasonably mature PoJo mapping techs for Cassandra analogous to
 Hibernate? I can't say that I'm looking forward to yet another *QL
 variant but I guess CQL is going to be a necessity.  What, if any, GUI
 tools are available for working with Cassandra, for data modelling?

 Jim C.




-- 
http://twitter.com/tjake


Re: Which way to Cassandraville?

2014-07-22 Thread Michael Dykman
Removing *QL from application code is not really an indicator of the
maturity of a technology. ORMs and automatic type mapping in general
tend to be very easy things for a developer to work with allowing for
rapid prototypes, but those applications are often ill-suited to being
deployed in high-volume environments.

I have used a wide variety of ORMs over the last 15 years, hibernate
being a favourite at which I am held to have some expertise, but when
I am creating an app for the real world in situations where I can
expect several million requests/day, I do not touch them.


On Tue, Jul 22, 2014 at 5:10 PM, Jake Luciani jak...@gmail.com wrote:
 Checkout datastax devcenter which is a GUI datamodelling tool for cql3

 http://www.datastax.com/what-we-offer/products-services/devcenter


 On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote:

 So I'm a Java application developer and I'm trying to find entry points
 for learning to work with Cassandra.
 I just finished reading Cassandra: The Definitive Guide which seems
 pretty out of date and while very informative as to the technology that
 Cassandra uses, was not very helpful from the perspective of an
 application developer.

 Having said that, what Java clients should I be looking at?  Are there
 any reasonably mature PoJo mapping techs for Cassandra analogous to
 Hibernate? I can't say that I'm looking forward to yet another *QL
 variant but I guess CQL is going to be a necessity.  What, if any, GUI
 tools are available for working with Cassandra, for data modelling?

 Jim C.




 --
 http://twitter.com/tjake



-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: Running Cassandra Server in an OSGi container

2014-07-22 Thread jcllings
I can give you some tips.

Figure out what Cassandra does when it starts up. Best way to do that is
to read the startup script.  Then all you have to do is convince the
OSGI container to do what ever prep is done by the script.  Trick to
that is usually figuring out where to do it. For example if there are
environment variables set in the script for Cassandra, you should add
them to the script for your OSGI container.  If there are any -D
options, you would have to use what ever mechanism your OSGI container
uses to pass them.  There might be a properties file for example or
there might be actual -D settings, depending.  You should probably make
your best guess as to where to put the configuration files but watch the
logs for errors to this effect, e.g. ERROR: Doh! Can't find the config
dir / file / etc.  Of course, if the Cassandra libs aren't OSGI-ified
you would have to do that also.

Jim C.

On 07/22/2014 01:19 PM, Robert Stupp wrote:
 What's your intention to do this?

 There are unit test integrations using C* daemon. A related bug that
 prevented proper shutdown has been closed for C*
 2.1-rc1: https://issues.apache.org/jira/browse/CASSANDRA-5635
 It's perfectly fine to embed C* for unit tests.

 But I'd definitely not recommend to use C* within a container in a
 real production environment.
 Not just because of the few System.exit calls in CassandraDaemon but
 also of the other places where System.exit is called for very good
 reasons. These reasons include system/node failure scenarios (for
 example disk failures).

 C* is designed to run in its own JVM process using dedicated hardware
 resources on multiple servers using commodity hardware without any
 virtualization or any shared storage. And it just works great with that.

 There are good reasons to move computation near to the data - but
 that's always a separate OS process on C* nodes. Examples are Hadoop
 and Spark.

 Am 22.07.2014 um 21:45 schrieb Rodgers, Hugh hugh.rodg...@lmco.com
 mailto:hugh.rodg...@lmco.com:

 Hello –
  
 I have a use case where I need to run the Cassandra Server as an OSGi
 bundle. I have been able to embed all of the Cassandra dependencies
 in an OSGi bundle and run it on Karaf container, but I am not happy
 with the approach I have thus far.
  
 Since CassandraDaemon has System.exit() calls in it, if these execute
 it will bring down my entire OSGi container rather than just the
 bundle Cassandra is running in. I hacked up a copy of CassandraDaemon
 enough to get it to run in the bundle with no System.exit() calls,
 but the Cassandra StorageService is not “aware” of it, i.e., I cannot
 call the StorageService.registerDaemon(…) method because my copy of
 CassandraDaemon does not extend Apache’s. hence I am getting
 exceptions when I do shutdown my container or restart the bundle
 because the StorageService and my CassandraDaemon are not “linked”.
  
 I am considering trying to extend Apache’s CassandraDaemon and
 override its setup() method with a SecurityManager that disables
 System.exit() calls. This too sounds “hacky”.
  
 Does anyone have any better suggestions? Or know of an existing open
 source project that has successfully embedded CassandraServer in an
 OSGi bundle?
  
 I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift).
  
 Thanks –
  
 Hugh






Re: Running Cassandra Server in an OSGi container

2014-07-22 Thread jcllings
BTW, I agree with other posters that it seems like an awfully weird
thing to do.  Perhaps you just want to run a client in an OSGI environment?

Jim C.

On 07/22/2014 02:39 PM, jcllings wrote:
 I can give you some tips.

 Figure out what Cassandra does when it starts up. Best way to do that
 is to read the startup script.  Then all you have to do is convince
 the OSGI container to do what ever prep is done by the script.  Trick
 to that is usually figuring out where to do it. For example if there
 are environment variables set in the script for Cassandra, you should
 add them to the script for your OSGI container.  If there are any -D
 options, you would have to use what ever mechanism your OSGI container
 uses to pass them.  There might be a properties file for example or
 there might be actual -D settings, depending.  You should probably
 make your best guess as to where to put the configuration files but
 watch the logs for errors to this effect, e.g. ERROR: Doh! Can't find
 the config dir / file / etc.  Of course, if the Cassandra libs aren't
 OSGI-ified you would have to do that also.

 Jim C.

 On 07/22/2014 01:19 PM, Robert Stupp wrote:
 What's your intention to do this?






Cassandra Scaling Alerts

2014-07-22 Thread Arup Chakrabarti
We have been going through and setting up alerts on our Cassandra clusters.
We have catastrophic alerts setup to let us know when things are super
broken, but we are now looking at setting up alerts for letting us know
when we need to start scaling vertically or horizontally.

We have alerts on our system metrics. What are the recommended metrics from
the JMX that are strong indicators of needing to scale?


Re: Which way to Cassandraville?

2014-07-22 Thread Robert Stupp
True - Hibernate, EclipseLink and others add plenty of synchronization 
overhead owing to the fact that an entity instance does not need to be explicitly 
persisted to get persisted (just change the loaded instance and flush the 
session). That's very expensive (CPU and heap). On top of that, transaction 
synchronization adds further cost.

Pure mapping in itself is not really expensive compared to what one would do 
manually to return or persist a Pojo. Take a look at 
https://bitbucket.org/snazy/caffinitas/ - 
PersistenceSessionImpl.loadOne()/insert() add little overhead at runtime - 
but you get the object ready to use.


PS We are doing several million requests per day with Hibernate - but I spent a 
lot of work optimizing the framework layer between business logic and JPA. It 
would not work out of the box.


Am 22.07.2014 um 23:32 schrieb Michael Dykman mdyk...@gmail.com:

 Removing *QL from application code is not really an indicator of the
 maturity of a technology. ORMs and automatic type mapping in general
 tend to be very easy things for a developer to work with allowing for
 rapid prototypes, but those applications are often ill-suited to being
 deployed is high-volume environments.
 
 I have used a wide variety of ORMs over the last 15 years, hibernate
 being a favourite at which I am held to have some expertise, but when
 I am creating an app for the real world in situations where I can
 expect several million requests/day, I do not touch them.
 
 
 On Tue, Jul 22, 2014 at 5:10 PM, Jake Luciani jak...@gmail.com wrote:
 Checkout datastax devcenter which is a GUI datamodelling tool for cql3
 
 http://www.datastax.com/what-we-offer/products-services/devcenter
 
 
 On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote:
 
 So I'm a Java application developer and I'm trying to find entry points
 for learning to work with Cassandra.
 I just finished reading Cassandra: The Definitive Guide which seems
 pretty out of date and while very informative as to the technology that
 Cassandra uses, was not very helpful from the perspective of an
 application developer.
 
 Having said that, what Java clients should I be looking at?  Are there
 any reasonably mature PoJo mapping techs for Cassandra analogous to
 Hibernate? I can't say that I'm looking forward to yet another *QL
 variant but I guess CQL is going to be a necessity.  What, if any, GUI
 tools are available for working with Cassandra, for data modelling?
 
 Jim C.
 
 
 
 
 --
 http://twitter.com/tjake
 
 
 
 -- 
 - michael dykman
 - mdyk...@gmail.com
 
 May the Source be with you.





Re: Cassandra Scaling Alerts

2014-07-22 Thread Shane Hansen
I would look at load (disk space used) and system.compactions_in_progress.



On Tue, Jul 22, 2014 at 3:49 PM, Arup Chakrabarti a...@pagerduty.com
wrote:

 We have been going through and setting up alerts on our Cassandra
 clusters. We have catastrophic alerts setup to let us know when things are
 super broken, but we are now looking at setting up alerts for letting us
 know when we need to start scaling vertically or horizontally.

 We have alerts on our system metrics. What are the recommended metrics
 from the JMX that are strong indicators of needing to scale?



Re: Which way to Cassandraville?

2014-07-22 Thread DuyHai Doan
The problem with Hibernate and its kind is that they try to do many things
at once. And support for JOINs brings a lot of complexity. You need
to manage object graphs and circular references - stateful sessions -
not thread-safe - not a good fit for async / multi-threaded environments


On Tue, Jul 22, 2014 at 11:56 PM, Robert Stupp sn...@snazy.de wrote:

 True - Hibernate, Eclipselink and others add plenty of synchronization
 overhead owed the fact that an entity instance does not need to be
 explicitly persisted to get persisted (just change the loaded instance and
 flush the session). That's very expensive (CPU and heap). Even though
 transaction synchronization adds another cost.

 Pure mapping as itself is not really expensive compared to what one would
 do to return a Pojo or persist a Pojo. Take a look at
 https://bitbucket.org/snazy/caffinitas/ -
 PersistenceSessionImpl.loadOne()/insert() add not much overhead during
 runtime - but you get the object ready to use.


 PS We are doing several million requests per day with Hibernate - but I
 spent a lot of work to optimize framework between business logic and JPA.
 It would not work out of the box.


 Am 22.07.2014 um 23:32 schrieb Michael Dykman mdyk...@gmail.com:

  Removing *QL from application code is not really an indicator of the
  maturity of a technology. ORMs and automatic type mapping in general
  tend to be very easy things for a developer to work with allowing for
  rapid prototypes, but those applications are often ill-suited to being
  deployed is high-volume environments.
 
  I have used a wide variety of ORMs over the last 15 years, hibernate
  being a favourite at which I am held to have some expertise, but when
  I am creating an app for the real world in situations where I can
  expect several million requests/day, I do not touch them.
 
 
  On Tue, Jul 22, 2014 at 5:10 PM, Jake Luciani jak...@gmail.com wrote:
  Checkout datastax devcenter which is a GUI datamodelling tool for cql3
 
  http://www.datastax.com/what-we-offer/products-services/devcenter
 
 
  On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote:
 
  So I'm a Java application developer and I'm trying to find entry points
  for learning to work with Cassandra.
  I just finished reading Cassandra: The Definitive Guide which seems
  pretty out of date and while very informative as to the technology that
  Cassandra uses, was not very helpful from the perspective of an
  application developer.
 
  Having said that, what Java clients should I be looking at?  Are there
  any reasonably mature PoJo mapping techs for Cassandra analogous to
  Hibernate? I can't say that I'm looking forward to yet another *QL
  variant but I guess CQL is going to be a necessity.  What, if any, GUI
  tools are available for working with Cassandra, for data modelling?
 
  Jim C.
 
 
 
 
  --
  http://twitter.com/tjake
 
 
 
  --
  - michael dykman
  - mdyk...@gmail.com
 
  May the Source be with you.




Re: Cassandra Scaling Alerts

2014-07-22 Thread DuyHai Doan
also pending read / write operations (nodetool tpstats) and I/O
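All of the counters mentioned in this thread (pending tasks, compactions, load) are exposed as JMX MBeans, the same mechanism nodetool reads. The sketch below queries the local JVM's heap bean so it runs standalone; against a live node you would connect to the JMX port (7199 by default) and query Cassandra beans such as the ThreadPools metrics named in the comment, whose exact names vary by version and should be verified against your own node.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Sketch: reading a metric over JMX. Against a live node you would connect
// via JMXConnectorFactory to host:7199 and query beans like (2.x-era name,
// verify for your version):
//   org.apache.cassandra.metrics:type=ThreadPools,path=request,
//     scope=MutationStage,name=PendingTasks
// Here we read the local JVM's own heap bean so the example is self-contained.
class JmxMetricSketch {

    static long readHeapUsed() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName heap = new ObjectName("java.lang:type=Memory");
            CompositeData usage =
                    (CompositeData) server.getAttribute(heap, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            return -1L; // bean or attribute unavailable
        }
    }

    public static void main(String[] args) {
        System.out.println("heap used bytes: " + readHeapUsed());
    }
}
```

Polling such attributes on a schedule and alerting on trends (pending tasks climbing, disk load approaching capacity) is the usual way to turn these metrics into scaling signals.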


On Tue, Jul 22, 2014 at 11:59 PM, Shane Hansen shanemhan...@gmail.com
wrote:

 I would look at load (disk space used) and system.compactions_in_progress.



 On Tue, Jul 22, 2014 at 3:49 PM, Arup Chakrabarti a...@pagerduty.com
 wrote:

 We have been going through and setting up alerts on our Cassandra
 clusters. We have catastrophic alerts setup to let us know when things are
 super broken, but we are now looking at setting up alerts for letting us
 know when we need to start scaling vertically or horizontally.

 We have alerts on our system metrics. What are the recommended metrics
 from the JMX that are strong indicators of needing to scale?





RE: EXTERNAL: Re: Running Cassandra Server in an OSGi container

2014-07-22 Thread Rodgers, Hugh
What got our team on the path of trying to embed C* was the wiki page 
http://wiki.apache.org/cassandra/Embedding which implies this can be done. Also 
WSO2 Carbon and Achilles have both embedded C* (not in an OSGi container 
though, and Carbon uses an older C* version).

We want an unzip-and-run system and do not expect the user to have to 
do much, if any, C* configuration.

From: Robert Stupp [mailto:sn...@snazy.de]
Sent: Tuesday, July 22, 2014 1:19 PM
To: user@cassandra.apache.org
Subject: EXTERNAL: Re: Running Cassandra Server in an OSGi container

What's your intention to do this?

There are unit test integrations using C* daemon. A related bug that prevented 
proper shutdown has been closed for C* 2.1-rc1: 
https://issues.apache.org/jira/browse/CASSANDRA-5635
It's perfectly fine to embed C* for unit tests.

But I'd definitely not recommend to use C* within a container in a real 
production environment.
Not just because of the few System.exit calls in CassandraDaemon but also of 
the other places where System.exit is called for very good reasons. These 
reasons include system/node failure scenarios (for example disk failures).

C* is designed to run in its own JVM process using dedicated hardware resources 
on multiple servers using commodity hardware without any virtualization or any 
shared storage. And it just works great with that.

There are good reasons to move computation near to the data - but that's always 
a separate OS process on C* nodes. Examples are Hadoop and Spark.

Am 22.07.2014 um 21:45 schrieb Rodgers, Hugh 
hugh.rodg...@lmco.commailto:hugh.rodg...@lmco.com:


Hello -

I have a use case where I need to run the Cassandra Server as an OSGi bundle. I 
have been able to embed all of the Cassandra dependencies in an OSGi bundle and 
run it on Karaf container, but I am not happy with the approach I have thus far.

Since CassandraDaemon has System.exit() calls in it, if these execute it will 
bring down my entire OSGi container rather than just the bundle Cassandra is 
running in. I hacked up a copy of CassandraDaemon enough to get it to run in 
the bundle with no System.exit() calls, but the Cassandra StorageService is not 
aware of it, i.e., I cannot call the StorageService.registerDaemon(...) 
method because my copy of CassandraDaemon does not extend Apache's. hence I am 
getting exceptions when I do shutdown my container or restart the bundle 
because the StorageService and my CassandraDaemon are not linked.

I am considering trying to extend Apache's CassandraDaemon and override its 
setup() method with a SecurityManager that disables System.exit() calls. This 
too sounds hacky.

Does anyone have any better suggestions? Or know of an existing open source 
project that has successfully embedded CassandraServer in an OSGi bundle?

I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift).

Thanks -

Hugh



Re: Which way to Cassandraville?

2014-07-22 Thread Robert Coli
On Tue, Jul 22, 2014 at 1:10 PM, Russell Bradberry rbradbe...@gmail.com
wrote:

 Having an ORM says nothing about the maturity of a database, it says more
 about the community and their willingness to create one.  The database
 itself has nothing to do with the creation of the ORM.


Except, as in this case, when one has baked what looks an awful lot like an
ORM into the Database... ;D

=Rob


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-22 Thread Robert Coli
On Tue, Jul 22, 2014 at 1:53 AM, Ben Hood 0x6e6...@gmail.com wrote:

 As Karl has suggested, client driver maintainers have opted to
 workaround the issue.


Indeed, reading up on the issue (and discussing it with folks) there are a
number of mitigating factors, most significantly driver workarounds and the
use of TimeUUIDs, which made this issue less common than the prevalence of
reversed-comparator use cases would suggest. I still consider it a serious
issue due to the nature of the regression, but it is fair to say it is not as
serious as my initial reaction implied.


 As for the unit tests, I think this issue was only reproducible when
 upgrading a schema to 2.0.x - are you suggesting that there was/is
 test coverage for this scenario in the server?


No, I was wondering why such a test, which tests for regression in very
basic table access and appears to require no distribution, does not
currently exist.

In this particular case, the answer to why not involves the idea that one
needs to be able to test with a driver in order to expose it, and currently
(as I understand it) only distributed tests use a driver.

I believe that operators expect there to be a robust representative test
schema that can be created on version X.Y.Z and be accessed on version
X+1.y.0 which would exercise this core code and increase confidence that
tables created in major version X will always be usable without exception
in X+1.

=Rob


Case Study from Migrating from RDBMS to Cassandra

2014-07-22 Thread Surbhi Gupta
Hi,

Does anybody have a case study on migrating from an RDBMS to Cassandra?

Thanks


Why is the cassandra documentation such poor quality?

2014-07-22 Thread Kevin Burton
This document:

https://wiki.apache.org/cassandra/Operations

… for example, is extremely outdated and certainly does NOT reflect the 2.x
releases.  It mentions commands that have long since been removed or deprecated.

Instead of giving bad documentation, maybe remove this and mark it as
obsolete.

The datastax documentation… is… acceptable, I guess.  My main criticism
there is that a lot of it is in their blog.

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


cluster rebalancing…

2014-07-22 Thread Kevin Burton
So , shouldn't it be easy to rebalance a cluster?

I'm not super excited to type out 200 commands to move around individual
tokens.

I realize that this isn't a super easy solution, and that there are
probably 2-3 different algorithms to pick here… but having this be the only
option doesn't seem scalable.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: cluster rebalancing…

2014-07-22 Thread Jonathan Haddad
You don't need to specify tokens. The new node gets them automatically. 

 On Jul 22, 2014, at 7:03 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 So , shouldn't it be easy to rebalance a cluster?
 
 I'm not super excited to type out 200 commands to move around individual 
 tokens.
 
 I realize that this isn't a super easy solution, and that there are probably 
 2-3 different algorithms to pick here… but having this be the only option 
 doesn't seem scalable.
 
 -- 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 


ONE consistency required 2 writes? huh?

2014-07-22 Thread Kevin Burton
Perhaps it's me but it seems this exception is wrong:

Cassandra timeout during write query at consistency ONE (2 replica were
required but only 1 acknowledged the write)

.. but the documentation for ONE says:

 A write must be written to the commit log and memory table of at least
one replica node.
… so… in my situation… 1 replica DID ack the write… so why am I getting an
exception?

Maybe I'm jut not interpreting the exception correctly?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: cluster rebalancing…

2014-07-22 Thread Kevin Burton
ok.. I think I get what's happening.  This node is still joining the
cluster.

It wasn't totally clear that it was still joining as the only indicator is
the little J ...


On Tue, Jul 22, 2014 at 7:09 PM, Jonathan Haddad jonathan.had...@gmail.com
wrote:

 You don't need to specify tokens. The new node gets them automatically.

 On Jul 22, 2014, at 7:03 PM, Kevin Burton bur...@spinn3r.com wrote:

 So , shouldn't it be easy to rebalance a cluster?

 I'm not super excited to type out 200 commands to move around individual
 tokens.

 I realize that this isn't a super easy solution, and that there are
 probably 2-3 different algorithms to pick here… but having this be the only
 option doesn't seem scalable.

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
I'm super confused by this.. and disturbed that this was my failure
scenario :-(

I had one cassandra node for the alpha of my app… and now we're moving into
beta… which means three replicas.

So I added the second node… but my app immediately broke with:

Cassandra timeout during write query at consistency ONE (2 replica were
required but only 1 acknowledged the write)

… but that makes no sense… if I'm at ONE and I have one acknowledged write,
why does it matter that the second one hasn't ack'd yet…

?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Andrew
ONE means write to one replica (in addition to the original).  If you want to 
write to any of them, use ANY.  Is that the right understanding?

http://www.datastax.com/docs/1.0/dml/data_consistency

Andrew

On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

I'm super confused by this.. and disturbed that this was my failure scenario :-(

I had one cassandra node for the alpha of my app… and now we're moving into 
beta… which means three replicas.

So I added the second node… but my app immediately broke with:

Cassandra timeout during write query at consistency ONE (2 replica were 
required but only 1 acknowledged the write)

… but that makes no sense… if I'm at ONE and I have one acknowledged write, why 
does it matter that the second one hasn't ack'd yet…

?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile


Re: Case Study from Migrating from RDBMS to Cassandra

2014-07-22 Thread Shane Hansen
There's lots of info on migrating from a relational database to Cassandra
here:
http://www.datastax.com/relational-database-to-nosql



On Tue, Jul 22, 2014 at 7:45 PM, Surbhi Gupta surbhi.gupt...@gmail.com
wrote:

 Hi,

 Does anybody has the case study for Migrating from RDBMS to Cassandra ?

 Thanks



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
WEIRD that it was working before… with one node.  Granted, one Cassandra node
is a rare config, but by that logic it shouldn't have worked at all.

If you attempt to write ONE to a single cassandra node, there is no (in
addition to) additional node to write to…

So this should have failed.

Bug?

… and I know why this is failing… my cassandra node is joining the
cluster now, but none of the ports are open.  So all writes will fail… I
have NO idea why the ports aren't open yet .. but it's not a firewall issue.



On Tue, Jul 22, 2014 at 7:46 PM, Andrew redmu...@gmail.com wrote:

 ONE means write to one replica (in addition to the original).  If you want
 to write to any of them, use ANY.  Is that the right understanding?

 http://www.datastax.com/docs/1.0/dml/data_consistency

 Andrew

 On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

  I'm super confused by this.. and disturbed that this was my failure
 scenario :-(

 I had one cassandra node for the alpha of my app… and now we're moving
 into beta… which means three replicas.

 So I added the second node… but my app immediately broke with:

 Cassandra timeout during write query at consistency ONE (2 replica were
 required but only 1 acknowledged the write)

 … but that makes no sense… if I'm at ONE and I have one acknowledged
 write, why does it matter that the second one hasn't ack'd yet…

 ?

 --

  Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
  http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
Yeah.. that's fascinating … so now I get something that's even worse:

Cassandra timeout during write query at consistency ANY (2 replica were
required but only 1 acknowledged the write)

… the issue is that the new cassandra node has all its ports closed.

Only the storage port is open.

So obviously writes are going to fail to it.

… is this by design?  Perhaps it's not going to open the ports until the
node joins the ring?  It's currently joining …

so… basically, my entire cluster is offline during this join?

I assume this is either a bug or some weird state based on growing from 1
to 2 nodes?

frustrating :-(


On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson gra...@vast.com wrote:

 Incorrect, ONE does not refer to the number of “other” nodes; it just
 refers to the number of nodes, so ONE under normal circumstances would only
 require one node to acknowledge the write.

 The confusing error message you are getting is related to
 https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are
 correct in that normally that error message would make no sense.

 I don’t have much experience adding/removing nodes, but I think what is
 happening is that your new node is in the middle of taking over ownership of
 a token range - while that happens C* is trying to write to both the old
 owner (your original node) AND (hence the 2 not 1 in the error message)
 the new owner (the new node), so that once the bootstrapping of the new node
 is complete, it is immediately safe to delete the no-longer-owned data from
 the old node. For whatever reason the write to the new node is timing out,
 causing the exception, and the error message is exposing the “2”, which
 happens to be how many C* thinks it is waiting for at the time (i.e. how
 many it should be waiting for based on the consistency level (1) plus this
 extra node).
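
The ack accounting graham describes can be sketched as a toy model in Python
(illustrative only, not Cassandra's actual code; the function name is made up):

```python
# Simplified model of how many acks the coordinator waits for on a write.
# The consistency level demands some number of replica acks; while a new
# node is bootstrapping, its pending token ranges are written to as well,
# and each pending replica is added to the required count.

def required_acks(cl_block_for: int, pending_replicas: int) -> int:
    """Total acks the write coordinator waits for (toy model)."""
    return cl_block_for + pending_replicas

# Normal single-node cluster at CL=ONE: 1 ack required.
assert required_acks(1, 0) == 1

# Same cluster while a second node bootstraps: the pending replica is
# counted too, which is where "2 replica were required" comes from.
assert required_acks(1, 1) == 2
```

If the pending (bootstrapping) node never acks, the write times out even
though the consistency level alone would have been satisfied.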


 On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:

 ONE means write to one replica (in addition to the original).  If you want
 to write to any of them, use ANY.  Is that the right understanding?

 http://www.datastax.com/docs/1.0/dml/data_consistency

 Andrew

 On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

 I'm super confused by this.. and disturbed that this was my failure
 scenario :-(

 I had one cassandra node for the alpha of my app… and now we're moving
 into beta… which means three replicas.

 So I added the second node… but my app immediately broke with:

 Cassandra timeout during write query at consistency ONE (2 replica were
 required but only 1 acknowledged the write)

 … but that makes no sense… if I'm at ONE and I have one acknowledged
 write, why does it matter that the second one hasn't ack'd yet…

 ?

 --

 Founder/CEO Spinn3r.com http://spinn3r.com/
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com/





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
and there are literally zero google hits on the query: Cassandra timeout
during write query at consistency ANY (2 replica were required but only 1
acknowledged the write)

.. so I imagine I'm the first to find this bug!  Aren't I lucky!


On Tue, Jul 22, 2014 at 8:46 PM, Kevin Burton bur...@spinn3r.com wrote:

 Yeah.. that's fascinating … so now I get something that's even worse:

 Cassandra timeout during write query at consistency ANY (2 replica were
 required but only 1 acknowledged the write)

 … the issue is that the new cassandra node has all its ports closed.

 Only the storage port is open.

 So obviously writes are going to fail to it.

 … is this by design?  Perhaps it's not going to open the ports until the
 node joins the ring?  It's currently joining …

 so… basically, my entire cluster is offline during this join?

 I assume this is either a bug or some weird state based on growing from 1
 to 2 nodes?

 frustrating :-(


 On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson gra...@vast.com wrote:

 Incorrect, ONE does not refer to the number of “other” nodes; it just
 refers to the number of nodes, so ONE under normal circumstances would only
 require one node to acknowledge the write.

 The confusing error message you are getting is related to
 https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are
 correct in that normally that error message would make no sense.

 I don’t have much experience adding/removing nodes, but I think what is
 happening is that your new node is in the middle of taking over ownership of
 a token range - while that happens C* is trying to write to both the old
 owner (your original node) AND (hence the 2 not 1 in the error message)
 the new owner (the new node), so that once the bootstrapping of the new node
 is complete, it is immediately safe to delete the no-longer-owned data from
 the old node. For whatever reason the write to the new node is timing out,
 causing the exception, and the error message is exposing the “2”, which
 happens to be how many C* thinks it is waiting for at the time (i.e. how
 many it should be waiting for based on the consistency level (1) plus this
 extra node).


 On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:

 ONE means write to one replica (in addition to the original).  If you
 want to write to any of them, use ANY.  Is that the right understanding?

 http://www.datastax.com/docs/1.0/dml/data_consistency

 Andrew

 On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

 I'm super confused by this.. and disturbed that this was my failure
 scenario :-(

 I had one cassandra node for the alpha of my app… and now we're moving
 into beta… which means three replicas.

 So I added the second node… but my app immediately broke with:

 Cassandra timeout during write query at consistency ONE (2 replica
 were required but only 1 acknowledged the write)

 … but that makes no sense… if I'm at ONE and I have one acknowledged
 write, why does it matter that the second one hasn't ack'd yet…

 ?

 --

 Founder/CEO Spinn3r.com http://spinn3r.com/
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com/





 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread graham sanderson
I assume you have now switched to ANY, which you probably didn’t want to do
and likely won’t help. (Very few people use ANY, which may explain the lack
of Google hits; also, this particular “Cassandra timeout during write query
at consistency” error message comes from the DataStax CQL Java driver, not
C* itself.)

In any case… my original response was just to explain to you that your
understanding of what ONE means in general was correct, and that this
incorrect-looking error message was a weird case during adding a node.

I have no idea what is going on with your bootstrapping node; others may be
able to help, but in the meanwhile I’d look for errors in the server log and
google those, and/or google for instructions on how to add nodes to a
cassandra cluster on whatever version you are running.

On Jul 22, 2014, at 10:47 PM, Kevin Burton bur...@spinn3r.com wrote:

 and there are literally zero google hits on the query: Cassandra timeout 
 during write query at consistency ANY (2 replica were required but only 1 
 acknowledged the write)
 
 .. so I imagine I'm the first to find this bug!  Aren't I lucky!
 
 
 On Tue, Jul 22, 2014 at 8:46 PM, Kevin Burton bur...@spinn3r.com wrote:
 Yeah.. that's fascinating … so now I get something that's even worse:
 
 Cassandra timeout during write query at consistency ANY (2 replica were 
 required but only 1 acknowledged the write)
 
 … the issue is that the new cassandra node has all its ports closed.
 
 Only the storage port is open.
 
 So obviously writes are going to fail to it.
 
 … is this by design?  Perhaps it's not going to open the ports until the node 
 joins the ring?  It's currently joining …
 
 so… basically, my entire cluster is offline during this join?
 
 I assume this is either a bug or some weird state based on growing from 1
 to 2 nodes?
 
 frustrating :-(
 
 
 On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson gra...@vast.com wrote:
 Incorrect, ONE does not refer to the number of “other” nodes; it just
 refers to the number of nodes, so ONE under normal circumstances would only
 require one node to acknowledge the write.

 The confusing error message you are getting is related to
 https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are correct in
 that normally that error message would make no sense.

 I don’t have much experience adding/removing nodes, but I think what is
 happening is that your new node is in the middle of taking over ownership of a
 token range - while that happens C* is trying to write to both the old owner
 (your original node) AND (hence the 2 not 1 in the error message) the new
 owner (the new node), so that once the bootstrapping of the new node is
 complete, it is immediately safe to delete the no-longer-owned data from
 the old node. For whatever reason the write to the new node is timing out,
 causing the exception, and the error message is exposing the “2”, which
 happens to be how many C* thinks it is waiting for at the time (i.e. how many
 it should be waiting for based on the consistency level (1) plus this extra
 node).
 
 
 On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:
 
 ONE means write to one replica (in addition to the original).  If you want 
 to write to any of them, use ANY.  Is that the right understanding?
 
 http://www.datastax.com/docs/1.0/dml/data_consistency
 
 Andrew
 
 On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:
 
 I'm super confused by this.. and disturbed that this was my failure 
 scenario :-(
 
 I had one cassandra node for the alpha of my app… and now we're moving into 
 beta… which means three replicas.
 
 So I added the second node… but my app immediately broke with:
 
 Cassandra timeout during write query at consistency ONE (2 replica were 
 required but only 1 acknowledged the write)
 
 … but that makes no sense… if I'm at ONE and I have one acknowledged write, 
 why does it matter that the second one hasn't ack'd yet…
 
 ?
 
 --
 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 
 
 
 
 





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
Thanks for the feedback…

In hindsight, I think what happened was that the new node started up… and
the driver wanted to write records to it… but the ports weren't up.

So I wonder if this is a bug in the DataStax driver.

On bootstrap, and while joining, does cassandra always keep the ports
closed and only open them once the node has joined?
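
While waiting for the answer, one way to see from the client side whether a
node's port is accepting connections yet is a quick stdlib check (a sketch;
the hostname is a placeholder, and 9042 is the default native transport port
in this era of C* — adjust for your config, e.g. 9160 for Thrift):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. poll the joining node's native transport port:
#   while not is_port_open("new-node.example.com", 9042):
#       time.sleep(5)
```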


On Tue, Jul 22, 2014 at 8:55 PM, graham sanderson gra...@vast.com wrote:

 I assume you have now switched to ANY, which you probably didn’t want
 to do and likely won’t help. (Very few people use ANY, which may explain
 the lack of Google hits; also, this particular “Cassandra timeout during
 write query at consistency” error message comes from the DataStax CQL Java
 driver, not C* itself.)

 In any case… my original response was just to explain to you that your
 understanding of what ONE means in general was correct, and that this
 incorrect-looking error message was a weird case during adding a node.

 I have no idea what is going on with your bootstrapping node; others may be
 able to help, but in the meanwhile I’d look for errors in the server log
 and google those, and/or google for instructions on how to add nodes to a
 cassandra cluster on whatever version you are running.

 On Jul 22, 2014, at 10:47 PM, Kevin Burton bur...@spinn3r.com wrote:

 and there are literally zero google hits on the query: Cassandra timeout
 during write query at consistency ANY (2 replica were required but only 1
 acknowledged the write)

 .. so I imagine I'm the first to find this bug!  Aren't I lucky!


 On Tue, Jul 22, 2014 at 8:46 PM, Kevin Burton bur...@spinn3r.com wrote:

 Yeah.. that's fascinating … so now I get something that's even worse:

 Cassandra timeout during write query at consistency ANY (2 replica were
 required but only 1 acknowledged the write)

 … the issue is that the new cassandra node has all its ports closed.

 Only the storage port is open.

 So obviously writes are going to fail to it.

 … is this by design?  Perhaps it's not going to open the ports until the
 node joins the ring?  It's currently joining …

 so… basically, my entire cluster is offline during this join?

 I assume this is either a bug or some weird state based on growing from
 1 to 2 nodes?

 frustrating :-(


 On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson gra...@vast.com
 wrote:

 Incorrect, ONE does not refer to the number of “other” nodes; it just
 refers to the number of nodes, so ONE under normal circumstances would only
 require one node to acknowledge the write.

 The confusing error message you are getting is related to
 https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are
 correct in that normally that error message would make no sense.

 I don’t have much experience adding/removing nodes, but I think what is
 happening is that your new node is in the middle of taking over ownership of
 a token range - while that happens C* is trying to write to both the old
 owner (your original node) AND (hence the 2 not 1 in the error message)
 the new owner (the new node), so that once the bootstrapping of the new node
 is complete, it is immediately safe to delete the no-longer-owned data from
 the old node. For whatever reason the write to the new node is timing out,
 causing the exception, and the error message is exposing the “2”, which
 happens to be how many C* thinks it is waiting for at the time (i.e. how
 many it should be waiting for based on the consistency level (1) plus this
 extra node).


 On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:

 ONE means write to one replica (in addition to the original).  If you
 want to write to any of them, use ANY.  Is that the right understanding?

 http://www.datastax.com/docs/1.0/dml/data_consistency

 Andrew

 On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

 I'm super confused by this.. and disturbed that this was my failure
 scenario :-(

 I had one cassandra node for the alpha of my app… and now we're moving
 into beta… which means three replicas.

 So I added the second node… but my app immediately broke with:

 Cassandra timeout during write query at consistency ONE (2 replica
 were required but only 1 acknowledged the write)

 … but that makes no sense… if I'm at ONE and I have one acknowledged
 write, why does it matter that the second one hasn't ack'd yet…

 ?

 --

 Founder/CEO Spinn3r.com http://spinn3r.com/
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com/





 --

 Founder/CEO Spinn3r.com http://spinn3r.com/
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com/




 --

 Founder/CEO Spinn3r.com http://spinn3r.com/
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 

Re: Case Study from Migrating from RDBMS to Cassandra

2014-07-22 Thread Surbhi Gupta
Thanks Shane. However, I am looking for more of a proof-of-concept kind of
document.
Does anybody have a complete end-to-end document that contains the
application overview, and:

How they migrated from RDBMS to Cassandra?
What things need to be considered?
How they converted the data model, and what the new data model looked like?
How they loaded the data into Cassandra?
Performance tests before and after the migration, etc.

Thanks
Surbhi

On 23 July 2014 08:51, Shane Hansen shanemhan...@gmail.com wrote:

 There's lots of info on migrating from a relational database to Cassandra
 here:
 http://www.datastax.com/relational-database-to-nosql



 On Tue, Jul 22, 2014 at 7:45 PM, Surbhi Gupta surbhi.gupt...@gmail.com
 wrote:

 Hi,

  Does anybody have a case study for migrating from RDBMS to Cassandra?

 Thanks





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Andrew
I looked into this; ONE means it must be written to one replica—i.e., a node 
the data is supposed to be written to.  ANY means a hinted handoff will 
“count”.  So as long as it writes to any node on the cluster—even one that it’s 
not supposed to be on—it will be a success.  Good to know.

Andrew

On July 22, 2014 at 8:13:57 PM, graham sanderson (gra...@vast.com) wrote:

Incorrect, ONE does not refer to the number of “other nodes, it just refers to 
the number of nodes. so ONE under normal circumstances would only require one 
node to acknowledge the write.

The confusing error message you are getting is related to 
https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are correct in 
that normally that error message would make no sense.

I don’t have much experience adding/removing nodes, but I think what is 
happening is that your new node is in the middle of taken over ownership of a 
token range - while that happens C* is trying to write to both the old owner 
(your original node), AND (hence the 2 not 1 in the error message) the new 
owner (the new node) so that once the bootstrapping of the new node is 
complete, it is immediately safe to delete the (no longer owned data) from the 
old node. For whatever reason the write to the new node is timing out, causing 
the exception, and the error message is exposing the “2” which happens to be 
how many C* thinks it is waiting for at the time (i.e. how many it should be 
waiting for based on the consistency level (1) plus this extra node).


On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:

ONE means write to one replica (in addition to the original).  If you want to 
write to any of them, use ANY.  Is that the right understanding?

http://www.datastax.com/docs/1.0/dml/data_consistency

Andrew

On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

I'm super confused by this.. and disturbed that this was my failure scenario :-(

I had one cassandra node for the alpha of my app… and now we're moving into 
beta… which means three replicas.

So I added the second node… but my app immediately broke with:

Cassandra timeout during write query at consistency ONE (2 replica were 
required but only 1 acknowledged the write)

… but that makes no sense… if I'm at ONE and I have one acknowledged write, why 
does it matter that the second one hasn't ack'd yet…

?

--

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile