Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in mongo and present it as a table with
rows and columns. That does not make mongo a row store.

On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo 
wrote:

> The problem with calling it a row store:
>
> https://en.wikipedia.org/wiki/Row_(database)
>
> In the context of a relational database, a *row*—also called a record or
> tuple—represents a single, implicitly structured data item in a table. In
> simple terms, a database table can be thought of as consisting of *rows*
> and *columns* or fields.[1] Each row in a table represents a set of
> related data, and every row in the table has the same structure.
>
> When you have static columns and rows with maps and lists, it is hard to
> argue that every row has the same structure. Physically, at the storage
> layer, they do not have the same structure, and logically, when accessing
> the data, they barely have the same structure, as the static column merely
> appears inside each row rather than actually being contained in it.
>
> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad 
> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>
 Cassandra is not a "wide column store" anymore.  It has a schema.  Only
 thrift users no longer think they have a schema (though they do), and
 thrift is being deprecated.

 I really wish everyone would kill the term "wide column store" with
 fire.  It seems to have never meant anything beyond "schema-less,
 row-oriented", and a "column store" means literally the opposite of this.

 Not only that, but people don't even seem to realise the term "column
 store" existed long before "wide column store" and the latter is often
 abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/

 Since it no longer applies, let's all agree as a community to forget
 this awful nomenclature ever existed.



 On 30 September 2016 at 18:09, Joaquin Casares <
 joaq...@thelastpickle.com> wrote:

> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
> can have 2 billion columns, but in practice it shouldn't have more than 
> 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition
> key(s), but does provide the option of setting zero or more clustering
> keys. Together, the partition key(s) and clustering key(s) form the 
> primary
> key.
>
> When writing to Cassandra, you will need to provide the full primary
> key, however, when reading from Cassandra, you only need to provide the
> full partition key.
>
> When you only provide the partition key for a read operation, you're
> able to return all columns that exist on that partition with low latency.
> These columns are displayed as "CQL rows" to make it easier to reason 
> about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>
>   boz uuid,
>
>   baz timeuuid,
>   data1 text,
>
>   data2 text,
>
>   PRIMARY KEY ((bar, boz), baz)
>
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz
> and optionally data*, if it's relevant for that CQL row. If you chose not
> to define a data* field for a particular CQL row, then nothing is stored
> nor allocated on disk. But I wouldn't consider that caveat to be
> "schema-less".
>
> However, all writes to the same bar/boz will end up on the same
> Cassandra replica set (a configurable number of nodes) and be stored on 
> the
> same place(s) on disk within the SSTable(s). And on disk, each field 
> that's
> not a partition key is stored as a column, including clustering keys (this
> is optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast 
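
The write/read asymmetry described in the message above can be sketched in
CQL against the `foo` schema it defines (the inserted values are
hypothetical, and the `?` markers stand for bound partition-key values):

```cql
-- Writing: the full primary key (bar, boz, baz) is required;
-- data* columns are optional and cost nothing on disk when omitted.
INSERT INTO foo (bar, boz, baz, data1)
VALUES (uuid(), uuid(), now(), 'example');

-- Reading: only the full partition key (bar, boz) is required;
-- every CQL row in that partition comes back in one partition read.
SELECT baz, data1, data2 FROM foo WHERE bar = ? AND boz = ?;
```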

Row cache not working

2016-09-30 Thread Abhinav Solan
Hi Everyone,

My table looks like this -
CREATE TABLE test.reads (
    svc_pt_id bigint,
    meas_type_id bigint,
    flags bigint,
    read_time timestamp,
    value double,
    PRIMARY KEY ((svc_pt_id, meas_type_id))
) WITH bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Have set up the C* nodes with
row_cache_size_in_mb: 1024
row_cache_save_period: 14400

and I am issuing this query:
select svc_pt_id, meas_type_id, read_time, value FROM
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
146;

With tracing on, every request reports a row cache miss:

 activity                                                                                             | timestamp                  | source          | source_elapsed
------------------------------------------------------------------------------------------------------+----------------------------+-----------------+---------------
 Execute CQL3 query                                                                                   | 2016-09-30 18:15:00.446000 | 192.168.199.75  |              0
 Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
 Preparing statement [SharedPool-Worker-1]                                                            | 2016-09-30 18:15:00.446000 | 192.168.199.75  |            209
 reading data from /192.168.170.186 [SharedPool-Worker-1]                                             | 2016-09-30 18:15:00.446001 | 192.168.199.75  |            370
 Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186]                | 2016-09-30 18:15:00.446001 | 192.168.199.75  |            450
 REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75  |           2469
 Processing response from /192.168.170.186 [SharedPool-Worker-8]                                      | 2016-09-30 18:15:00.448000 | 192.168.199.75  |           2609
 READ message received from /192.168.199.75 [MessagingService-Incoming-/192.168.199.75]               | 2016-09-30 18:15:00.449000 | 192.168.170.186 |             75
 Row cache miss [SharedPool-Worker-2]                                                                 | 2016-09-30 18:15:00.449000 | 192.168.170.186 |            218
 Fetching data but not populating cache as query does not query from the start of the partition [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 246
 Executing single-partition query on cts_svc_pt_latest_int_read [SharedPool-Worker-2]                 | 2016-09-30 18:15:00.449000 | 192.168.170.186 |            259
 Acquiring sstable references [SharedPool-Worker-2]                                                   | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            281
 Merging memtable contents [SharedPool-Worker-2]                                                      | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            295
 Merging data from sstable 8 [SharedPool-Worker-2]                                                    | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            326
 Key cache hit for sstable 8 [SharedPool-Worker-2]                                                    | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            351
 Merging data from sstable 7 [SharedPool-Worker-2]                                                    | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            439
 Key cache hit for sstable 7 [SharedPool-Worker-2]                                                    | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            468
 Read 1 live and 0 tombstone cells [SharedPool-Worker-2]                                              | 2016-09-30 18:15:00.449001 | 192.168.170.186 |            615
 Enqueuing response to /192.168.199.75 [SharedPool-Worker-2]                                          | 2016-09-30 18:15:00.449002 | 192.168.170.186 |            766
 Sending REQUEST_RESPONSE message to /192.168.199.75 [MessagingService-Outgoing-/192.168.199.75]      | 2016-09-30 18:15:00.449002 | 192.168.170.186 |            897
 Request complete                                                                                     | 2016-09-30 18:15:00.44     | 192.168.199.75  |           2888

Can anyone please tell me what I am doing wrong?

Thanks,
Abhinav
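
For reference, the `caching` option in the schema above is the per-table
knob involved here, and it can be changed without recreating the table. A
sketch (whether this is appropriate depends on partition sizes and heap
budget):

```cql
-- Cache full partitions instead of only the first 10 CQL rows per partition.
ALTER TABLE test.reads
WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```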


Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store:

https://en.wikipedia.org/wiki/Row_(database)

In the context of a relational database, a *row*—also called a record or
tuple—represents a single, implicitly structured data item in a table. In
simple terms, a database table can be thought of as consisting of *rows* and
*columns* or fields.[1] Each row in a table represents a set of related data,
and every row in the table has the same structure.

When you have static columns and rows with maps and lists, it is hard to
argue that every row has the same structure. Physically, at the storage
layer, they do not have the same structure, and logically, when accessing the
data, they barely have the same structure, as the static column merely
appears inside each row rather than actually being contained in it.
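
The static-column behaviour described above can be sketched in CQL (the
table and column names are hypothetical):

```cql
CREATE TABLE purchases (
    user_id     uuid,
    purchase_id timeuuid,
    balance     int STATIC,  -- stored once per partition, not per row
    amount      int,
    PRIMARY KEY (user_id, purchase_id)
);
-- 'balance' is physically stored once per user_id partition, yet it
-- appears in every CQL row returned for that partition.
```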

On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad  wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
 Hi Mehdi,

 I can help clarify a few things.

 As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
 can have 2 billion columns, but in practice it shouldn't have more than 100
 million columns.

 Cassandra partitions data to certain nodes based on the partition
 key(s), but does provide the option of setting zero or more clustering
 keys. Together, the partition key(s) and clustering key(s) form the primary
 key.

 When writing to Cassandra, you will need to provide the full primary
 key, however, when reading from Cassandra, you only need to provide the
 full partition key.

 When you only provide the partition key for a read operation, you're
 able to return all columns that exist on that partition with low latency.
 These columns are displayed as "CQL rows" to make it easier to reason 
 about.

 Consider the schema:

 CREATE TABLE foo (
   bar uuid,

   boz uuid,

   baz timeuuid,
   data1 text,

   data2 text,

   PRIMARY KEY ((bar, boz), baz)

 );


 When you write to Cassandra you will need to send bar, boz, and baz and
 optionally data*, if it's relevant for that CQL row. If you chose not to
 define a data* field for a particular CQL row, then nothing is stored nor
 allocated on disk. But I wouldn't consider that caveat to be "schema-less".

 However, all writes to the same bar/boz will end up on the same
 Cassandra replica set (a configurable number of nodes) and be stored on the
 same place(s) on disk within the SSTable(s). And on disk, each field that's
 not a partition key is stored as a column, including clustering keys (this
 is optimized in Cassandra 3+, but now we're getting deep into internals).

 In this way you can get fast responses for all activity for bar/boz
 either over time, or for a specific time, with roughly the same number of
 disk seeks, with varying lengths on the disk scans.
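
 The "all activity over time, or for a specific time" access pattern above
 maps to clustering-key predicates on `baz`; a sketch, with `?` standing
 for bound partition-key values and hypothetical dates:

 ```cql
 -- All activity for one bar/boz partition, newest first (one seek + scan).
 SELECT * FROM foo WHERE bar = ? AND boz = ? ORDER BY baz DESC;

 -- Activity in a specific time window, via the timeuuid clustering key.
 SELECT * FROM foo WHERE bar = ? AND boz = ?
   AND baz > maxTimeuuid('2016-09-01 00:00+0000')
   AND baz < minTimeuuid('2016-10-01 00:00+0000');
 ```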

 Hope that helps!

 Joaquin Casares
 Consultant
 Austin, TX

 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On Fri, Sep 30, 2016 at 11:40 AM, Carlos 

Replacing a dead node in a live Cassandra Cluster

2016-09-30 Thread Rajath Subramanyam
Hello Cassandra-users,

I was running some tests today. My end goal was to learn more about
replacing a dead node in a live Cassandra cluster with minimal disruption
to the existing cluster and figure out a better and faster way of doing the
same.

I am running a package installation of the following version of Cassandra.

[centos@rj-cassandra-1 testcf-97896450869d11e6a84c4381bf5c5035]$ nodetool
version
ReleaseVersion: 2.1.12

I set up a 4-node Cassandra cluster in the lab. I took one non-seed node
(let's say node1) down by issuing 'sudo service cassandra stop'. Then,
following the instructions from this link, I tried to replace node1 with the
JMX option -Dcassandra.replace_address=. However, when I do this, the
bootstrap fails with the following error in the log:

ERROR [main] 2016-09-30 23:54:17,104 CassandraDaemon.java:579 - Exception
encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
~[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:512)
~[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
~[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:721)
~[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
~[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
[apache-cassandra-2.1.12.jar:2.1.12]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
[apache-cassandra-2.1.12.jar:2.1.12]
WARN  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
Gossiper.java:1454 - No local state or state is in silent shutdown, not
announcing shutdown
INFO  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
MessagingService.java:734 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.7.0.232] 2016-09-30 23:54:17,110
MessagingService.java:1018 - MessagingService has terminated the accept()
thread

How do I recover from this error message?


Rajath Subramanyam


Re: Cassandra data model right definition

2016-09-30 Thread Russell Bradberry
I agree 100%, this misunderstanding really bothers me as well. I like the term 
“Partitioned Row Store”, even though I am guilty of using the legacy 
“Column-Family Store” from darker times. Even databases like Scylla, which is 
supposed to be an Apache Cassandra clone, tout themselves as a column store, 
which is just utterly backwards, as you mentioned.

 

From: Benedict Elliott Smith 
Reply-To: 
Date: Friday, September 30, 2016 at 5:12 PM
To: 
Subject: Re: Cassandra data model right definition

 

Absolutely.  A "partitioned row store" is exactly what I would call it.  As it 
happens, our README thinks the same, which is fantastic.  

 

I thought I'd take a look at the rest of our cohort, and didn't get far before 
disappointment.  HBase literally calls itself a "column-oriented store" - which 
is so totally wrong it's simultaneously hilarious and tragic.  

 

I guess we can't blame the wider internet for misunderstanding/misnaming us 
poor "wide column stores" if even one of the major examples doesn't know what 
it, itself, is!

 

 

 

 

On 30 September 2016 at 21:47, Jonathan Haddad  wrote:

+1000 to what Benedict says. I usually call it a "partitioned row store" which 
usually needs some extra explanation but is more accurate than "column family" 
or whatever other thrift era terminology people still use. 

On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:

I used to present Cassandra as a NoSQL datastore with "distributed" table. This 
definition is closer to CQL and has some academic background (distributed hash 
table).

 

 

On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith  
wrote:

Cassandra is not a "wide column store" anymore.  It has a schema.  Only thrift 
users no longer think they have a schema (though they do), and thrift is being 
deprecated.

 

I really wish everyone would kill the term "wide column store" with fire.  It 
seems to have never meant anything beyond "schema-less, row-oriented", and a 
"column store" means literally the opposite of this.

 

Not only that, but people don't even seem to realise the term "column store" 
existed long before "wide column store" and the latter is often abbreviated to 
the former, as here: http://www.planetcassandra.org/what-is-nosql/ 

 

Since it no longer applies, let's all agree as a community to forget this awful 
nomenclature ever existed.

 

 

 

On 30 September 2016 at 18:09, Joaquin Casares  
wrote:

Hi Mehdi,

 

I can help clarify a few things.

 

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 
2 billion columns, but in practice it shouldn't have more than 100 million 
columns.

 

Cassandra partitions data to certain nodes based on the partition key(s), but 
does provide the option of setting zero or more clustering keys. Together, the 
partition key(s) and clustering key(s) form the primary key.

 

When writing to Cassandra, you will need to provide the full primary key, 
however, when reading from Cassandra, you only need to provide the full 
partition key.

 

When you only provide the partition key for a read operation, you're able to 
return all columns that exist on that partition with low latency. These columns 
are displayed as "CQL rows" to make it easier to reason about.

 

Consider the schema:

 

CREATE TABLE foo (

  bar uuid,

  boz uuid,

  baz timeuuid,

  data1 text,

  data2 text,

  PRIMARY KEY ((bar, boz), baz)

);

 

When you write to Cassandra you will need to send bar, boz, and baz and 
optionally data*, if it's relevant for that CQL row. If you chose not to define 
a data* field for a particular CQL row, then nothing is stored nor allocated on 
disk. But I wouldn't consider that caveat to be "schema-less".

 

However, all writes to the same bar/boz will end up on the same Cassandra 
replica set (a configurable number of nodes) and be stored on the same place(s) 
on disk within the SSTable(s). And on disk, each field that's not a partition 
key is stored as a column, including clustering keys (this is optimized in 
Cassandra 3+, but now we're getting deep into internals).

 

In this way you can get fast responses for all activity for bar/boz either over 
time, or for a specific time, with roughly the same number of disk seeks, with 
varying lengths on the disk scans.

 

Hope that helps!


Joaquin Casares

Consultant

Austin, TX

 

Apache Cassandra Consulting

http://www.thelastpickle.com

 

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso  wrote:

Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra


Carlos Alonso | Software Engineer | @calonso

 

On 30 September 2016 at 18:24, Mehdi Bada  wrote:

Hi all, 

 

I have a theoretical question: 

- Is Apache Cassandra really a column store?
Column store means storing the data 
Column store mean storing the data 

Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Absolutely.  A "partitioned row store" is exactly what I would call it.  As
it happens, our README thinks the same, which is fantastic.

I thought I'd take a look at the rest of our cohort, and didn't get far
before disappointment.  HBase literally calls itself a
"*column-oriented* store"
- which is so totally wrong it's simultaneously hilarious and tragic.

I guess we can't blame the wider internet for misunderstanding/misnaming us
poor "wide column stores" if even one of the major examples doesn't know
what it, itself, is!




On 30 September 2016 at 21:47, Jonathan Haddad  wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
 Hi Mehdi,

 I can help clarify a few things.

 As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
 can have 2 billion columns, but in practice it shouldn't have more than 100
 million columns.

 Cassandra partitions data to certain nodes based on the partition
 key(s), but does provide the option of setting zero or more clustering
 keys. Together, the partition key(s) and clustering key(s) form the primary
 key.

 When writing to Cassandra, you will need to provide the full primary
 key, however, when reading from Cassandra, you only need to provide the
 full partition key.

 When you only provide the partition key for a read operation, you're
 able to return all columns that exist on that partition with low latency.
 These columns are displayed as "CQL rows" to make it easier to reason 
 about.

 Consider the schema:

 CREATE TABLE foo (
   bar uuid,

   boz uuid,

   baz timeuuid,
   data1 text,

   data2 text,

   PRIMARY KEY ((bar, boz), baz)

 );


 When you write to Cassandra you will need to send bar, boz, and baz and
 optionally data*, if it's relevant for that CQL row. If you chose not to
 define a data* field for a particular CQL row, then nothing is stored nor
 allocated on disk. But I wouldn't consider that caveat to be "schema-less".

 However, all writes to the same bar/boz will end up on the same
 Cassandra replica set (a configurable number of nodes) and be stored on the
 same place(s) on disk within the SSTable(s). And on disk, each field that's
 not a partition key is stored as a column, including clustering keys (this
 is optimized in Cassandra 3+, but now we're getting deep into internals).

 In this way you can get fast responses for all activity for bar/boz
 either over time, or for a specific time, with roughly the same number of
 disk seeks, with varying lengths on the disk scans.

 Hope that helps!

 Joaquin Casares
 Consultant
 Austin, TX

 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
 wrote:

> Cassandra is a Wide Column Store http://db-engines.com/
> en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso
> 
>
> On 30 September 2016 at 18:24, Mehdi Bada  > wrote:
>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> Column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and data is partitioned with a row key.
>>
>> Finally, for me, Cassandra is a row 

Re: Cassandra data model right definition

2016-09-30 Thread Jonathan Haddad
+1000 to what Benedict says. I usually call it a "partitioned row store"
which usually needs some extra explanation but is more accurate than
"column family" or whatever other thrift era terminology people still use.
On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares > > wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>
>>>   boz uuid,
>>>
>>>   baz timeuuid,
>>>   data1 text,
>>>
>>>   data2 text,
>>>
>>>   PRIMARY KEY ((bar, boz), baz)
>>>
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>> define a data* field for a particular CQL row, then nothing is stored nor
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
>>> wrote:
>>>
 Cassandra is a Wide Column Store
 http://db-engines.com/en/system/Cassandra

 Carlos Alonso | Software Engineer | @calonso
 

 On 30 September 2016 at 18:24, Mehdi Bada 
 wrote:

> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> Column store means storing the data as columns rather than as rows.
>
> In fact C* stores the data as rows, and data is partitioned with a row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is it true
> for you also?
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada
> 
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
> 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
> *
>


>>>
>>
>


Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then:
Physically: a data store built on log-structured merge of SSTables (see
https://cloud.google.com/bigtable/).
Now:
One of the changes made in Apache Cassandra 3.0 is a relatively important
refactor of the storage engine. I say refactor because the basics have not
changed: data is still inserted into a memtable which gets flushed over time
to an sstable, with compaction baby-sitting the set of sstables on disk, and
reads use both memtables and sstables to retrieve results. But the internal
structure of the objects manipulated in those phases has changed, and that
entails a significant amount of refactoring in the code. The principal
motivation is that the new storage engine more directly manipulates the
structure that is exposed through CQL, and knowing that structure at the
storage engine level has many advantages: some features are easier to add and
the engine has more information to optimize.

http://www.datastax.com/2015/12/storage-engine-30

Then:
An RPC abstraction over the data, with methods like get_slice which selected
columns from a single 'row key'.
Now:
A query-based abstraction over the data, with queries like SELECT * FROM
table WHERE x=y, in which most language features work over single
'partitions'.

And three (or so) implementations of secondary-index-like things:
Secondary Indexes
Materialized Views
SASI Indexes

These add query functionality, typically by storing an index (or secondary
form) in a way optimized for the given query functionality.
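
In CQL, these three mechanisms look roughly like the following sketch (the
table, column, and index names are hypothetical):

```cql
-- Plain secondary index on a non-key column.
CREATE INDEX user_email_idx ON users (email);

-- Materialized view: a server-maintained copy keyed for a different query.
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

-- SASI index (Cassandra 3.4+), built as a custom index implementation.
CREATE CUSTOM INDEX user_name_sasi ON users (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';
```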






On Fri, Sep 30, 2016 at 1:52 PM, DuyHai Doan  wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares > > wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>
>>>   boz uuid,
>>>
>>>   baz timeuuid,
>>>   data1 text,
>>>
>>>   data2 text,
>>>
>>>   PRIMARY KEY ((bar, boz), baz)
>>>
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>> define a data* field for a particular CQL row, then nothing is stored or
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
>>> wrote:
>>>
 Cassandra is a 

Re: Way to write to dc1 but keep data only in dc2

2016-09-30 Thread Dorian Hoxha
Thanks Edward. Looks like what I really wanted (to use some kind of quorum
write, for example) isn't possible.

Note that the queue is ordered, but I just need the writes to eventually
happen, with more consistency than ANY (2 nodes or more).

On Fri, Sep 30, 2016 at 12:25 AM, Edward Capriolo 
wrote:

> You can do something like this, though your use of terminology like
> "queue" does not really apply.
>
> You can setup your keyspace with replication in only one data center.
>
> CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 
> 'NetworkTopologyStrategy', 'dc2' : 3 };
>
> This will make the NTSkeyspace live in only one data center. You can
> always write to any Cassandra node, since they will transparently proxy the
> writes to the proper place. You can configure your client to ONLY bind to
> specific hosts or data centers/hosts DC1.
>
> You can use a write consistency level like ANY. If you use a consistency
> level like ONE, it will cause the write to block anyway, waiting for
> completion in the other datacenter.
>
> Since you mentioned the words "like a queue", I would suggest an
> alternative: writing the data to a distributed commit log like Kafka.
> At that point you can decouple the write systems either through producer
> consumer or through a tool like Kafka's mirror maker.
>
>
> On Thu, Sep 29, 2016 at 5:24 PM, Dorian Hoxha 
> wrote:
>
>> I have dc1 and dc2.
>> I want to keep a keyspace only on dc2.
>> But I only have my app on dc1.
>> And I want to write to dc1 (lower latency) which will not keep data
>> locally but just push it to dc2.
>> Reading would only need to work on dc2.
>> Since my app is mostly writes, it would be faster without having to
>> deploy the app to dc2 or write directly to dc2 with higher latency.
>>
>> dc1 would act like a queue or something and just push data + delete
>> locally.
>>
>> Does this make sense ?
>>
>> Thank You
>>
>
>


[ANNOUNCE] Achilles 5.1.0

2016-09-30 Thread DuyHai Doan
Hello C* users

I'm happy to announce the release of Achilles 5.1.0, the first mapper that
is Cassandra-version aware, i.e. it only generates source code
corresponding to the features supported by your C* version.

- C* 2.1: base version
- C* 2.2: UDF, UDA, JSON syntax
- C* 3.0: materialized view,  single-column and multi-column slice
restrictions for DELETE
- C* 3.2: support for type casting in SELECT clause
- C* 3.6: PER PARTITION LIMIT, unfrozen UDT (1st level only)
- C* 3.7: support for SASI (stable version)
- DSE 4.8.x, DSE 5.0.x: support for DSE Search

All the details are in the wiki:
https://github.com/doanduyhai/Achilles/wiki/Compile-Time-Config


Re: Difference in token range count

2016-09-30 Thread laxmikanth sadula
Hi Eric,

Thanks for the reply...
RF=3 for all DCs...

On Fri, Sep 30, 2016 at 9:57 PM, Eric Stevens  wrote:

> What is your replication factor in this DC?
>
> On Fri, Sep 30, 2016 at 8:08 AM techpyaasa .  wrote:
>
>> Hi,
>>
>> We have c*-2.0.17 with 3 data centers. Each data center has 9 nodes, with
>> vnodes enabled on all nodes.
>>
>> When I ran a -local repair (./nodetool -local repair keyspace_name1
>> columnfamily_1) on one of the data centers, I saw the following printed:
>>
>> "Starting repair command #3, repairing *2647 ranges* for keyspace 
>> keyspace_name1"
>>
>> The count of ranges is supposed to be *2304* (256*9), as we have 9 nodes
>> in one data center, right? So why is it showing 2647 ranges?
>>
>> Can someone please clarify this difference in the token range count?
>>
>> Thanks
>> techpyaasa
>>
>>


-- 
Regards,
Laxmikanth
99621 38051


Re: Cassandra data model right definition

2016-09-30 Thread DuyHai Doan
I used to present Cassandra as a NoSQL datastore with "distributed" table.
This definition is closer to CQL and has some academic background
(distributed hash table).


On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith  wrote:

> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here: http://www.planetcassandra.
> org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares 
> wrote:
>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>
>>   boz uuid,
>>
>>   baz timeuuid,
>>   data1 text,
>>
>>   data2 text,
>>
>>   PRIMARY KEY ((bar, boz), baz)
>>
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you choose not to
>> define a data* field for a particular CQL row, then nothing is stored or
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
>> wrote:
>>
>>> Cassandra is a Wide Column Store http://db-engines.com/en
>>> /system/Cassandra
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 30 September 2016 at 18:24, Mehdi Bada 
>>> wrote:
>>>
 Hi all,

 I have a theoretical question:
 - Is Apache Cassandra really a column store?
 Column store means storing the data as columns rather than as rows.

 In fact C* stores the data as rows, and data is partitioned by row key.

 Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
 true for you also?

 Many thanks in advance for your reply

 Best Regards
 Mehdi Bada
 

 *Mehdi Bada* | Consultant
 Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
 96 15
 dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
 mehdi.b...@dbi-services.com
 www.dbi-services.com




 *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
 team
 *

>>>
>>>
>>
>


Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Cassandra is not a "wide column store" anymore.  It has a schema.  Only
thrift users no longer think they have a schema (though they do), and
thrift is being deprecated.

I really wish everyone would kill the term "wide column store" with fire.
It seems to have never meant anything beyond "schema-less, row-oriented",
and a "column store" means literally the opposite of this.

Not only that, but people don't even seem to realise the term "column
store" existed long before "wide column store" and the latter is often
abbreviated to the former, as here:
http://www.planetcassandra.org/what-is-nosql/

Since it no longer applies, let's all agree as a community to forget this
awful nomenclature ever existed.



On 30 September 2016 at 18:09, Joaquin Casares 
wrote:

> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key,
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>
>   boz uuid,
>
>   baz timeuuid,
>   data1 text,
>
>   data2 text,
>
>   PRIMARY KEY ((bar, boz), baz)
>
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you choose not to
> define a data* field for a particular CQL row, then nothing is stored or
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, including clustering keys (this is
> optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz either
> over time, or for a specific time, with roughly the same number of disk
> seeks, with varying lengths on the disk scans.
>
> Hope that helps!
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
> wrote:
>
>> Cassandra is a Wide Column Store http://db-engines.com/en
>> /system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 30 September 2016 at 18:24, Mehdi Bada 
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a theoretical question:
>>> - Is Apache Cassandra really a column store?
>>> Column store means storing the data as columns rather than as rows.
>>>
>>> In fact C* stores the data as rows, and data is partitioned by row key.
>>>
>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>>> true for you also?
>>>
>>> Many thanks in advance for your reply
>>>
>>> Best Regards
>>> Mehdi Bada
>>> 
>>>
>>> *Mehdi Bada* | Consultant
>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>>> 15
>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>> mehdi.b...@dbi-services.com
>>> www.dbi-services.com
>>>
>>>
>>>
>>>
>>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>>> team
>>> *
>>>
>>
>>
>


Re: Cassandra data model right definition

2016-09-30 Thread Joaquin Casares
Hi Mehdi,

I can help clarify a few things.

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
have 2 billion columns, but in practice it shouldn't have more than 100
million columns.

Cassandra partitions data to certain nodes based on the partition key(s),
but does provide the option of setting zero or more clustering keys.
Together, the partition key(s) and clustering key(s) form the primary key.

When writing to Cassandra, you will need to provide the full primary key,
however, when reading from Cassandra, you only need to provide the full
partition key.

When you only provide the partition key for a read operation, you're able
to return all columns that exist on that partition with low latency. These
columns are displayed as "CQL rows" to make it easier to reason about.

Consider the schema:

CREATE TABLE foo (
  bar uuid,

  boz uuid,

  baz timeuuid,
  data1 text,

  data2 text,

  PRIMARY KEY ((bar, boz), baz)

);


When you write to Cassandra you will need to send bar, boz, and baz and
optionally data*, if it's relevant for that CQL row. If you choose not to
define a data* field for a particular CQL row, then nothing is stored or
allocated on disk. But I wouldn't consider that caveat to be "schema-less".

However, all writes to the same bar/boz will end up on the same Cassandra
replica set (a configurable number of nodes) and be stored on the same
place(s) on disk within the SSTable(s). And on disk, each field that's not
a partition key is stored as a column, including clustering keys (this is
optimized in Cassandra 3+, but now we're getting deep into internals).

In this way you can get fast responses for all activity for bar/boz either
over time, or for a specific time, with roughly the same number of disk
seeks, with varying lengths on the disk scans.

Hope that helps!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso  wrote:

> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 30 September 2016 at 18:24, Mehdi Bada 
> wrote:
>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> Column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>> true for you also?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> 
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>> 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.b...@dbi-services.com
>> www.dbi-services.com
>>
>>
>>
>>
>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>> team
>> *
>>
>
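Joaquin's description of the `foo` table above — the partition key (bar, boz) choosing placement, the clustering key baz ordering the CQL rows inside the partition, and absent data* columns costing nothing on disk — can be modeled with a toy Python sketch. This is illustrative only (integers stand in for the timeuuid baz, and the names are made up), not Cassandra's storage code.

```python
import uuid

# Rough model of the `foo` table: the partition key (bar, boz) selects
# the partition (and thus the replica set); the clustering key baz orders
# the CQL rows inside it. Absent data* columns simply don't exist --
# nothing is stored for them.
partition_store = {}   # (bar, boz) -> {baz: {column: value}}

def write(bar, boz, baz, **data):
    part = partition_store.setdefault((bar, boz), {})
    part.setdefault(baz, {}).update(data)

def read_partition(bar, boz):
    # Analogous to: SELECT * FROM foo WHERE bar = ? AND boz = ?
    part = partition_store.get((bar, boz), {})
    return [dict(baz=baz, **cols) for baz, cols in sorted(part.items())]

bar, boz = uuid.uuid4(), uuid.uuid4()
write(bar, boz, 1, data1="first")
write(bar, boz, 2)                 # no data* columns: nothing allocated
rows = read_partition(bar, boz)
print(len(rows))                   # 2 CQL rows in one partition
```

Reading the whole partition returns all its CQL rows in clustering order, which is why a partition-key-only read is cheap.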
>


Re: Cassandra data model right definition

2016-09-30 Thread Carlos Alonso
Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra

Carlos Alonso | Software Engineer | @calonso 

On 30 September 2016 at 18:24, Mehdi Bada 
wrote:

> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> Column store means storing the data as columns rather than as rows.
>
> In fact C* stores the data as rows, and data is partitioned by row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
> true for you also?
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada
> 
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
> *
>


Cassandra data model right definition

2016-09-30 Thread Mehdi Bada
Hi all, 

I have a theoretical question:
- Is Apache Cassandra really a column store?
Column store means storing the data as columns rather than as rows.

In fact C* stores the data as rows, and data is partitioned by row key.

Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that true
for you also?

Many thanks in advance for your reply 

Best Regards 
Mehdi Bada 
 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 



⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team 


Re: C* files getting stuck

2016-09-30 Thread Paul Fife
Hello Amit -

I can confirm that we also experienced this issue on 2.0.x and were not
able to find a solution other than a restart. Since upgrading to 2.2.x, the
problem has disappeared.

Thanks,
Paul Fife

On Fri, Sep 30, 2016 at 6:48 AM, Amit Singh F 
wrote:

> Hi All,
>
>
>
> Please check if anybody has faced below issue and if yes what best can be
> done to avoid this.?
>
> Thanks in advance.
>
>
>
> Regards
>
> Amit Singh
>
>
>
> *From:* Amit Singh F [mailto:amit.f.si...@ericsson.com
> ]
> *Sent:* Wednesday, June 29, 2016 3:52 PM
> *To:* user@cassandra.apache.org
> *Subject:* C* files getting stuck
>
>
>
> Hi All
>
> We are running Cassandra 2.0.14 and disk usage is very high. On
> investigating further, we found around 4-5 files (~150 GB) in a stuck
> state.
>
> Command Fired : lsof /var/lib/cassandra | grep -i deleted
>
> Output :
>
> java 12158 cassandra 308r REG 8,16 34396638044 12727268
> /var/lib/cassandra/data/mykeyspace/mycolumnfamily/
> mykeyspace-mycolumnfamily-jb-16481-Data.db (deleted)
> java 12158 cassandra 327r REG 8,16 101982374806 12715102
> /var/lib/cassandra/data/mykeyspace/mycolumnfamily/
> mykeyspace-mycolumnfamily-jb-126861-Data.db (deleted)
> java 12158 cassandra 339r REG 8,16 12966304784 12714010
> /var/lib/cassandra/data/mykeyspace/mycolumnfamily/
> mykeyspace-mycolumnfamily-jb-213548-Data.db (deleted)
> java 12158 cassandra 379r REG 8,16 15323318036 12714957
> /var/lib/cassandra/data/mykeyspace/mycolumnfamily/
> mykeyspace-mycolumnfamily-jb-182936-Data.db (deleted)
>
> We are not able to see these files in any directory. This is somewhat
> similar to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is
> fixed, but the issue is still there on a higher version. Also, no
> compaction-related errors are reported in the logs.
>
> So could any one of you please suggest how to counter this? Restarting
> Cassandra is one solution, but this issue keeps recurring, and restarting
> a production machine so frequently is not recommended.
>
> Also, we know that this version is not supported, but there is a high
> probability that it can occur in higher versions too.
>
> Regards
>
> Amit Singh
>
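As a side note, the space held by such deleted-but-still-open files can be tallied from the lsof output with a short script. This is a sketch: the sample lines are made up (paths shortened), and the column index assumes the output format shown in the message above.

```python
# Tally the disk space still held by deleted-but-open SSTables, given
# output like: lsof /var/lib/cassandra | grep -i deleted
# (sample lines below are hypothetical; paths shortened for readability)
sample = """\
java 12158 cassandra 308r REG 8,16 34396638044 12727268 /var/lib/cassandra/data/ks/cf/ks-cf-jb-16481-Data.db (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 /var/lib/cassandra/data/ks/cf/ks-cf-jb-126861-Data.db (deleted)
"""

def deleted_bytes(lsof_output):
    total = 0
    for line in lsof_output.splitlines():
        fields = line.split()
        if fields and fields[-1] == "(deleted)":
            total += int(fields[6])  # SIZE/OFF column in the format above
    return total

print(deleted_bytes(sample))  # 136379012850 bytes (~136 GB) held open
```

This makes it easy to see how much space a restart would actually reclaim before taking the node down.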


Difference in token range count

2016-09-30 Thread techpyaasa .
Hi,

We have c*-2.0.17 with 3 data centers. Each data center has 9 nodes, with
vnodes enabled on all nodes.

When I ran a -local repair (./nodetool -local repair keyspace_name1
columnfamily_1) on one of the data centers, I saw the following printed:

"Starting repair command #3, repairing *2647 ranges* for keyspace
keyspace_name1"

The count of ranges is supposed to be *2304* (256*9), as we have 9
nodes in one data center, right? So why is it showing 2647 ranges?

Can someone please clarify this difference in the token range count?

Thanks
techpyaasa
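I'm not certain of the exact cause of the 2647 figure, but one thing worth noting is that all data centers share a single token ring: every node in every DC contributes num_tokens tokens, so the ring has far more range boundaries than the 2304 contributed by the local DC alone, and repair's range accounting need not reduce to 256 × 9. A toy illustration under assumed conditions (27 nodes, 256 random tokens each; this does not reproduce 2647 exactly):

```python
import random

# All data centers share ONE token ring. With vnodes, every node in every
# DC contributes num_tokens tokens, so the ring has many more range
# boundaries than the local DC's tokens alone.
random.seed(42)
NUM_TOKENS = 256
LOCAL_NODES, REMOTE_NODES = 9, 18          # 3 DCs x 9 nodes, one DC "local"

tokens = {}                                # token -> owning node
for node in range(LOCAL_NODES + REMOTE_NODES):
    for _ in range(NUM_TOKENS):
        tokens[random.getrandbits(64)] = node

ring = sorted(tokens)
local_token_count = sum(1 for t in ring if tokens[t] < LOCAL_NODES)
total_ranges = len(ring)                   # one range per token boundary

print(local_token_count)   # 2304 -- the naive 256 * 9 expectation
print(total_ranges)        # 6912 -- boundaries contributed by all three DCs
```

So depending on how repair splits and merges ranges against the full ring, a count between 2304 and 6912 would not be surprising.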


RE: C* files getting stuck

2016-09-30 Thread Amit Singh F
Hi All,

Please check if anybody has faced below issue and if yes what best can be done 
to avoid this.?
Thanks in advance.

Regards
Amit Singh

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Wednesday, June 29, 2016 3:52 PM
To: user@cassandra.apache.org
Subject: C* files getting stuck


Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating
further, we found around 4-5 files (~150 GB) in a stuck state.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db
 (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db
 (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db
 (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db
 (deleted)

We are not able to see these files in any directory. This is somewhat similar
to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is fixed, but
the issue is still there on a higher version. Also, no compaction-related
errors are reported in the logs.

So could any one of you please suggest how to counter this? Restarting
Cassandra is one solution, but this issue keeps recurring, and restarting a
production machine so frequently is not recommended.

Also, we know that this version is not supported, but there is a high
probability that it can occur in higher versions too.
Regards
Amit Singh


Error while read after upgrade from 2.2.7 to 3.0.8

2016-09-30 Thread Oleg Krayushkin
Hi,

Since upgrading from Cassandra 2.2.7 to 3.0.8, we're getting the
following error every few minutes on every node. For the node at
173.170.147.120, the error in system.log would be:

INFO  [SharedPool-Worker-4] 2016-09-30 10:26:39,068 Message.java:605
   - Unexpected exception during request; channel = [id: 0xfd64cd67,
/173.170.147.120:50660 :> /18.4.63.191:9042]
java.io.IOException: Error while read(...): Connection reset by peer
at io.netty.channel.epoll.Native.readAddress(Native Method)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

As far as I can see, all such errors contain [id: <...>,
/<address>:<port> :> /<address>:<port>]. Also, broadcast_address and
listen_address always belong to the current node's addresses.

What are the possible reasons for such errors, and how can I fix them? Any
thoughts would be appreciated.


High CPU usage by cqlsh when network is disconnected on client

2016-09-30 Thread Bhuvan Rawal
Hi,

We are using Cassandra 3.6, and I have been facing this issue for a while.
When I connect to a Cassandra cluster using cqlsh and then disconnect the
network while keeping cqlsh running, I get really high CPU utilization on
the client from the cqlsh Python process. When the network reconnects,
things return to normal.


On debugging a particular process with strace, I get a lot of lines like:
[pid  8449] connect(4, {sa_family=AF_INET, sin_port=htons(9042),
sin_addr=inet_addr("10.20.34.11")}, 16) = -1 ENETUNREACH (Network is
unreachable)
[pid  8449] close(4)= 0
[pid  8449] futex(0x7f39a8001360, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  5734] <... futex resumed> )   = 0
[pid  5734] futex(0x1956fb0,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 

[pid  8449] futex(0x1956fb0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  5734] <... futex resumed> )   = 0
[pid  5734] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid  5734] fcntl(4, F_GETFL)   = 0x2 (flags O_RDWR)
[pid  5734] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  5734] connect(4, {sa_family=AF_INET, sin_port=htons(9042),
sin_addr=inet_addr("10.20.34.11")}, 16) = -1 ENETUNREACH (Network is
unreachable)
[pid  5734] close(4)= 0
[pid  5734] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP 
[pid  8449] futex(0x7f39a40aa390,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 



Shall I create a JIRA for this?

Thanks & Regards,
Bhuvan
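The strace output above looks like a reconnect attempt with no delay between failures, which would explain the CPU burn. A generic sketch of capped exponential backoff with jitter (illustrative only — this is not cqlsh's or the driver's actual code, and the function name is made up):

```python
import random
import socket
import time

def connect_with_backoff(host, port, attempts=6, base_delay=0.5, max_delay=30.0):
    """Try to connect, sleeping with capped exponential backoff plus full
    jitter between failures instead of retrying in a tight loop."""
    delay = base_delay
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=2)
        except OSError:
            # Network unreachable / connection refused / timeout: back off.
            time.sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
    return None
```

In practice the DataStax Python driver exposes a reconnection policy that does something similar for established sessions; the sketch just shows the pattern that avoids a busy loop.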