Re: CQL 3 and wide rows

2014-05-20 Thread Aaron Morton
In a CQL 3 table the only **column** names are the ones defined in the table;
in the example below there are three column names.


 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example,
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.

Internally there may be more **cells** (as we now call the internal columns).
In the example above each value for row_key will create a single partition (as
we now call internal storage engine rows). In each of those partitions there
will be cells for each CQL 3 row that has the same row_key; those cells use a
Composite for the name. The first part of the composite is the value of
wide_row_column and the second is the literal name of the non-primary-key
column (here, data_column).
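
A minimal sketch of the layout described above, written as a plain Python model rather than anything taken from Cassandra itself. The function name `rows_to_partitions` and the sample rows are assumptions made for illustration; timestamps, tombstones and other storage details are left out.

```python
from typing import Dict, List, Tuple

# ((clustering value, column name), value) - a simplified stand-in for a cell.
Cell = Tuple[Tuple[str, str], str]


def rows_to_partitions(rows: List[dict]) -> Dict[str, List[Cell]]:
    """Group CQL rows of keyspace.widerow into partitions of sorted cells."""
    partitions: Dict[str, List[Cell]] = {}
    for row in rows:
        # Composite cell name: (value of wide_row_column, literal column name).
        name = (row["wide_row_column"], "data_column")
        cell = (name, row["data_column"])
        partitions.setdefault(row["row_key"], []).append(cell)
    for cells in partitions.values():
        # Within a partition, cells are kept ordered by their composite name.
        cells.sort(key=lambda cell: cell[0])
    return partitions


# Hypothetical sample data: three CQL rows, two of them sharing a row_key.
rows = [
    {"row_key": "k1", "wide_row_column": "b", "data_column": "v2"},
    {"row_key": "k1", "wide_row_column": "a", "data_column": "v1"},
    {"row_key": "k2", "wide_row_column": "a", "data_column": "v3"},
]
partitions = rows_to_partitions(rows)
print(partitions["k1"])
```

The two CQL rows with row_key 'k1' end up as two cells in one partition, which is why the table has only three column names but a partition can still grow wide.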

IMHO Wide partitions (storage engine rows) are more prevalent in CQL3 than 
thrift models. 

 But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
 when compared to CLI/Hector.
Nowadays you can do pretty much everything you can do in the CLI. Provide an
example and we may be able to help.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com




Re: CQL 3 and wide rows

2014-05-20 Thread Jack Krupansky
To keep the terminology clear, your “row_key” is actually the “partition key”, 
and “wide_row_column” is actually a “clustering column”, and the combination of 
your row_key and wide_row_column is a “compound primary key”.
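
To make the terminology concrete, here is a small sketch in Python that splits a PRIMARY KEY clause into its partition key and clustering columns. The helper name `split_primary_key` is made up for this example, and it is a toy string splitter, not a CQL parser: it expects only the clause itself, without the closing parenthesis of the CREATE TABLE statement.

```python
from typing import List, Tuple


def split_primary_key(clause: str) -> Tuple[List[str], List[str]]:
    """Split 'PRIMARY KEY (...)' into (partition key columns, clustering columns)."""
    inner = clause.strip()
    inner = inner[inner.index("(") + 1 : inner.rindex(")")].strip()
    if inner.startswith("("):
        # Composite partition key, e.g. PRIMARY KEY ((a, b), c)
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1 :]
    else:
        # Single-column partition key, e.g. PRIMARY KEY (a, b)
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Everything after the partition key is a clustering column.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("PRIMARY KEY (row_key, wide_row_column)"))
```

For the table in this thread, the first element is the partition key (row_key) and the remainder are the clustering columns (wide_row_column).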

-- Jack Krupansky



Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
yes :)

On 20 May 2014, at 14:24, Jack Krupansky j...@basetechnology.com wrote:

 To keep the terminology clear, your “row_key” is actually the “partition 
 key”, and “wide_row_column” is actually a “clustering column”, and the 
 combination of your row_key and wide_row_column is a “compound primary key”.
  
 -- Jack Krupansky
  



Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Hi Aaron,

Thanks for the answer!


Let's consider this CLI-style pseudocode:

for (int i = 0; i < 1_000_000; i++) {
  // one cassandra-cli style 'set' per generated column name
  set['rowKey1']['myCol::' + i] = UUID.randomUUID();
}


The code above will create a single row that contains 10^6 columns sorted by
'i'. This works fine, and this is a wide row as I understand it: a row that
holds many columns, AND I can read just part of it with the right slice query.
On the other hand, I can iterate over all columns without extra latency
because the data is stored on a single node. I've been using similar
structures as a replacement for secondary indexes - it's a well-known pattern.

How would I model it in CQL 3?

1) I could create a Map, but Maps are fully loaded into memory, and a Map
containing 10^6 elements is definitely a problem. Plus it's a big waste of RAM
if you consider that I only need to read a small subset.

2) I could alter the table for each new column, which would create a structure
similar to the one in my CLI example. But it looks to me that all column names
are loaded into RAM, which is still a big limitation. I hope that I am wrong
here - I am not sure.

3) I could redesign my model and divide the data into many rows, but why would
I do that if I can use wide rows?

My idea of a wide row is a row that can hold a large number of key-value pairs
(in any form), where I can filter on those keys to efficiently load only the
part that I currently need.


Regards,
Maciej 


 



Re: CQL 3 and wide rows

2014-05-20 Thread Nate McCall
Something like this might work:


cqlsh:my_keyspace> CREATE TABLE my_widerow (
               ...   id text,
               ...   my_col timeuuid,
               ...   PRIMARY KEY (id, my_col)
               ... ) WITH caching='KEYS_ONLY' AND
               ...   compaction={'class': 'LeveledCompactionStrategy'};
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values ('some_key_1',now());
cqlsh:my_keyspace> select * from my_widerow;

 id         | my_col
------------+--------------------------------------
 some_key_1 | 7266d240-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 73ba0630-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10

cqlsh:my_keyspace> select * from my_widerow where id = 'some_key_1' and
               ... my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10;

 id         | my_col
------------+--------------------------------------
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10

cqlsh:my_keyspace> select * from my_widerow where id = 'some_key_1' and
               ... my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10 and
               ... my_col < 76227ab0-e030-11e3-a50d-8b2f9bfbfa10;

 id         | my_col
------------+--------------------------------------
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10


These queries would all work fine from the DS Java Driver. Note that only
the cells that are needed are pulled into memory:


./bin/nodetool cfstats my_keyspace my_widerow
   ...
   Column Family: my_widerow
   ...
   Average live cells per slice (last five minutes): 6.0
   ...


This shows that we are slicing across 6 rows on average for the last couple
of select statements (the two range queries above returned 8 and 4 rows).
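
As a rough illustration of why a range query only touches the cells it needs, here is a small Python model of slicing a sorted partition. It is a sketch under assumptions, not how Cassandra or the driver is implemented: the names `slice_partition` and `iterate_in_pages` are invented, and short strings stand in for the timeuuid values shown in the cqlsh session (real timeuuids are ordered by their timestamp, not as text).

```python
import bisect
from typing import Iterator, List, Optional

# Clustering values of one partition, kept sorted as the storage engine would.
# These are the leading fields of the timeuuids in the cqlsh session above.
partition: List[str] = [
    "7266d240", "73ba0630", "74404d30", "74defe30", "75569f30",
    "75bf9a30", "76227ab0", "76cfd1b0", "777364b0", "7aa061b0",
]


def slice_partition(cells: List[str], after: Optional[str] = None,
                    before: Optional[str] = None) -> List[str]:
    """Return cells with after < cell < before (both bounds exclusive)."""
    lo = bisect.bisect_right(cells, after) if after is not None else 0
    hi = bisect.bisect_left(cells, before) if before is not None else len(cells)
    return cells[lo:hi]


def iterate_in_pages(cells: List[str], page_size: int) -> Iterator[List[str]]:
    """Yield pages, resuming each time after the last value already seen."""
    last: Optional[str] = None
    while True:
        page = slice_partition(cells, after=last)[:page_size]
        if not page:
            return
        yield page
        last = page[-1]


# Same bounds as the two range queries in the cqlsh session.
print(len(slice_partition(partition, after="73ba0630")))
print(slice_partition(partition, after="73ba0630", before="76227ab0"))
print([len(page) for page in iterate_in_pages(partition, page_size=4)])
```

The second helper shows the iteration pattern asked about earlier in the thread: walk a wide partition a page at a time by repeating the range query with the last clustering value seen as the new lower bound.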

Hope that helps.



-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Thank you, Nate - now I understand it! This is a real improvement compared to
the CLI :)

Regards,
Maciej





CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi *,

I've checked the DataStax driver code for CQL 3, and it looks like the column
names for a particular table are fully loaded into memory. Is this true?

Cassandra should support wide rows, meaning tables with millions of
columns. Knowing that, I would expect some kind of iterator for column names.
Am I missing something here?


Regards,
Maciej Miklas


RE: CQL 3 and wide rows

2014-05-19 Thread James Campbell
Maciej,


In CQL3 wide rows are expected to be created using clustering columns.  So
while the schema will have a relatively small number of named columns, the
effect is a wide row.  For example:

CREATE TABLE keyspace.widerow (
  row_key text,
  wide_row_column text,
  data_column text,
  PRIMARY KEY (row_key, wide_row_column));

Check out, for example,
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.


James




Re: CQL 3 and wide rows

2014-05-19 Thread Jack Krupansky
You might want to review this blog post on supporting dynamic columns in CQL3, 
which points out that “the way to model dynamic cells in CQL is with a compound 
primary key.”

See:
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

-- Jack Krupansky


Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hello Jack,

You have given a perfect example of a wide row. Each reading from a sensor
creates a new column within a row. With Hector/CLI it was also possible to
have millions of columns within a single row. According to this page,
http://wiki.apache.org/cassandra/CassandraLimitations, a single row can have
2 billion columns.

How does this relate to CQL 3 and tables?

I still do not understand it because:
- it looks like the driver loads all column names into memory, so it seems to
  me that the 2 billion limit from the CLI is no longer valid
- Map and Set values do not support an iterator

Regards,
Maciej





Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi James,

Clustering is based on rows. I think that you meant not clustering columns, but
compound columns. Still, all columns belong to a single table and are stored
within a single folder on one computer. And it looks to me (but I'm not sure)
that the CQL 3 driver loads all column names into memory - which is confusing
to me. On one hand we have a wide row, but on the other we load the whole
thing into RAM...

My understanding of a wide row is a row that supports millions of columns, or
similar things like a map or set. In the CLI you would generate column names
(or use compound columns) to simulate a set or map; in CQL 3 you would use some
static names plus Map or Set structures, or you could still alter the table and
have a large number of columns. But still - I do not see iteration, so it looks
to me that CQL 3 is limited when compared to CLI/Hector.


Regards,
Maciej
