Re: Maximum number of columns in a table

2016-09-16 Thread Jens Rantil
I listened to a talk about the new Cassandra 3 file (sstable) format. One
takeaway was that the new format supports sparse data better. That is, if
you have 2000 columns but only set a few of them, the disk usage will be
much lower.

Cheers,
Jens

On Thu, Sep 15, 2016 at 10:24 PM Dorian Hoxha 
wrote:

> @DuyHai
> I know they don't support.
> I need key+value mapping, not just "values" or just "keys".
>
> I'll use the lucene index.
>
>
>
> On Thu, Sep 15, 2016 at 10:23 PM, DuyHai Doan 
> wrote:
>
>> I'd advise anyone against using the old native secondary index ... You'll
>> get poor performance (that's the main reason why some people developed
>> SASI).
>>
>> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  wrote:
>>
>>> Hi,
>>>
>>> The ‘old-fashioned’ secondary indexes do support index of collection
>>> values:
>>> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
>>>
>>> Br,
>>> Hannu
>>>
>>> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
>>>
>>> "But the problem is I can't use secondary indexing "where int25=5",
>>> while with normal columns I can."
>>>
>>> You have many objectives that contradict themselves in term of impl.
>>>
>>> Right now you're unlucky, SASI does not support indexing collections yet
>>> (it may come in future, when ?  ¯\_(ツ)_/¯ )
>>>
>>> If you're using DSE Search or Stratio Lucene Index, you can index map
>>> values
>>>
>>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
>>> wrote:
>>>
 Yes that makes more sense. But the problem is I can't use secondary
 indexing "where int25=5", while with normal columns I can.

 On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com >>> > wrote:

> I agree a single blob would also work (I do that in some cases). The
> reason for the map is if you need more flexible updating. I think your
> solution of a map/data type works well.
>
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
> wrote:
>
>> "But I need rows together to work with them (indexing etc)"
>>
>> What do you mean rows together ? You mean that you want to fetch a
>> single row instead of 1 row per property right ?
>>
>> In this case, the map might be the solution:
>>
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    date_map map<text, timestamp>,
>>    ...
>>    PRIMARY KEY(object_id)
>> );
>>
>> The trick here is to store all the fields of the object in different
>> map, depending on the type of the field.
>>
>> The map key is always text and it contains the name of the field.
>>
>> Example
>>
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03",
>> }
>>
>> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>>        {'last_visited_date': '2016-09-10 12:01:03'});
>>
>> When you do a select, you'll get a SINGLE row returned. But then you
>> need to extract all the properties from different maps, not a big deal
>>
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha > > wrote:
>>
>>> @DuyHai
>>> Yes, that's another case, the "entity" model used in rdbms. But I
>>> need rows together to work with them (indexing etc).
>>>
>>> @sfespace
>>> The map is needed when you have a dynamic schema. I don't have a
>>> dynamic schema (may have, and will use the map if I do). I just have
>>> thousands of schemas. One user needs 10 integers, while another user 
>>> needs
>>> 20 booleans, and another needs 30 integers, or a combination of them 
>>> all.
>>>
>>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>>> wrote:
>>>
 "Another possible alternative is to use a single map column"

 --> how do you manage the different types then ? Because maps in
 Cassandra are strongly typed

 Unless you set the type of map value to blob, in this case you
 might as well store all the object as a single blob column

 On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
 sfesc...@gmail.com> wrote:

> Another possible alternative is to use a single map column.
>
>
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha <
> dorian.ho...@gmail.com> wrote:
>
>> Since I will only have 1 table with that many columns, and the
>> other tables will be "normal" tables with max 30 columns, and the 
>> memory of
>> 2K columns won't be that big, I'm gonna guess I'll be fine.
>>
>> The data model is too dynamic, the alternative would be to create
>> a table for each user which will have even more overhead since the 
>> number
>> of users is in the several thousands/millions.
>>
>>
>>

Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai
I know they don't support that.
I need key+value mapping, not just "values" or just "keys".

I'll use the Lucene index.



On Thu, Sep 15, 2016 at 10:23 PM, DuyHai Doan  wrote:

> I'd advise anyone against using the old native secondary index ... You'll
> get poor performance (that's the main reason why some people developed
> SASI).
>
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  wrote:
>
>> Hi,
>>
>> The ‘old-fashioned’ secondary indexes do support index of collection
>> values:
>> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
>>
>> Br,
>> Hannu
>>
>> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
>>
>> "But the problem is I can't use secondary indexing "where int25=5", while
>> with normal columns I can."
>>
>> You have many objectives that contradict themselves in term of impl.
>>
>> Right now you're unlucky, SASI does not support indexing collections yet
>> (it may come in future, when ?  ¯\_(ツ)_/¯ )
>>
>> If you're using DSE Search or Stratio Lucene Index, you can index map
>> values
>>
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
>> wrote:
>>
>>> Yes that makes more sense. But the problem is I can't use secondary
>>> indexing "where int25=5", while with normal columns I can.
>>>
>>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>>> wrote:
>>>
 I agree a single blob would also work (I do that in some cases). The
 reason for the map is if you need more flexible updating. I think your
 solution of a map/data type works well.

 On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
 wrote:

> "But I need rows together to work with them (indexing etc)"
>
> What do you mean rows together ? You mean that you want to fetch a
> single row instead of 1 row per property right ?
>
> In this case, the map might be the solution:
>
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    date_map map<text, timestamp>,
>    ...
>    PRIMARY KEY(object_id)
> );
>
> The trick here is to store all the fields of the object in different
> map, depending on the type of the field.
>
> The map key is always text and it contains the name of the field.
>
> Example
>
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03",
> }
>
> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>        {'last_visited_date': '2016-09-10 12:01:03'});
>
> When you do a select, you'll get a SINGLE row returned. But then you
> need to extract all the properties from different maps, not a big deal
>
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
> wrote:
>
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I
>> need rows together to work with them (indexing etc).
>>
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a
>> dynamic schema (may have, and will use the map if I do). I just have
>> thousands of schemas. One user needs 10 integers, while another user 
>> needs
>> 20 booleans, and another needs 30 integers, or a combination of them all.
>>
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>> wrote:
>>
>>> "Another possible alternative is to use a single map column"
>>>
>>> --> how do you manage the different types then ? Because maps in
>>> Cassandra are strongly typed
>>>
>>> Unless you set the type of map value to blob, in this case you might
>>> as well store all the object as a single blob column
>>>
>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
>>> sfesc...@gmail.com> wrote:
>>>
 Another possible alternative is to use a single map column.


 On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha <
 dorian.ho...@gmail.com> wrote:

> Since I will only have 1 table with that many columns, and the
> other tables will be "normal" tables with max 30 columns, and the 
> memory of
> 2K columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create
> a table for each user which will have even more overhead since the 
> number
> of users is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  > wrote:
>
>> There is no real limit in term of number of columns in a table, I
>> would say that the impact of having a lot of columns is the amount 
>> of meta
>> data C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is
>> probably your data model...
>>
>>>

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
I do agree on that.

> On 15 Sep 2016, at 16:23, DuyHai Doan  wrote:
> 
> I'd advise anyone against using the old native secondary index ... You'll get 
> poor performance (that's the main reason why some people developed SASI).
> 
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  > wrote:
> Hi,
> 
> The ‘old-fashioned’ secondary indexes do support index of collection values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
> 
> 
> Br,
> Hannu
> 
>> On 15 Sep 2016, at 15:59, DuyHai Doan > > wrote:
>> 
>> "But the problem is I can't use secondary indexing "where int25=5", while 
>> with normal columns I can."
>> 
>> You have many objectives that contradict themselves in term of impl.
>> 
>> Right now you're unlucky, SASI does not support indexing collections yet (it 
>> may come in future, when ?  ¯\_(ツ)_/¯ )
>> 
>> If you're using DSE Search or Stratio Lucene Index, you can index map values 
>> 
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha > > wrote:
>> Yes that makes more sense. But the problem is I can't use secondary indexing 
>> "where int25=5", while with normal columns I can.
>> 
>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com wrote:
>> I agree a single blob would also work (I do that in some cases). The reason 
>> for the map is if you need more flexible updating. I think your solution of 
>> a map/data type works well.
>> 
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan > > wrote:
>> "But I need rows together to work with them (indexing etc)"
>> 
>> What do you mean rows together ? You mean that you want to fetch a single 
>> row instead of 1 row per property right ?
>> 
>> In this case, the map might be the solution:
>> 
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    date_map map<text, timestamp>,
>>    ...
>>    PRIMARY KEY(object_id)
>> );
>> 
>> The trick here is to store all the fields of the object in different map, 
>> depending on the type of the field.
>> 
>> The map key is always text and it contains the name of the field.
>> 
>> Example
>> 
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03", 
>> }
>> 
>> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>>        {'last_visited_date': '2016-09-10 12:01:03'});
>> 
>> When you do a select, you'll get a SINGLE row returned. But then you need to 
>> extract all the properties from different maps, not a big deal
>> 
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha > > wrote:
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
>> together to work with them (indexing etc).
>> 
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a dynamic 
>> schema (may have, and will use the map if I do). I just have thousands of 
>> schemas. One user needs 10 integers, while another user needs 20 booleans, 
>> and another needs 30 integers, or a combination of them all.
>> 
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan > > wrote:
>> "Another possible alternative is to use a single map column"
>> 
>> --> how do you manage the different types then ? Because maps in Cassandra 
>> are strongly typed
>> 
>> Unless you set the type of map value to blob, in this case you might as well 
>> store all the object as a single blob column
>> 
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com wrote:
>> Another possible alternative is to use a single map column.
>> 
>> 
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha > > wrote:
>> Since I will only have 1 table with that many columns, and the other tables 
>> will be "normal" tables with max 30 columns, and the memory of 2K columns 
>> won't be that big, I'm gonna guess I'll be fine.
>> 
>> The data model is too dynamic, the alternative would be to create a table 
>> for each user which will have even more overhead since the number of users 
>> is in the several thousands/millions.
>> 
>> 
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan > > wrote:
>> There is no real limit in term of number of columns in a table, I would say 
>> that the impact of having a lot of columns is the amount of meta data C* 
>> needs to keep in memory for encoding/decoding each row.
>> 
>> Now, if you have a table with 1000+ columns, the problem is probably your 
>> data model...
>> 
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha > > wrote:
>> Is there alot of overhead with having a big number of columns in a table ? 
>> Not unbounded

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
I'd advise anyone against using the old native secondary index ... You'll
get poor performance (that's the main reason why some people developed
SASI).

On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  wrote:

> Hi,
>
> The ‘old-fashioned’ secondary indexes do support index of collection
> values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
>
> Br,
> Hannu
>
> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
>
> "But the problem is I can't use secondary indexing "where int25=5", while
> with normal columns I can."
>
> You have many objectives that contradict themselves in term of impl.
>
> Right now you're unlucky, SASI does not support indexing collections yet
> (it may come in future, when ?  ¯\_(ツ)_/¯ )
>
> If you're using DSE Search or Stratio Lucene Index, you can index map
> values
>
> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
> wrote:
>
>> Yes that makes more sense. But the problem is I can't use secondary
>> indexing "where int25=5", while with normal columns I can.
>>
>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>> wrote:
>>
>>> I agree a single blob would also work (I do that in some cases). The
>>> reason for the map is if you need more flexible updating. I think your
>>> solution of a map/data type works well.
>>>
>>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
>>> wrote:
>>>
 "But I need rows together to work with them (indexing etc)"

 What do you mean rows together ? You mean that you want to fetch a
 single row instead of 1 row per property right ?

 In this case, the map might be the solution:

 CREATE TABLE generic_with_maps(
    object_id uuid,
    boolean_map map<text, boolean>,
    text_map map<text, text>,
    long_map map<text, bigint>,
    date_map map<text, timestamp>,
    ...
    PRIMARY KEY(object_id)
 );

 The trick here is to store all the fields of the object in different
 map, depending on the type of the field.

 The map key is always text and it contains the name of the field.

 Example

 {
"id": ,
 "name": "John DOE",
 "age":  32,
 "last_visited_date":  "2016-09-10 12:01:03",
 }

 INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
 VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
        {'last_visited_date': '2016-09-10 12:01:03'});

 When you do a select, you'll get a SINGLE row returned. But then you
 need to extract all the properties from different maps, not a big deal

 On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
 wrote:

> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need
> rows together to work with them (indexing etc).
>
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a
> dynamic schema (may have, and will use the map if I do). I just have
> thousands of schemas. One user needs 10 integers, while another user needs
> 20 booleans, and another needs 30 integers, or a combination of them all.
>
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
> wrote:
>
>> "Another possible alternative is to use a single map column"
>>
>> --> how do you manage the different types then ? Because maps in
>> Cassandra are strongly typed
>>
>> Unless you set the type of map value to blob, in this case you might
>> as well store all the object as a single blob column
>>
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
>> sfesc...@gmail.com> wrote:
>>
>>> Another possible alternative is to use a single map column.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>>> wrote:
>>>
 Since I will only have 1 table with that many columns, and the
 other tables will be "normal" tables with max 30 columns, and the 
 memory of
 2K columns won't be that big, I'm gonna guess I'll be fine.

 The data model is too dynamic, the alternative would be to create a
 table for each user which will have even more overhead since the 
 number of
 users is in the several thousands/millions.


 On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
 wrote:

> There is no real limit in term of number of columns in a table, I
> would say that the impact of having a lot of columns is the amount of 
> meta
> data C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is
> probably your data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
> dorian.ho...@gmail.com> wrote:
>
>> Is there alot of overhead with having a big number of columns in
>> a table ? Not unbounded, but say, would 2000 be a problem(I think 
>> that's
>> the maximum I'll need) ?
>>
>> Thank You
>

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
Hi,

The ‘old-fashioned’ secondary indexes do support indexing collection values:
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
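
For example, following that doc page and reusing the generic_with_maps table
from DuyHai's earlier message (a sketch, untested), you can index the map
entries and then query a specific key/value pair:

CREATE INDEX long_map_entries_idx ON generic_with_maps (ENTRIES(long_map));

-- roughly the "where int25 = 5" case, expressed against a map entry
SELECT * FROM generic_with_maps WHERE long_map['int25'] = 5;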


Br,
Hannu

> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
> 
> "But the problem is I can't use secondary indexing "where int25=5", while 
> with normal columns I can."
> 
> You have many objectives that contradict themselves in term of impl.
> 
> Right now you're unlucky, SASI does not support indexing collections yet (it 
> may come in future, when ?  ¯\_(ツ)_/¯ )
> 
> If you're using DSE Search or Stratio Lucene Index, you can index map values 
> 
> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha  > wrote:
> Yes that makes more sense. But the problem is I can't use secondary indexing 
> "where int25=5", while with normal columns I can.
> 
> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com wrote:
> I agree a single blob would also work (I do that in some cases). The reason 
> for the map is if you need more flexible updating. I think your solution of a 
> map/data type works well.
> 
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  > wrote:
> "But I need rows together to work with them (indexing etc)"
> 
> What do you mean rows together ? You mean that you want to fetch a single row 
> instead of 1 row per property right ?
> 
> In this case, the map might be the solution:
> 
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    date_map map<text, timestamp>,
>    ...
>    PRIMARY KEY(object_id)
> );
> 
> The trick here is to store all the fields of the object in different map, 
> depending on the type of the field.
> 
> The map key is always text and it contains the name of the field.
> 
> Example
> 
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03", 
> }
> 
> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>        {'last_visited_date': '2016-09-10 12:01:03'});
> 
> When you do a select, you'll get a SINGLE row returned. But then you need to 
> extract all the properties from different maps, not a big deal
> 
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha  > wrote:
> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
> together to work with them (indexing etc).
> 
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a dynamic 
> schema (may have, and will use the map if I do). I just have thousands of 
> schemas. One user needs 10 integers, while another user needs 20 booleans, 
> and another needs 30 integers, or a combination of them all.
> 
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  > wrote:
> "Another possible alternative is to use a single map column"
> 
> --> how do you manage the different types then ? Because maps in Cassandra 
> are strongly typed
> 
> Unless you set the type of map value to blob, in this case you might as well 
> store all the object as a single blob column
> 
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com wrote:
> Another possible alternative is to use a single map column.
> 
> 
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha  > wrote:
> Since I will only have 1 table with that many columns, and the other tables 
> will be "normal" tables with max 30 columns, and the memory of 2K columns 
> won't be that big, I'm gonna guess I'll be fine.
> 
> The data model is too dynamic, the alternative would be to create a table for 
> each user which will have even more overhead since the number of users is in 
> the several thousands/millions.
> 
> 
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  > wrote:
> There is no real limit in term of number of columns in a table, I would say 
> that the impact of having a lot of columns is the amount of meta data C* 
> needs to keep in memory for encoding/decoding each row.
> 
> Now, if you have a table with 1000+ columns, the problem is probably your 
> data model...
> 
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha  > wrote:
> Is there alot of overhead with having a big number of columns in a table ? 
> Not unbounded, but say, would 2000 be a problem(I think that's the maximum 
> I'll need) ?
> 
> Thank You
> 
> 
> 
> 
> 
> 
> 



Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But the problem is I can't use secondary indexing "where int25=5", while
with normal columns I can."

You have many objectives that contradict each other in terms of implementation.

Right now you're unlucky: SASI does not support indexing collections yet
(it may come in the future, but when? ¯\_(ツ)_/¯)

If you're using DSE Search or the Stratio Lucene Index, you can index map
values.
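
For a regular (non-collection) column, SASI is just a custom index. A sketch
(assuming a plain int column int25 on some table t, names made up for
illustration):

CREATE CUSTOM INDEX int25_sasi_idx ON t (int25)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

SELECT * FROM t WHERE int25 = 5;

There is currently no equivalent of this for map entries.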

On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
wrote:

> Yes that makes more sense. But the problem is I can't use secondary
> indexing "where int25=5", while with normal columns I can.
>
> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
> wrote:
>
>> I agree a single blob would also work (I do that in some cases). The
>> reason for the map is if you need more flexible updating. I think your
>> solution of a map/data type works well.
>>
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
>> wrote:
>>
>>> "But I need rows together to work with them (indexing etc)"
>>>
>>> What do you mean rows together ? You mean that you want to fetch a
>>> single row instead of 1 row per property right ?
>>>
>>> In this case, the map might be the solution:
>>>
>>> CREATE TABLE generic_with_maps(
>>>    object_id uuid,
>>>    boolean_map map<text, boolean>,
>>>    text_map map<text, text>,
>>>    long_map map<text, bigint>,
>>>    date_map map<text, timestamp>,
>>>    ...
>>>    PRIMARY KEY(object_id)
>>> );
>>>
>>> The trick here is to store all the fields of the object in different
>>> map, depending on the type of the field.
>>>
>>> The map key is always text and it contains the name of the field.
>>>
>>> Example
>>>
>>> {
>>>"id": ,
>>> "name": "John DOE",
>>> "age":  32,
>>> "last_visited_date":  "2016-09-10 12:01:03",
>>> }
>>>
>>> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
>>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>>>        {'last_visited_date': '2016-09-10 12:01:03'});
>>>
>>> When you do a select, you'll get a SINGLE row returned. But then you
>>> need to extract all the properties from different maps, not a big deal
>>>
>>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
>>> wrote:
>>>
 @DuyHai
 Yes, that's another case, the "entity" model used in rdbms. But I need
 rows together to work with them (indexing etc).

 @sfespace
 The map is needed when you have a dynamic schema. I don't have a
 dynamic schema (may have, and will use the map if I do). I just have
 thousands of schemas. One user needs 10 integers, while another user needs
 20 booleans, and another needs 30 integers, or a combination of them all.

 On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
 wrote:

> "Another possible alternative is to use a single map column"
>
> --> how do you manage the different types then ? Because maps in
> Cassandra are strongly typed
>
> Unless you set the type of map value to blob, in this case you might
> as well store all the object as a single blob column
>
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
> sfesc...@gmail.com> wrote:
>
>> Another possible alternative is to use a single map column.
>>
>>
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>> wrote:
>>
>>> Since I will only have 1 table with that many columns, and the other
>>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>>> columns won't be that big, I'm gonna guess I'll be fine.
>>>
>>> The data model is too dynamic, the alternative would be to create a
>>> table for each user which will have even more overhead since the number 
>>> of
>>> users is in the several thousands/millions.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>>> wrote:
>>>
 There is no real limit in term of number of columns in a table, I
 would say that the impact of having a lot of columns is the amount of 
 meta
 data C* needs to keep in memory for encoding/decoding each row.

 Now, if you have a table with 1000+ columns, the problem is
 probably your data model...

 On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
 dorian.ho...@gmail.com> wrote:

> Is there alot of overhead with having a big number of columns in a
> table ? Not unbounded, but say, would 2000 be a problem(I think 
> that's the
> maximum I'll need) ?
>
> Thank You
>


>>>
>

>>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Yes, that makes more sense. But the problem is I can't use secondary
indexing ("where int25=5"), while with normal columns I can.

On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
wrote:

> I agree a single blob would also work (I do that in some cases). The
> reason for the map is if you need more flexible updating. I think your
> solution of a map/data type works well.
>
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  wrote:
>
>> "But I need rows together to work with them (indexing etc)"
>>
>> What do you mean rows together ? You mean that you want to fetch a single
>> row instead of 1 row per property right ?
>>
>> In this case, the map might be the solution:
>>
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    date_map map<text, timestamp>,
>>    ...
>>    PRIMARY KEY(object_id)
>> );
>>
>> The trick here is to store all the fields of the object in different map,
>> depending on the type of the field.
>>
>> The map key is always text and it contains the name of the field.
>>
>> Example
>>
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03",
>> }
>>
>> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>>        {'last_visited_date': '2016-09-10 12:01:03'});
>>
>> When you do a select, you'll get a SINGLE row returned. But then you need
>> to extract all the properties from different maps, not a big deal
>>
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
>> wrote:
>>
>>> @DuyHai
>>> Yes, that's another case, the "entity" model used in rdbms. But I need
>>> rows together to work with them (indexing etc).
>>>
>>> @sfespace
>>> The map is needed when you have a dynamic schema. I don't have a dynamic
>>> schema (may have, and will use the map if I do). I just have thousands of
>>> schemas. One user needs 10 integers, while another user needs 20 booleans,
>>> and another needs 30 integers, or a combination of them all.
>>>
>>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>>> wrote:
>>>
 "Another possible alternative is to use a single map column"

 --> how do you manage the different types then ? Because maps in
 Cassandra are strongly typed

 Unless you set the type of map value to blob, in this case you might as
 well store all the object as a single blob column

 On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com >>> > wrote:

> Another possible alternative is to use a single map column.
>
>
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
> wrote:
>
>> Since I will only have 1 table with that many columns, and the other
>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>> columns won't be that big, I'm gonna guess I'll be fine.
>>
>> The data model is too dynamic, the alternative would be to create a
>> table for each user which will have even more overhead since the number 
>> of
>> users is in the several thousands/millions.
>>
>>
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>> wrote:
>>
>>> There is no real limit in term of number of columns in a table, I
>>> would say that the impact of having a lot of columns is the amount of 
>>> meta
>>> data C* needs to keep in memory for encoding/decoding each row.
>>>
>>> Now, if you have a table with 1000+ columns, the problem is probably
>>> your data model...
>>>
>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
>>> dorian.ho...@gmail.com> wrote:
>>>
 Is there alot of overhead with having a big number of columns in a
 table ? Not unbounded, but say, would 2000 be a problem(I think that's 
 the
 maximum I'll need) ?

 Thank You

>>>
>>>
>>

>>>
>>


Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
I agree a single blob would also work (I do that in some cases). The reason
for the map is if you need more flexible updating. I think your solution of
a map/data type works well.
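
For example (a sketch against DuyHai's generic_with_maps table, keeping the
same xxx placeholder for the id), you can touch a single field without
rewriting the whole object:

UPDATE generic_with_maps SET long_map['age'] = 33 WHERE object_id = xxx;
DELETE text_map['name'] FROM generic_with_maps WHERE object_id = xxx;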

On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  wrote:

> "But I need rows together to work with them (indexing etc)"
>
> What do you mean rows together ? You mean that you want to fetch a single
> row instead of 1 row per property right ?
>
> In this case, the map might be the solution:
>
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    date_map map<text, timestamp>,
>    ...
>    PRIMARY KEY(object_id)
> );
>
> The trick here is to store all the fields of the object in different map,
> depending on the type of the field.
>
> The map key is always text and it contains the name of the field.
>
> Example
>
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03",
> }
>
> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
>        {'last_visited_date': '2016-09-10 12:01:03'});
>
> When you do a select, you'll get a SINGLE row returned. But then you need
> to extract all the properties from different maps, not a big deal
>
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
> wrote:
>
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I need
>> rows together to work with them (indexing etc).
>>
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a dynamic
>> schema (may have, and will use the map if I do). I just have thousands of
>> schemas. One user needs 10 integers, while another user needs 20 booleans,
>> and another needs 30 integers, or a combination of them all.
>>
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>> wrote:
>>
>>> "Another possible alternative is to use a single map column"
>>>
>>> --> how do you manage the different types then ? Because maps in
>>> Cassandra are strongly typed
>>>
>>> Unless you set the type of map value to blob, in this case you might as
>>> well store all the object as a single blob column
>>>
>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>>> wrote:
>>>
 Another possible alternative is to use a single map column.


 On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
 wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a
> table for each user which will have even more overhead since the number of
> users is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
> wrote:
>
>> There is no real limit in term of number of columns in a table, I
>> would say that the impact of having a lot of columns is the amount of 
>> meta
>> data C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably
>> your data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha > > wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a
>>> table ? Not unbounded, but say, would 2000 be a problem(I think that's 
>>> the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>
>>>
>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But I need rows together to work with them (indexing etc)"

What do you mean by "rows together"? You mean that you want to fetch a single
row instead of one row per property, right?

In this case, the map might be the solution:

CREATE TABLE generic_with_maps(
   object_id uuid,
   boolean_map map<text, boolean>,
   text_map map<text, text>,
   long_map map<text, bigint>,
   date_map map<text, timestamp>,
   ...
   PRIMARY KEY(object_id)
);

The trick here is to store all the fields of the object in different maps,
depending on the type of each field.

The map key is always text and it contains the name of the field.

Example

{
    "id": ,
    "name": "John DOE",
    "age": 32,
    "last_visited_date": "2016-09-10 12:01:03"
}

INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
       {'last_visited_date': '2016-09-10 12:01:03'});

When you do a select, you'll get a SINGLE row returned. But then you need
to extract all the properties from the different maps, which is not a big deal.
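
For example (sketch, same xxx placeholder):

SELECT text_map, long_map, date_map FROM generic_with_maps WHERE object_id = xxx;

-- the application then reads 'name' from text_map, 'age' from long_map, etc.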

On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
wrote:

> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need
> rows together to work with them (indexing etc).
>
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a dynamic
> schema (may have, and will use the map if I do). I just have thousands of
> schemas. One user needs 10 integers, while another user needs 20 booleans,
> and another needs 30 integers, or a combination of them all.
>
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  wrote:
>
>> "Another possible alternative is to use a single map column"
>>
>> --> how do you manage the different types then ? Because maps in
>> Cassandra are strongly typed
>>
>> Unless you set the type of map value to blob, in this case you might as
>> well store all the object as a single blob column
>>
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>> wrote:
>>
>>> Another possible alternative is to use a single map column.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>>> wrote:
>>>
 Since I will only have 1 table with that many columns, and the other
 tables will be "normal" tables with max 30 columns, and the memory of 2K
 columns won't be that big, I'm gonna guess I'll be fine.

 The data model is too dynamic, the alternative would be to create a
 table for each user which will have even more overhead since the number of
 users is in the several thousands/millions.


 On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
 wrote:

> There is no real limit in term of number of columns in a table, I
> would say that the impact of having a lot of columns is the amount of meta
> data C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is probably
> your data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
> wrote:
>
>> Is there alot of overhead with having a big number of columns in a
>> table ? Not unbounded, but say, would 2000 be a problem(I think that's 
>> the
>> maximum I'll need) ?
>>
>> Thank You
>>
>
>

>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai
Yes, that's another case, the "entity" model used in RDBMSs. But I need the
rows together to work with them (indexing etc.).

@sfespace
The map is needed when you have a dynamic schema. I don't have a dynamic
schema (may have, and will use the map if I do). I just have thousands of
schemas. One user needs 10 integers, while another user needs 20 booleans,
and another needs 30 integers, or a combination of them all.

On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  wrote:

> "Another possible alternative is to use a single map column"
>
> --> how do you manage the different types then ? Because maps in Cassandra
> are strongly typed
>
> Unless you set the type of map value to blob, in this case you might as
> well store all the object as a single blob column
>
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
> wrote:
>
>> Another possible alternative is to use a single map column.
>>
>>
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>> wrote:
>>
>>> Since I will only have 1 table with that many columns, and the other
>>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>>> columns won't be that big, I'm gonna guess I'll be fine.
>>>
>>> The data model is too dynamic, the alternative would be to create a
>>> table for each user which will have even more overhead since the number of
>>> users is in the several thousands/millions.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>>> wrote:
>>>
 There is no real limit in term of number of columns in a table, I would
 say that the impact of having a lot of columns is the amount of meta data
 C* needs to keep in memory for encoding/decoding each row.

 Now, if you have a table with 1000+ columns, the problem is probably
 your data model...

 On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
 wrote:

> Is there alot of overhead with having a big number of columns in a
> table ? Not unbounded, but say, would 2000 be a problem(I think that's the
> maximum I'll need) ?
>
> Thank You
>


>>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"Another possible alternative is to use a single map column"

--> How do you manage the different types then? Because maps in Cassandra
are strongly typed.

Unless you set the map value type to blob, in which case you might as
well store the whole object as a single blob column.
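
(A sketch of what that would look like, with a made-up table name:

CREATE TABLE generic_blob_map(
    object_id uuid PRIMARY KEY,
    fields map<text, blob>
);

Every value then has to be serialized/deserialized by the application, at
which point a single blob column for the whole object is just as simple.)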

On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
wrote:

> Another possible alternative is to use a single map column.
>
>
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
> wrote:
>
>> Since I will only have 1 table with that many columns, and the other
>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>> columns won't be that big, I'm gonna guess I'll be fine.
>>
>> The data model is too dynamic, the alternative would be to create a table
>> for each user which will have even more overhead since the number of users
>> is in the several thousands/millions.
>>
>>
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>> wrote:
>>
>>> There is no real limit in term of number of columns in a table, I would
>>> say that the impact of having a lot of columns is the amount of meta data
>>> C* needs to keep in memory for encoding/decoding each row.
>>>
>>> Now, if you have a table with 1000+ columns, the problem is probably
>>> your data model...
>>>
>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>>> wrote:
>>>
 Is there alot of overhead with having a big number of columns in a
 table ? Not unbounded, but say, would 2000 be a problem(I think that's the
 maximum I'll need) ?

 Thank You

>>>
>>>
>>


Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
Another possible alternative is to use a single map column.

On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha  wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a table
> for each user which will have even more overhead since the number of users
> is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:
>
>> There is no real limit in term of number of columns in a table, I would
>> say that the impact of having a lot of columns is the amount of meta data
>> C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably your
>> data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>> wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a table
>>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"The data model is too dynamic"

--> then create a table to cope with dynamic data types. Example

CREATE TABLE dynamic_data(
 object_id uuid,
 property_name text,
 property_type text,
 bool_value boolean,
 long_value bigint,
 decimal_value double,
 text_value text,
 date_value timestamp,
 uuid_value uuid,
 ...,
PRIMARY KEY ((object_id), property_name)
);

Consider the following object in JSON format:

{
   "id": ,
"name": "John DOE",
"age":  32,
"last_visited_date":  "2016-09-10 12:01:03",
}

It would result in:

BEGIN UNLOGGED BATCH
INSERT INTO dynamic_data(object_id, property_name, property_type,
text_value) VALUES(xxx, 'name', 'text', 'John DOE');
INSERT INTO dynamic_data(object_id, property_name, property_type,
long_value) VALUES(xxx, 'age', 'bigint', 32);
INSERT INTO dynamic_data(object_id, property_name, property_type,
date_value) VALUES(xxx, 'last_visited_date', 'timestamp', '2016-09-10 12:01:03');
APPLY BATCH;

You can safely use an unlogged batch because the partition key is the same for
all rows, so C* is clever enough to coalesce all the inserts into a single
mutation. There will be no overhead because of the batch.

To fetch all values of the object: SELECT * FROM dynamic_data WHERE
object_id = xxx LIMIT 1000;

To delete the whole object, delete by partition key: DELETE FROM
dynamic_data WHERE object_id = xxx;

To delete a single property, also provide the property name: DELETE FROM
dynamic_data WHERE object_id = xxx AND property_name = 'last_visited_date';

To add a new property to an existing object, just insert: INSERT INTO
dynamic_data(object_id, property_name, property_type, bool_value)
VALUES(xxx, 'is_married', 'boolean', false);

The only drawback of this data model is that it is abstract, e.g. by just
looking at the schema you cannot really tell what kind of data it contains,
but that is precisely what you are looking for ...
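
To fetch a single property, you can also hit the clustering key directly
(sketch, same xxx placeholder):

SELECT property_type, long_value FROM dynamic_data
WHERE object_id = xxx AND property_name = 'age';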


On Thu, Sep 15, 2016 at 4:19 PM, Dorian Hoxha 
wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a table
> for each user which will have even more overhead since the number of users
> is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:
>
>> There is no real limit in term of number of columns in a table, I would
>> say that the impact of having a lot of columns is the amount of meta data
>> C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably your
>> data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>> wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a table
>>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Since I will only have 1 table with that many columns, and the other tables
will be "normal" tables with max 30 columns, and the memory overhead of 2K
columns won't be that big, I'm gonna guess I'll be fine.

The data model is too dynamic; the alternative would be to create a table
for each user, which would have even more overhead since the number of users
is in the several thousands/millions.

On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:

> There is no real limit in term of number of columns in a table, I would
> say that the impact of having a lot of columns is the amount of meta data
> C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is probably your
> data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
> wrote:
>
>> Is there alot of overhead with having a big number of columns in a table
>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>> maximum I'll need) ?
>>
>> Thank You
>>
>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
There is no real limit on the number of columns in a table; I would say
that the impact of having a lot of columns is the amount of metadata C*
needs to keep in memory for encoding/decoding each row.

Now, if you have a table with 1000+ columns, the problem is probably your
data model...

On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
wrote:

> Is there alot of overhead with having a big number of columns in a table ?
> Not unbounded, but say, would 2000 be a problem(I think that's the maximum
> I'll need) ?
>
> Thank You
>


Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Is there a lot of overhead in having a big number of columns in a table?
Not unbounded, but say, would 2000 be a problem (I think that's the maximum
I'll need)?

Thank You