Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread Mark Rose
I've done that several times. Kill the process, restart it, let it
sync, then decommission again.

You'll need enough space on the receiving nodes for the full set of
data, on top of what was already streamed earlier, plus room to
clean up/compact it.

Before you kill it, check system.log to see whether the decommission
died on anything. If so, it will never finish; if not, let it
continue. Of particular note: by default, transfers of large sstables
will time out. You can fix that by raising
streaming_socket_timeout_in_ms to a sufficiently large value (I set it
to a day).
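
For reference, that setting lives in cassandra.yaml; a one-day value looks
roughly like this (a sketch, not the shipped default):

# cassandra.yaml -- streaming socket timeout, in milliseconds
# 86400000 ms = 24 hours
streaming_socket_timeout_in_ms: 86400000

The file is only read at startup, so the change has to be in place before the
streams begin.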

-Mark

On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula
 wrote:
> I started decommissioning a node in our Cassandra cluster.
> But it's taking too long (more than 12 hrs), so I would like to
> restart it (stop/kill the node & run 'nodetool decommission' again).
>
> Will killing the node / stopping the decommission and then restarting the
> decommission cause any issues for the cluster?
>
> Using C* 2.0.17, 2 data centers, each DC with 3 groups, each group
> with 3 nodes and RF=3
>
> --
> Thanks...!


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai
I know they don't support it.
I need key+value mapping, not just "values" or just "keys".

I'll use the Lucene index.



On Thu, Sep 15, 2016 at 10:23 PM, DuyHai Doan  wrote:

> I'd advise anyone against using the old native secondary index ... You'll
> get poor performance (that's the main reason why some people developed
> SASI).
>
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  wrote:
>
>> Hi,
>>
>> The ‘old-fashioned’ secondary indexes do support index of collection
>> values:
>> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
>>
>> Br,
>> Hannu
>>
>> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
>>
>> "But the problem is I can't use secondary indexing "where int25=5", while
>> with normal columns I can."
>>
>> You have many objectives that contradict themselves in term of impl.
>>
>> Right now you're unlucky, SASI does not support indexing collections yet
>> (it may come in future, when ?  ¯\_(ツ)_/¯ )
>>
>> If you're using DSE Search or Stratio Lucene Index, you can index map
>> values
>>
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
>> wrote:
>>
>>> Yes that makes more sense. But the problem is I can't use secondary
>>> indexing "where int25=5", while with normal columns I can.
>>>
>>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>>> wrote:
>>>
 I agree a single blob would also work (I do that in some cases). The
 reason for the map is if you need more flexible updating. I think your
 solution of a map/data type works well.

 On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
 wrote:

> "But I need rows together to work with them (indexing etc)"
>
> What do you mean rows together ? You mean that you want to fetch a
> single row instead of 1 row per property right ?
>
> In this case, the map might be the solution:
>
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    ...,
>    PRIMARY KEY(object_id)
> );
>
> The trick here is to store all the fields of the object in different
> map, depending on the type of the field.
>
> The map key is always text and it contains the name of the field.
>
> Example
>
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03",
> }
>
> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
> '2016-09-10
> 12:01:03'});
>
> When you do a select, you'll get a SINGLE row returned. But then you
> need to extract all the properties from different maps, not a big deal
>
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
> wrote:
>
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I
>> need rows together to work with them (indexing etc).
>>
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a
>> dynamic schema (may have, and will use the map if I do). I just have
>> thousands of schemas. One user needs 10 integers, while another user 
>> needs
>> 20 booleans, and another needs 30 integers, or a combination of them all.
>>
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>> wrote:
>>
>>> "Another possible alternative is to use a single map column"
>>>
>>> --> how do you manage the different types then ? Because maps in
>>> Cassandra are strongly typed
>>>
>>> Unless you set the type of map value to blob, in this case you might
>>> as well store all the object as a single blob column
>>>
>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
>>> sfesc...@gmail.com> wrote:
>>>
 Another possible alternative is to use a single map column.


 On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha <
 dorian.ho...@gmail.com> wrote:

> Since I will only have 1 table with that many columns, and the
> other tables will be "normal" tables with max 30 columns, and the 
> memory of
> 2K columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create
> a table for each user which will have even more overhead since the 
> number
> of users is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  > wrote:
>
>> There is no real limit in terms of number of columns in a table; I
>> would say that the impact of having a lot of columns is the amount of meta
>> data C* needs to keep in memory for encoding/decoding each row.

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
I do agree on that.

> On 15 Sep 2016, at 16:23, DuyHai Doan  wrote:
> 
> I'd advise anyone against using the old native secondary index ... You'll get 
> poor performance (that's the main reason why some people developed SASI).
> 
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  > wrote:
> Hi,
> 
> The ‘old-fashioned’ secondary indexes do support index of collection values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
> 
> 
> Br,
> Hannu
> 
>> On 15 Sep 2016, at 15:59, DuyHai Doan > > wrote:
>> 
>> "But the problem is I can't use secondary indexing "where int25=5", while 
>> with normal columns I can."
>> 
>> You have many objectives that contradict themselves in term of impl.
>> 
>> Right now you're unlucky, SASI does not support indexing collections yet (it 
>> may come in future, when ?  ¯\_(ツ)_/¯ )
>> 
>> If you're using DSE Search or Stratio Lucene Index, you can index map values 
>> 
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha > > wrote:
>> Yes that makes more sense. But the problem is I can't use secondary indexing 
>> "where int25=5", while with normal columns I can.
>> 
>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>>  > 
>> wrote:
>> I agree a single blob would also work (I do that in some cases). The reason 
>> for the map is if you need more flexible updating. I think your solution of 
>> a map/data type works well.
>> 
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan > > wrote:
>> "But I need rows together to work with them (indexing etc)"
>> 
>> What do you mean rows together ? You mean that you want to fetch a single 
>> row instead of 1 row per property right ?
>> 
>> In this case, the map might be the solution:
>> 
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    ...,
>>    PRIMARY KEY(object_id)
>> );
>> 
>> The trick here is to store all the fields of the object in different map, 
>> depending on the type of the field.
>> 
>> The map key is always text and it contains the name of the field.
>> 
>> Example
>> 
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03", 
>> }
>> 
>> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
>> '2016-09-10 12:01:03'});
>> 
>> When you do a select, you'll get a SINGLE row returned. But then you need to 
>> extract all the properties from different maps, not a big deal
>> 
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha > > wrote:
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
>> together to work with them (indexing etc).
>> 
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a dynamic 
>> schema (may have, and will use the map if I do). I just have thousands of 
>> schemas. One user needs 10 integers, while another user needs 20 booleans, 
>> and another needs 30 integers, or a combination of them all.
>> 
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan > > wrote:
>> "Another possible alternative is to use a single map column"
>> 
>> --> how do you manage the different types then ? Because maps in Cassandra 
>> are strongly typed
>> 
>> Unless you set the type of map value to blob, in this case you might as well 
>> store all the object as a single blob column
>> 
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>>  > 
>> wrote:
>> Another possible alternative is to use a single map column.
>> 
>> 
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha > > wrote:
>> Since I will only have 1 table with that many columns, and the other tables 
>> will be "normal" tables with max 30 columns, and the memory of 2K columns 
>> won't be that big, I'm gonna guess I'll be fine.
>> 
>> The data model is too dynamic, the alternative would be to create a table 
>> for each user which will have even more overhead since the number of users 
>> is in the several thousands/millions.
>> 
>> 
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan > > wrote:
>> There is no real limit in term of number of columns in a table, I would say 
>> that the impact of having a lot of columns is the amount of meta data C* 
> needs to keep in memory for encoding/decoding each row.

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
I'd advise anyone against using the old native secondary index ... You'll
get poor performance (that's the main reason why some people developed
SASI).

On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger  wrote:

> Hi,
>
> The ‘old-fashioned’ secondary indexes do support index of collection
> values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
>
> Br,
> Hannu
>
> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
>
> "But the problem is I can't use secondary indexing "where int25=5", while
> with normal columns I can."
>
> You have many objectives that contradict themselves in term of impl.
>
> Right now you're unlucky, SASI does not support indexing collections yet
> (it may come in future, when ?  ¯\_(ツ)_/¯ )
>
> If you're using DSE Search or Stratio Lucene Index, you can index map
> values
>
> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
> wrote:
>
>> Yes that makes more sense. But the problem is I can't use secondary
>> indexing "where int25=5", while with normal columns I can.
>>
>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>> wrote:
>>
>>> I agree a single blob would also work (I do that in some cases). The
>>> reason for the map is if you need more flexible updating. I think your
>>> solution of a map/data type works well.
>>>
>>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
>>> wrote:
>>>
 "But I need rows together to work with them (indexing etc)"

 What do you mean rows together ? You mean that you want to fetch a
 single row instead of 1 row per property right ?

 In this case, the map might be the solution:

 CREATE TABLE generic_with_maps(
    object_id uuid,
    boolean_map map<text, boolean>,
    text_map map<text, text>,
    long_map map<text, bigint>,
    ...,
    PRIMARY KEY(object_id)
 );

 The trick here is to store all the fields of the object in different
 map, depending on the type of the field.

 The map key is always text and it contains the name of the field.

 Example

 {
"id": ,
 "name": "John DOE",
 "age":  32,
 "last_visited_date":  "2016-09-10 12:01:03",
 }

 INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
 VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
 '2016-09-10
 12:01:03'});

 When you do a select, you'll get a SINGLE row returned. But then you
 need to extract all the properties from different maps, not a big deal

 On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
 wrote:

> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need
> rows together to work with them (indexing etc).
>
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a
> dynamic schema (may have, and will use the map if I do). I just have
> thousands of schemas. One user needs 10 integers, while another user needs
> 20 booleans, and another needs 30 integers, or a combination of them all.
>
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
> wrote:
>
>> "Another possible alternative is to use a single map column"
>>
>> --> how do you manage the different types then ? Because maps in
>> Cassandra are strongly typed
>>
>> Unless you set the type of map value to blob, in this case you might
>> as well store all the object as a single blob column
>>
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
>> sfesc...@gmail.com> wrote:
>>
>>> Another possible alternative is to use a single map column.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>>> wrote:
>>>
 Since I will only have 1 table with that many columns, and the
 other tables will be "normal" tables with max 30 columns, and the 
 memory of
 2K columns won't be that big, I'm gonna guess I'll be fine.

 The data model is too dynamic, the alternative would be to create a
 table for each user which will have even more overhead since the 
 number of
 users is in the several thousands/millions.


 On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
 wrote:

> There is no real limit in term of number of columns in a table, I
> would say that the impact of having a lot of columns is the amount of 
> meta
> data C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is
> probably your data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
> dorian.ho...@gmail.com> wrote:
>
>> 

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
Hi,

The ‘old-fashioned’ secondary indexes do support indexing collection values:
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
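
For example, against the generic_with_maps table discussed elsewhere in this
thread, a sketch of such an index (the index name is arbitrary; the ENTRIES
variant lets you query a map by key and value together):

CREATE INDEX long_map_entries_idx ON generic_with_maps (ENTRIES(long_map));

-- roughly the "where int25 = 5" style of query, expressed against the map
SELECT * FROM generic_with_maps WHERE long_map['int25'] = 5;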


Br,
Hannu

> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
> 
> "But the problem is I can't use secondary indexing "where int25=5", while 
> with normal columns I can."
> 
> You have many objectives that contradict themselves in term of impl.
> 
> Right now you're unlucky, SASI does not support indexing collections yet (it 
> may come in future, when ?  ¯\_(ツ)_/¯ )
> 
> If you're using DSE Search or Stratio Lucene Index, you can index map values 
> 
> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha  > wrote:
> Yes that makes more sense. But the problem is I can't use secondary indexing 
> "where int25=5", while with normal columns I can.
> 
> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>  > 
> wrote:
> I agree a single blob would also work (I do that in some cases). The reason 
> for the map is if you need more flexible updating. I think your solution of a 
> map/data type works well.
> 
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  > wrote:
> "But I need rows together to work with them (indexing etc)"
> 
> What do you mean rows together ? You mean that you want to fetch a single row 
> instead of 1 row per property right ?
> 
> In this case, the map might be the solution:
> 
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    ...,
>    PRIMARY KEY(object_id)
> );
> 
> The trick here is to store all the fields of the object in different map, 
> depending on the type of the field.
> 
> The map key is always text and it contains the name of the field.
> 
> Example
> 
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03", 
> }
> 
> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
> '2016-09-10 12:01:03'});
> 
> When you do a select, you'll get a SINGLE row returned. But then you need to 
> extract all the properties from different maps, not a big deal
> 
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha  > wrote:
> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
> together to work with them (indexing etc).
> 
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a dynamic 
> schema (may have, and will use the map if I do). I just have thousands of 
> schemas. One user needs 10 integers, while another user needs 20 booleans, 
> and another needs 30 integers, or a combination of them all.
> 
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  > wrote:
> "Another possible alternative is to use a single map column"
> 
> --> how do you manage the different types then ? Because maps in Cassandra 
> are strongly typed
> 
> Unless you set the type of map value to blob, in this case you might as well 
> store all the object as a single blob column
> 
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>  > 
> wrote:
> Another possible alternative is to use a single map column.
> 
> 
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha  > wrote:
> Since I will only have 1 table with that many columns, and the other tables 
> will be "normal" tables with max 30 columns, and the memory of 2K columns 
> won't be that big, I'm gonna guess I'll be fine.
> 
> The data model is too dynamic, the alternative would be to create a table for 
> each user which will have even more overhead since the number of users is in 
> the several thousands/millions.
> 
> 
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  > wrote:
> There is no real limit in term of number of columns in a table, I would say 
> that the impact of having a lot of columns is the amount of meta data C* 
> needs to keep in memory for encoding/decoding each row.
> 
> Now, if you have a table with 1000+ columns, the problem is probably your 
> data model...
> 
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha  > wrote:
> Is there alot of overhead with having a big number of columns in a table ? 
> Not unbounded, but say, would 2000 be a problem(I think that's the maximum 
> I'll need) ?
> 
> Thank You
> 
> 
> 
> 
> 
> 
> 



Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map

2016-09-15 Thread Tyler Hobbs
That ticket was just to improve the error message.  From the comments on
the ticket:

"Unfortunately, handling collections is slightly harder than what
CASSANDRA-5230  aimed
for, because we can't do a name query. So this will have to wait for
CASSANDRA-4762 . In
the meantime, we should obviously not throw an assertion error so attaching
a patch to improve validation."

However, it seems like this would be possible to support in Cassandra 3.x.
We probably just need to remove the check and verify that it actually
works.  Can you open a new JIRA ticket for this?
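
To make the discussion concrete, here is a hypothetical table (not from the
ticket itself) and the shape of the query in question:

CREATE TABLE events (
    pk int,
    ck int,
    tags set<text>,
    PRIMARY KEY (pk, ck)
);

-- IN on the last clustering key while the select also fetches a collection
-- column: this is the case that currently fails with a validation error.
SELECT * FROM events WHERE pk = 1 AND ck IN (1, 2);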

On Thu, Sep 15, 2016 at 12:49 PM, Samba  wrote:

> any update on this issue?
>
> the quoted JIRA issue (CASSANDRA-5376) is resolved as fixed in 1.2.4 but
> it is still not possible (even in 3.7)  to use IN operator in queries that
> fetch collection columns.
>
> is the fix only to report better error message that this is not possible
> or was it fixed then but the issue resurfaced in regression?
>
> could you please confirm one way or the other?
>
> Thanks and Regards,
> Samba
>
>
> On Tue, Sep 6, 2016 at 6:34 PM, Samba  wrote:
>
>> Hi,
>>
>> "CASSANDRA-5376: CQL IN clause on last key not working when schema
>> includes set,list or map"
>>
>> is marked resolved in 1.2.4 but i still see the issue (not an Assertion
>> Error, but an query validation message)
>>
>> was the issue resolved only to report proper error message or was it
>> fixed to support retrieving collections when query contains IN clause of
>> partition/cluster (last) columns?
>>
>> If it was fixed properly to support retrieving collections with IN
>> clause, then is it a bug in 3.7 release that i get the same message?
>>
>> Could you please explain, if it not fixed as intended, if there are plans
>> to support this in future?
>>
>> Thanks & Regards,
>> Samba
>>
>
>


-- 
Tyler Hobbs
DataStax 


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But the problem is I can't use secondary indexing "where int25=5", while
with normal columns I can."

You have several objectives that contradict each other in terms of implementation.

Right now you're out of luck: SASI does not support indexing collections yet
(it may come in the future, but when? ¯\_(ツ)_/¯)

If you're using DSE Search or the Stratio Lucene index, you can index map
values.

On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha 
wrote:

> Yes that makes more sense. But the problem is I can't use secondary
> indexing "where int25=5", while with normal columns I can.
>
> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
> wrote:
>
>> I agree a single blob would also work (I do that in some cases). The
>> reason for the map is if you need more flexible updating. I think your
>> solution of a map/data type works well.
>>
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan 
>> wrote:
>>
>>> "But I need rows together to work with them (indexing etc)"
>>>
>>> What do you mean rows together ? You mean that you want to fetch a
>>> single row instead of 1 row per property right ?
>>>
>>> In this case, the map might be the solution:
>>>
>>> CREATE TABLE generic_with_maps(
>>>    object_id uuid,
>>>    boolean_map map<text, boolean>,
>>>    text_map map<text, text>,
>>>    long_map map<text, bigint>,
>>>    ...,
>>>    PRIMARY KEY(object_id)
>>> );
>>>
>>> The trick here is to store all the fields of the object in different
>>> map, depending on the type of the field.
>>>
>>> The map key is always text and it contains the name of the field.
>>>
>>> Example
>>>
>>> {
>>>"id": ,
>>> "name": "John DOE",
>>> "age":  32,
>>> "last_visited_date":  "2016-09-10 12:01:03",
>>> }
>>>
>>> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
>>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
>>> '2016-09-10
>>> 12:01:03'});
>>>
>>> When you do a select, you'll get a SINGLE row returned. But then you
>>> need to extract all the properties from different maps, not a big deal
>>>
>>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
>>> wrote:
>>>
 @DuyHai
 Yes, that's another case, the "entity" model used in rdbms. But I need
 rows together to work with them (indexing etc).

 @sfespace
 The map is needed when you have a dynamic schema. I don't have a
 dynamic schema (may have, and will use the map if I do). I just have
 thousands of schemas. One user needs 10 integers, while another user needs
 20 booleans, and another needs 30 integers, or a combination of them all.

 On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
 wrote:

> "Another possible alternative is to use a single map column"
>
> --> how do you manage the different types then ? Because maps in
> Cassandra are strongly typed
>
> Unless you set the type of map value to blob, in this case you might
> as well store all the object as a single blob column
>
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com <
> sfesc...@gmail.com> wrote:
>
>> Another possible alternative is to use a single map column.
>>
>>
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>> wrote:
>>
>>> Since I will only have 1 table with that many columns, and the other
>>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>>> columns won't be that big, I'm gonna guess I'll be fine.
>>>
>>> The data model is too dynamic, the alternative would be to create a
>>> table for each user which will have even more overhead since the number 
>>> of
>>> users is in the several thousands/millions.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>>> wrote:
>>>
 There is no real limit in term of number of columns in a table, I
 would say that the impact of having a lot of columns is the amount of 
 meta
 data C* needs to keep in memory for encoding/decoding each row.

 Now, if you have a table with 1000+ columns, the problem is
 probably your data model...

 On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
 dorian.ho...@gmail.com> wrote:

> Is there alot of overhead with having a big number of columns in a
> table ? Not unbounded, but say, would 2000 be a problem(I think 
> that's the
> maximum I'll need) ?
>
> Thank You
>


>>>
>

>>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Yes that makes more sense. But the problem is I can't use secondary
indexing "where int25=5", while with normal columns I can.

On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
wrote:

> I agree a single blob would also work (I do that in some cases). The
> reason for the map is if you need more flexible updating. I think your
> solution of a map/data type works well.
>
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  wrote:
>
>> "But I need rows together to work with them (indexing etc)"
>>
>> What do you mean rows together ? You mean that you want to fetch a single
>> row instead of 1 row per property right ?
>>
>> In this case, the map might be the solution:
>>
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    ...,
>>    PRIMARY KEY(object_id)
>> );
>>
>> The trick here is to store all the fields of the object in different map,
>> depending on the type of the field.
>>
>> The map key is always text and it contains the name of the field.
>>
>> Example
>>
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03",
>> }
>>
>> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
>> '2016-09-10
>> 12:01:03'});
>>
>> When you do a select, you'll get a SINGLE row returned. But then you need
>> to extract all the properties from different maps, not a big deal
>>
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
>> wrote:
>>
>>> @DuyHai
>>> Yes, that's another case, the "entity" model used in rdbms. But I need
>>> rows together to work with them (indexing etc).
>>>
>>> @sfespace
>>> The map is needed when you have a dynamic schema. I don't have a dynamic
>>> schema (may have, and will use the map if I do). I just have thousands of
>>> schemas. One user needs 10 integers, while another user needs 20 booleans,
>>> and another needs 30 integers, or a combination of them all.
>>>
>>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>>> wrote:
>>>
 "Another possible alternative is to use a single map column"

 --> how do you manage the different types then ? Because maps in
 Cassandra are strongly typed

 Unless you set the type of map value to blob, in this case you might as
 well store all the object as a single blob column

 On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com  wrote:

> Another possible alternative is to use a single map column.
>
>
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
> wrote:
>
>> Since I will only have 1 table with that many columns, and the other
>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>> columns won't be that big, I'm gonna guess I'll be fine.
>>
>> The data model is too dynamic, the alternative would be to create a
>> table for each user which will have even more overhead since the number 
>> of
>> users is in the several thousands/millions.
>>
>>
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>> wrote:
>>
>>> There is no real limit in term of number of columns in a table, I
>>> would say that the impact of having a lot of columns is the amount of 
>>> meta
>>> data C* needs to keep in memory for encoding/decoding each row.
>>>
>>> Now, if you have a table with 1000+ columns, the problem is probably
>>> your data model...
>>>
>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <
>>> dorian.ho...@gmail.com> wrote:
>>>
 Is there alot of overhead with having a big number of columns in a
 table ? Not unbounded, but say, would 2000 be a problem(I think that's 
 the
 maximum I'll need) ?

 Thank You

>>>
>>>
>>

>>>
>>


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
OK, I've gotten to the bottom of the %w%a% issue.

The CQL parser first interprets this as LIKE CONTAINS with the searched
term 'w%a'.

And then things get complicated:

1) If you're using NonTokenizingAnalyzer or NoOpAnalyzer, everything is
fine: the % in 'w%a' is interpreted as a simple literal, not a wildcard
character.

2) If you're using StandardAnalyzer, it's an entirely different story.
During parsing of the search predicates by the query planner, the term
'w%a' is passed to the analyzer (StandardAnalyzer here):
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L303-L323

The StandardAnalyzer tokenizes the search term, so 'w%a' becomes 2
distinct tokens, 'w' and 'a'. Why does it ignore the %? Because according to
the Unicode line-breaking rules, % is a separator; see:
http://www.unicode.org/Public/UNIDATA/LineBreak.txt

You can't see this anywhere in the source code itself; in fact you'll need
to look at the JFlex grammar file
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
to find a reference to the Unicode word-breaking rules ...

So indeed, when using StandardAnalyzer any % character is interpreted as a
separator, and our LIKE '%w%a%' is effectively transformed into LIKE '%w%'
OR LIKE '%a%', i.e. all values containing 'w' OR 'a', irrespective of their
relative position ...

Why is it an OR predicate and not an AND predicate? The answer is a
comment in the source code here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L290-L295

I'll end with a famous sentence: "It is not a bug, it is a feature"  :D
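
As a concrete illustration of case 1), here is a sketch of a non-tokenizing
SASI index on the test.escape table quoted later in this thread (the index
name is arbitrary); with this analyzer the % inside the search term stays a
literal:

CREATE CUSTOM INDEX escape_val_idx ON test.escape(val)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS',
                'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
                'case_sensitive': 'false'};

-- parsed as LIKE CONTAINS with the literal searched term 'w%a'
SELECT * FROM test.escape WHERE val LIKE '%w%a%';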

On Thu, Sep 15, 2016 at 4:11 PM, DuyHai Doan  wrote:

> Currently SASI only understands the % in the leading (suffix search) or
> trailing (prefix search) position.
>
> Any expression with a % in the middle, like %w%a%, will not be
> interpreted by SASI as a wildcard.
>
> %w%a% will translate into "Give me all results containing w%a".
>
> On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Thank you for the investigation. Will wait for a fix and news.
>>
>> Probably it’s not a directly related question but what do you think about
>> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
>> it.
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is 
>> LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan 
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Looks like we have different understanding of what results are expected.
 I based my understanding on http://docs.datastax.com/en
 /cql/3.3/cql/cql_using/useSASIIndex.html
 According to the doc ‘esc’ is a pattern for exact match and I guess
 that there is no semantical difference between two LIKE patterns (both of
 patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

 SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
 *containing* '%esc' so *%esc*apeme is a possible match and also escape
 *%esc*

 Why ‘containing’? I expect that it should be ’starting’..


 SELECT * FROM escape 

Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
I agree a single blob would also work (I do that in some cases). The reason
for the map is if you need more flexible updating. I think your solution of
a map/data type works well.

On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  wrote:

> "But I need rows together to work with them (indexing etc)"
>
> What do you mean rows together ? You mean that you want to fetch a single
> row instead of 1 row per property right ?
>
> In this case, the map might be the solution:
>
> CREATE TABLE generic_with_maps(
>    object_id uuid,
>    boolean_map map<text, boolean>,
>    text_map map<text, text>,
>    long_map map<text, bigint>,
>    ...,
>    PRIMARY KEY(object_id)
> );
>
> The trick here is to store all the fields of the object in different map,
> depending on the type of the field.
>
> The map key is always text and it contains the name of the field.
>
> Example
>
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03",
> }
>
> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
> '2016-09-10
> 12:01:03'});
>
> When you do a select, you'll get a SINGLE row returned. But then you need
> to extract all the properties from different maps, not a big deal
>
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
> wrote:
>
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I need
>> rows together to work with them (indexing etc).
>>
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a dynamic
>> schema (may have, and will use the map if I do). I just have thousands of
>> schemas. One user needs 10 integers, while another user needs 20 booleans,
>> and another needs 30 integers, or a combination of them all.
>>
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan 
>> wrote:
>>
>>> "Another possible alternative is to use a single map column"
>>>
>>> --> how do you manage the different types then ? Because maps in
>>> Cassandra are strongly typed
>>>
>>> Unless you set the type of map value to blob, in this case you might as
>>> well store all the object as a single blob column
>>>
>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>>> wrote:
>>>
 Another possible alternative is to use a single map column.


 On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
 wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a
> table for each user which will have even more overhead since the number of
> users is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
> wrote:
>
>> There is no real limit in term of number of columns in a table, I
>> would say that the impact of having a lot of columns is the amount of 
>> meta
>> data C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably
>> your data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha > > wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a
>>> table ? Not unbounded, but say, would 2000 be a problem(I think that's 
>>> the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>
>>>
>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But I need rows together to work with them (indexing etc)"

What do you mean by "rows together"? You mean that you want to fetch a single
row instead of 1 row per property, right?

In this case, the map might be the solution:

CREATE TABLE generic_with_maps(
   object_id uuid,
   boolean_map map<text, boolean>,
   text_map map<text, text>,
   long_map map<text, bigint>,
   ...,
   PRIMARY KEY(object_id)
);

The trick here is to store all the fields of the object in different maps,
depending on the type of the field.

The map key is always text and it contains the name of the field.

Example

{
   "id": ,
"name": "John DOE",
"age":  32,
"last_visited_date":  "2016-09-10 12:01:03",
}

INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
VALUES(xxx, {'name': 'John DOE'}, {'age': 32},
       {'last_visited_date': '2016-09-10 12:01:03'});

When you do a select, you'll get a SINGLE row returned. You then need to
extract the properties from the different maps, which is not a big deal.
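
One nice side effect of this layout (a quick sketch, reusing the table above;
xxx again stands for the object_id) is that you can touch a single field
without rewriting the whole object:

-- update one field: set the 'age' entry of the long map
UPDATE generic_with_maps SET long_map['age'] = 33 WHERE object_id = xxx;

-- remove a single field
DELETE text_map['name'] FROM generic_with_maps WHERE object_id = xxx;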

On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha 
wrote:

> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need
> rows together to work with them (indexing etc).
>
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a dynamic
> schema (may have, and will use the map if I do). I just have thousands of
> schemas. One user needs 10 integers, while another user needs 20 booleans,
> and another needs 30 integers, or a combination of them all.
>
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  wrote:
>
>> "Another possible alternative is to use a single map column"
>>
>> --> how do you manage the different types then ? Because maps in
>> Cassandra are strongly typed
>>
>> Unless you set the type of map value to blob, in this case you might as
>> well store all the object as a single blob column
>>
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>> wrote:
>>
>>> Another possible alternative is to use a single map column.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>>> wrote:
>>>
 Since I will only have 1 table with that many columns, and the other
 tables will be "normal" tables with max 30 columns, and the memory of 2K
 columns won't be that big, I'm gonna guess I'll be fine.

 The data model is too dynamic, the alternative would be to create a
 table for each user which will have even more overhead since the number of
 users is in the several thousands/millions.


 On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
 wrote:

> There is no real limit in term of number of columns in a table, I
> would say that the impact of having a lot of columns is the amount of meta
> data C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is probably
> your data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
> wrote:
>
>> Is there alot of overhead with having a big number of columns in a
>> table ? Not unbounded, but say, would 2000 be a problem(I think that's 
>> the
>> maximum I'll need) ?
>>
>> Thank You
>>
>
>

>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai
Yes, that's another case, the "entity" model used in rdbms. But I need rows
together to work with them (indexing etc).

@sfespace
The map is needed when you have a dynamic schema. I don't have a dynamic
schema (may have, and will use the map if I do). I just have thousands of
schemas. One user needs 10 integers, while another user needs 20 booleans,
and another needs 30 integers, or a combination of them all.

On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  wrote:

> "Another possible alternative is to use a single map column"
>
> --> how do you manage the different types then ? Because maps in Cassandra
> are strongly typed
>
> Unless you set the type of map value to blob, in this case you might as
> well store all the object as a single blob column
>
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
> wrote:
>
>> Another possible alternative is to use a single map column.
>>
>>
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
>> wrote:
>>
>>> Since I will only have 1 table with that many columns, and the other
>>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>>> columns won't be that big, I'm gonna guess I'll be fine.
>>>
>>> The data model is too dynamic, the alternative would be to create a
>>> table for each user which will have even more overhead since the number of
>>> users is in the several thousands/millions.
>>>
>>>
>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>>> wrote:
>>>
 There is no real limit in term of number of columns in a table, I would
 say that the impact of having a lot of columns is the amount of meta data
 C* needs to keep in memory for encoding/decoding each row.

 Now, if you have a table with 1000+ columns, the problem is probably
 your data model...

 On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
 wrote:

> Is there alot of overhead with having a big number of columns in a
> table ? Not unbounded, but say, would 2000 be a problem(I think that's the
> maximum I'll need) ?
>
> Thank You
>


>>>
>


Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map

2016-09-15 Thread Samba
Any update on this issue?

The quoted JIRA issue (CASSANDRA-5376) is resolved as fixed in 1.2.4, but it
is still not possible (even in 3.7) to use the IN operator in queries that
fetch collection columns.

Was the fix only to report a better error message that this is not possible,
or was it fixed then and the issue resurfaced in a regression?

Could you please confirm one way or the other?

Thanks and Regards,
Samba

On Tue, Sep 6, 2016 at 6:34 PM, Samba  wrote:

> Hi,
>
> "CASSANDRA-5376: CQL IN clause on last key not working when schema
> includes set,list or map"
>
> is marked resolved in 1.2.4, but I still see the issue (not an assertion
> error, but a query validation message).
>
> Was the issue resolved only to report a proper error message, or was it fixed
> to support retrieving collections when the query contains an IN clause on the
> partition/clustering (last) columns?
>
> If it was fixed properly to support retrieving collections with an IN clause,
> then is it a bug in the 3.7 release that I get the same message?
>
> Could you please explain, if it is not fixed as intended, whether there are
> plans to support this in the future?
>
> Thanks & Regards,
> Samba
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"Another possible alternative is to use a single map column"

--> How do you manage the different types then? Because maps in Cassandra
are strongly typed.

Unless you set the map value type to blob, in which case you might as
well store the whole object as a single blob column.

On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
wrote:

> Another possible alternative is to use a single map column.
>
>
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha 
> wrote:
>
>> Since I will only have 1 table with that many columns, and the other
>> tables will be "normal" tables with max 30 columns, and the memory of 2K
>> columns won't be that big, I'm gonna guess I'll be fine.
>>
>> The data model is too dynamic, the alternative would be to create a table
>> for each user which will have even more overhead since the number of users
>> is in the several thousands/millions.
>>
>>
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan 
>> wrote:
>>
>>> There is no real limit in term of number of columns in a table, I would
>>> say that the impact of having a lot of columns is the amount of meta data
>>> C* needs to keep in memory for encoding/decoding each row.
>>>
>>> Now, if you have a table with 1000+ columns, the problem is probably
>>> your data model...
>>>
>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>>> wrote:
>>>
 Is there alot of overhead with having a big number of columns in a
 table ? Not unbounded, but say, would 2000 be a problem(I think that's the
 maximum I'll need) ?

 Thank You

>>>
>>>
>>


Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
Another possible alternative is to use a single map column.

On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha  wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a table
> for each user which will have even more overhead since the number of users
> is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:
>
>> There is no real limit in term of number of columns in a table, I would
>> say that the impact of having a lot of columns is the amount of meta data
>> C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably your
>> data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>> wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a table
>>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>


Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"The data model is too dynamic"

--> then create a table to cope with dynamic data types. Example

CREATE TABLE dynamic_data(
 object_id uuid,
 property_name text,
 property_type text,
 bool_value boolean,
 long_value bigint,
 decimal_value double,
 text_value text,
 date_value timestamp,
 uuid_value uuid,
 ...,
PRIMARY KEY ((object_id), property_name)
);

Consider the following object in JSON format:

{
   "id": ,
"name": "John DOE",
"age":  32,
"last_visited_date":  "2016-09-10 12:01:03",
}

It would result in:

BEGIN UNLOGGED BATCH
INSERT INTO dynamic_data(object_id, property_name, property_type,
text_value) VALUES(xxx, 'name', 'text', 'John DOE');
INSERT INTO dynamic_data(object_id, property_name, property_type,
long_value) VALUES(xxx, 'age', 'long', 32);
INSERT INTO dynamic_data(object_id, property_name, property_type,
date_value) VALUES(xxx, 'last_visited_date', 'date', '2016-09-10 12:01:03');
APPLY BATCH;

You can safely use an unlogged batch because the partition key is the same for
all rows, so C* is clever enough to coalesce all the inserts into a single
mutation. There is no extra overhead from the batch.

To fetch all values of the object: SELECT * FROM dynamic_data WHERE
object_id = xxx LIMIT 1000;

To delete the whole object, delete by partition key: DELETE FROM
dynamic_data WHERE object_id = xxx;

To delete a single property, also provide the property name: DELETE FROM
dynamic_data WHERE object_id = xxx AND property_name = 'last_visited_date';

To add a new property to an existing object, just insert: INSERT INTO
dynamic_data(object_id, property_name, property_type, bool_value)
VALUES(xxx, 'is_married', 'boolean', false);

The only drawback of this data model is that it is abstract, e.g. by just
looking at the schema you cannot really tell what kind of data it contains,
but that is precisely what you are looking for ...
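
One more operation that falls out of the clustering key (same table, xxx again
standing for the object id): fetching a single property by name:

SELECT property_type, bool_value, long_value, decimal_value, text_value,
       date_value, uuid_value
FROM dynamic_data
WHERE object_id = xxx AND property_name = 'age';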


On Thu, Sep 15, 2016 at 4:19 PM, Dorian Hoxha 
wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a table
> for each user which will have even more overhead since the number of users
> is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:
>
>> There is no real limit in term of number of columns in a table, I would
>> say that the impact of having a lot of columns is the amount of meta data
>> C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably your
>> data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
>> wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a table
>>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>


Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread sai krishnam raju potturi
Hi Laxmi,
  What's the size of the data per node? If the data is really huge, then let
the decommission process continue. Otherwise, stop the Cassandra process on
the decommissioning node and, from another node in the datacenter, run
"nodetool removenode <host-id>". This might speed up the process, since the
streaming will be from 2 replicas rather than just one. See if unthrottling
the stream throughput helps.
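
A rough sketch of those commands (run from a live node; <host-id> is whatever
"nodetool status" reports for the stopped node):

nodetool status                   # note the Host ID of the stopped node
nodetool removenode <host-id>     # stream its ranges from the remaining replicas
nodetool setstreamthroughput 0    # 0 disables throttling; restore the default afterwards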

   Make sure there are no TCP sessions in a hung state. If you see any TCP
sessions in a hung state, adjust the TCP parameters:

sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.ipv4.tcp_window_scaling=1
sudo sysctl -w net.ipv4.tcp_keepalive_time=1800
sudo sysctl -w net.ipv4.tcp_keepalive_probes=9
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=75


thanks

On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula 
wrote:

> I started decommissioning a node in our Cassandra cluster.
> But it's taking too long (more than 12 hrs), so I would like to
> restart it (stop/kill the node & run 'nodetool decommission' again).
>
> Will killing the node / stopping the decommission and then restarting the
> decommission cause any issues for the cluster?
>
> Using C* 2.0.17, 2 data centers, each DC with 3 groups, each group
> with 3 nodes and RF=3
>
> --
> Thanks...!
>


Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread Kaide Mu
As far as I know, restarting the decommission shouldn't cause any problems for
your cluster, but please note that decommission is not resumable in your
Cassandra version (resumable support will be introduced in 3.10), so
restarting it means the whole process starts over.

On Thu, Sep 15, 2016, 3:29 PM laxmikanth sadula 
wrote:

> I started decommissioning a node in our Cassandra cluster.
> But it's taking too long (more than 12 hrs), so I would like to
> restart it (stop/kill the node & run 'nodetool decommission' again).
>
> Will killing the node / stopping the decommission and then restarting the
> decommission cause any issues for the cluster?
>
> Using C* 2.0.17, 2 data centers, each DC with 3 groups, each group
> with 3 nodes and RF=3
>
>
> --
> Thanks...!
>


Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Since I will only have 1 table with that many columns, and the other tables
will be "normal" tables with max 30 columns, and the memory of 2K columns
won't be that big, I'm gonna guess I'll be fine.

The data model is too dynamic, the alternative would be to create a table
for each user which will have even more overhead since the number of users
is in the several thousands/millions.

On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  wrote:

> There is no real limit in term of number of columns in a table, I would
> say that the impact of having a lot of columns is the amount of meta data
> C* needs to keep in memory for encoding/decoding each row.
>
> Now, if you have a table with 1000+ columns, the problem is probably your
> data model...
>
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
> wrote:
>
>> Is there alot of overhead with having a big number of columns in a table
>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>> maximum I'll need) ?
>>
>> Thank You
>>
>
>


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Currently SASI only understands the % in the leading (suffix search) or
trailing (prefix search) position.

Any expression with a % in the middle, like %w%a%, will not be
interpreted by SASI as a wildcard.

%w%a% will translate into "Give me all results containing w%a".

On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Thank you for the investigation. Will wait for a fix and news.
>
> Probably it’s not a directly related question but what do you think about
> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
> it.
>
> Thanks,
> Mikhail
>
>
> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>
> Ok so I've found the source of the issue, it's pretty well hidden because
> it is NOT in the SASI source code directly.
>
> Here is the method where C* determines what kind of LIKE expression you're
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
> to remove escape character BEFORE doing the matching so if your search is LIKE
> '%%esc%', the detected expression is LIKE_CONTAINS.
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
>
>
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  wrote:
>
>> Ok you're right, I get your point
>>
>> LIKE '%%esc%' --> startWith('%esc')
>>
>> LIKE 'escape%%' -->  = 'escape%'
>>
>> What I strongly suspect is that in the source code of SASI, we parse the
>> % xxx % expression BEFORE applying escape. That will explain the observed
>> behavior. E.g:
>>
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>
>> Let me check in the source code and try to reproduce the issue
>>
>>
>>
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en
>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>>> there is no semantical difference between two LIKE patterns (both of
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>> returns nothing (CASSANDRA-12573).
>>>
>>> What I’m missing?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>>>
>>> CREATE CUSTOM INDEX ON test.escape(val)
>>> USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>> WITH OPTIONS = {'mode': 'CONTAINS',
>>> 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>> I don't see any problem in the results you got
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>>
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Thanks for the reply.
 Could you please provide what index definition did you use?
 With the index from my script I get the following results:

 cqlsh:test> select * from escape;

  id | val
 ----+------------
   1 | %escapeme
   2 | escape%me
   3 | escape%esc

 Contains search

 cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

  id | val
 +---
   1 | %escapeme
   3 | escape%esc

 (2 rows)


 Prefix search

 cqlsh:test> SELECT * 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread Mikhail Krupitskiy
Thank you for the investigation. Will wait for a fix and news.

Probably it’s not a directly related question but what do you think about 
CASSANDRA-12573? Let me know if it’s better to create a separate thread for it.

Thanks,
Mikhail

> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it 
> is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're 
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> 
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
> remove escape character BEFORE doing the matching so if your search is LIKE 
> '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  > wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % 
> xxx % expression BEFORE applying escape. That will explain the observed 
> behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on 
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
> 
> According to the doc ‘esc’ is a pattern for exact match and I guess that 
> there is no semantical difference between two LIKE patterns (both of patterns 
> should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
> nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan > > wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sasi.SASIIndex'
>> WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   3 | escape%esc
>> (2 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> +---
>>   2 

Is to ok restart DECOMMISION

2016-09-15 Thread laxmikanth sadula
I started decommssioned a node in our cassandra cluster.
But its taking too long time (more than 12 hrs) , so I would like to
restart(stop/kill the node & restart 'node decommission' again)..

Does killing node/stopping decommission and restarting decommission will
cause any issues to cluster?

Using c*-2.0.17 , 2 Data centers, each DC with 3 groups each , each group
with 3 nodes with RF-3

-- 
Thanks...!
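
(Before killing anything, one quick way to see whether the decommission is actually still streaming, purely illustrative nodetool calls run on the decommissioning node:)

    # Streaming progress of the decommission (per-session transfer summaries).
    nodetool netstats

    # A large compaction backlog can also make a decommission look "stuck".
    nodetool compactionstats

    # Overall ring state; the decommissioning node shows up as UL (Up/Leaving).
    nodetool status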


Re: Read timeouts on primary key queries

2016-09-15 Thread Joseph Tech
I added the error logs and see that the timeouts are in a range between 2 and
7 s. Samples below:

 Query error after 5354 ms: [4 bound values] 
 Query error after 6658 ms: [4 bound values] 
 Query error after 4596 ms: [4 bound values] 
 Query error after 2068 ms: [4 bound values] 
 Query error after 2904 ms: [4 bound values] 

There is no specific socket timeout set on the client side, so it would
take the default of 12 s. The read_request_timeout_in_ms is set to 5 s. In
this case, how do the errors happen in less than 5 s? Is there any other
factor that would cause a fail-fast scenario during the read?

Thanks,
Joseph
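
(One thing worth double-checking on the client side is making the driver's per-request read timeout explicit instead of relying on the default. A minimal sketch with the DataStax Java driver; the timeout values and contact point are illustrative only:)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.SocketOptions;

    public class ExplicitClientTimeouts {
        public static void main(String[] args) {
            // Per-request read timeout on the driver side. Keep it above the server's
            // read_request_timeout_in_ms so the coordinator gets the chance to answer
            // (or to fail with a proper read timeout) before the driver gives up.
            SocketOptions socketOptions = new SocketOptions()
                    .setReadTimeoutMillis(12000)
                    .setConnectTimeoutMillis(5000);

            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // illustrative contact point
                    .withSocketOptions(socketOptions)
                    .build();

            cluster.connect();
        }
    }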




On Wed, Sep 7, 2016 at 5:26 PM, Joseph Tech  wrote:

> Thanks, Romain for the detailed explanation. We use log4j 2 and i have
> added the driver logging for slow/error queries, will see if it helps to
> provide any pattern once in Prod.
>
> I tried getendpoints and getsstables for some of the timed out keys and
> most of them listed only 1 SSTable. There were a few which showed 2
> SSTables. There is no specific trend on the keys, it's completely based on
> the user access, and the same keys return results instantly from cqlsh
>
> On Tue, Sep 6, 2016 at 1:57 PM, Romain Hardouin 
> wrote:
>
>> There is nothing special in the two sstablemetadata outputs but if the
>> timeouts are due to a network split or overwhelmed node or something like
>> that you won't see anything here. That said, if you have the keys which
>> produced the timeouts then, yes, you can look for a regular pattern (i.e.
>> always the same keys?).
>>
>> You can find sstables for a given key with nodetool:
>> nodetool getendpoints <keyspace> <table> <key>
>> Then you can run the following command on one/each node of the endpoints:
>> nodetool getsstables <keyspace> <table> <key>
>>
>> If many sstables are shown in the previous command it means that your
>> data is fragmented but thanks to LCS this number should be low.
>>
>> I think the most useful actions now would be:
>>
>> 1) Enable DEBUG for o.a.c.db.ConsistencyLevel, it won't spam your log
>> file, you will see the following when errors will occur:
>> - Local replicas [, ...] are insufficient to satisfy
>> LOCAL_QUORUM requirement of X live nodes in ''
>>
>> You are using C* 2.1 but you can have a look at the C* 2.2
>> logback.xml: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/logback.xml
>> I'm using it in production, it's better because it creates a separate
>> debug.log file with an asynchronous appender.
>>
>> Watch out when enabling:
>>
>>   <appender-ref ref="ASYNCDEBUGLOG" />
>>
>> Because the default logback configuration sets all o.a.c loggers to DEBUG:
>>
>>   <logger name="org.apache.cassandra" level="DEBUG"/>
>>
>> Instead you can set:
>>
>>   <logger name="org.apache.cassandra.db.ConsistencyLevel" level="DEBUG"/>
>>
>> Also, if you want to restrict debug.log to DEBUG level only (instead
>> of DEBUG+INFO+...) you can add a LevelFilter to ASYNCDEBUGLOG in
>> logback.xml:
>>
>> <filter class="ch.qos.logback.classic.filter.LevelFilter">
>>   <level>DEBUG</level>
>>   <onMatch>ACCEPT</onMatch>
>>   <onMismatch>DENY</onMismatch>
>> </filter>
>>
>>   Thus, the debug.log file will be empty unless some Consistency issues
>> happen.
>>
>> 2) Enable slow queries log at the driver level with a QueryLogger:
>>
>>Cluster cluster = ...
>>// log queries longer than 1 second, see also withDynamicThreshold
>>    QueryLogger queryLogger = QueryLogger.builder(cluster).withConstantThreshold(1000).build();
>>cluster.register(queryLogger);
>>
>> Then in your driver logback file:
>>
>> <logger name="com.datastax.driver.core.QueryLogger.SLOW" level="DEBUG" />
>>
>> 3) And/or: you mentioned that you use DSE so you can enable slow
>> queries logging in dse.yaml (cql_slow_log_options)
>>
>> Best,
>>
>> Romain
>>
>>
>> On Monday, 5 September 2016 at 20:05, Joseph Tech
>> wrote:
>>
>>
>> Attached are the sstablemeta outputs from 2 SSTables of size 28 MB and 52
>> MB (out2). The records are inserted with different TTLs based on their
>> nature ; test records with 1 day, typeA records with 6 months, typeB
>> records with 1 year etc. There are also explicit DELETEs from this table,
>> though it's much lower than the rate of inserts.
>>
>> I am not sure how to interpret this output, or if it's the right SSTables
>> that were picked. Please advise. Is there a way to get the sstables
>> corresponding to the keys that timed out, though they are accessible later.
>>
>> On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee > > wrote:
>>
>> We have seen read timeout issues in Cassandra due to a high droppable
>> tombstone ratio for a repository.
>>
>> Please check for high droppable tombstone ratio for your repo.
>>
>> On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin 
>> wrote:
>>
>> Yes dclocal_read_repair_chance will reduce the cross-DC traffic and
>> latency, so you can swap the values (https://issues.apache.org/jira/browse/CASSANDRA-7320).
>> I guess the sstable_size_in_mb was set to 50 because back in the day (C* 1.0) the
>> default size was way too small: 5 MB. So maybe someone 

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
There is no real limit in terms of the number of columns in a table; I would say
that the impact of having a lot of columns is the amount of metadata C*
needs to keep in memory for encoding/decoding each row.

Now, if you have a table with 1000+ columns, the problem is probably your
data model...

On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha 
wrote:

> Is there alot of overhead with having a big number of columns in a table ?
> Not unbounded, but say, would 2000 be a problem(I think that's the maximum
> I'll need) ?
>
> Thank You
>


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Ok so I've found the source of the issue, it's pretty well hidden because
it is NOT in the SASI source code directly.

Here is the method where C* determines what kind of LIKE expression you're
using (LIKE_PREFIX, LIKE_CONTAINS or LIKE_MATCHES)

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778

As you can see, it's pretty simple, maybe too simple. Indeed, it forgets
to remove the escape character BEFORE doing the matching, so if your search is
LIKE '%%esc%', the detected expression is LIKE_CONTAINS.

A possible fix would be:

1) convert the ByteBuffer into a plain String (UTF8 or ASCII, depending on
the column data type)
2) remove the escape character before parsing, OR use some more advanced
regex to exclude the %% from parsing

Step 2) is dead easy but step 1) is harder because I don't know if
converting the ByteBuffer into a String at this stage of the CQL parser is
expensive or not (in terms of computation)

Let me try a patch
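
(To make the idea in step 2) concrete, here is a standalone sketch of classifying a LIKE pattern only after the escaped %% sequences are set aside. This is just an illustration of the logic, not the actual Cassandra patch:)

    public final class LikePatternSketch {

        enum LikeKind { PREFIX, SUFFIX, CONTAINS, MATCHES }

        // Classify the raw LIKE pattern, treating "%%" as an escaped literal '%'
        // so that it never counts as a leading/trailing wildcard.
        static LikeKind classify(String rawPattern) {
            // Hide escaped "%%" behind a placeholder before looking at the edges.
            String normalized = rawPattern.replace("%%", "\u0000");

            boolean leadingWildcard  = normalized.startsWith("%");
            boolean trailingWildcard = normalized.endsWith("%");

            if (leadingWildcard && trailingWildcard) return LikeKind.CONTAINS; // '%abc%'
            if (trailingWildcard)                    return LikeKind.PREFIX;   // 'abc%'
            if (leadingWildcard)                     return LikeKind.SUFFIX;   // '%abc'
            return LikeKind.MATCHES;                                           // exact match
        }

        public static void main(String[] args) {
            // '%%esc%'   -> PREFIX  (search term '%esc')
            // 'escape%%' -> MATCHES (search term 'escape%')
            System.out.println(classify("%%esc%"));
            System.out.println(classify("escape%%"));
        }
    }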



On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  wrote:

> Ok you're right, I get your point
>
> LIKE '%%esc%' --> startWith('%esc')
>
> LIKE 'escape%%' -->  = 'escape%'
>
> What I strongly suspect is that in the source code of SASI, we parse the %
> xxx % expression BEFORE applying escape. That will explain the observed
> behavior. E.g:
>
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>
> Let me check in the source code and try to reproduce the issue
>
>
>
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on http://docs.datastax.com/en
>> /cql/3.3/cql/cql_using/useSASIIndex.html
>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>> there is no semantical difference between two LIKE patterns (both of
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>> returns nothing (CASSANDRA-12573).
>>
>> What I’m missing?
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>>
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sasi.SASIIndex'
>> WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>
>> I don't see any problem in the results you got
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>>
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Thanks for the reply.
>>> Could you please provide what index definition did you use?
>>> With the index from my script I get the following results:
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   2 | escape%me
>>>   3 | escape%esc
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   3 | escape%esc
>>>
>>> (2 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> +---
>>>   2 | escape%me
>>>   3 | escape%esc
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 18:16, DuyHai Doan  wrote:
>>>
>>> Use % to escape %
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   2 | escape%me
>>>
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>
>>> (1 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> +---
>>>   2 | escape%me
>>>
>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Hi Cassandra guys,


Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Is there a lot of overhead in having a big number of columns in a table?
Not unbounded, but say, would 2000 be a problem (I think that's the maximum
I'll need)?

Thank You


Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Vasileios Vlachos
Thanks for sharing your experience Ben

On 15 Sep 2016 11:35 am, "Ben Slater"  wrote:

> We’ve successfully used the rsynch method you outline quite a few times in
> situations where we’ve had clusters that take forever to add new nodes
> (mainly due to secondary indexes) and need to do a quick replacement for
> one reason or another. As you mention, the main disadvantage we ran into is
> that the node doesn’t get cleaned up through the replacement process like a
> newly streamed node does (plus the extra operational complexity).
>
> Cheers
> Ben
>
> On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos 
> wrote:
>
>> Hello and thanks for your responses,
>>
>> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
>> difference. Any ideas why streaming is limited to only two of the three
>> nodes available?
>>
>> As an alternative to slow streaming I tried this:
>>
>>   - install C* on a new node, stop the service and delete
>> /var/lib/cassandra/*
>>  - rsync /etc/cassandra from old node to new node
>>  - rsync /var/lib/cassandra from old node to new node
>>  - stop C* on the old node
>>  - rsync /var/lib/cassandra from old node to new node
>>  - move the old node to a different IP
>>  - move the new node to the old node's original IP
>>  - start C* on the new node (no need for the replace_node option in
>> cassandra-env.sh)
>>
>> This technique has been successful so far for a demo cluster with fewer
>> data. The only disadvantage for us is that we were hoping that by streaming
>> the SSTables to the new node, tombstones would be discarded (freeing a lot
>> of disk space on our live cluster). This is exactly what happened for the
>> one node we streamed so far; unfortunately, the slow streaming generates a
>> lot of hints which makes recovery a very long process.
>>
>> Do you guys see any other problems with the rsync method that I've
>> skipped?
>>
>> Regarding the tombstones issue (if we finally do what I described above),
>> I'm thinking sstablsplit. Then compaction should deal with it (I think). I
>> have not used sstablesplit in the past, so another thing I'd like to ask is
>> if you guys find this a good/bad idea for what I'm trying to do.
>>
>> Many thanks,
>> Vasilis
>>
>> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa  wrote:
>>
>>>
>>>
>>> On 2016-09-12 09:38 (-0700), daemeon reiydelle 
>>> wrote:
>>> > Re. throughput. That looks slow for jumbo with 10g. Check your
>>> networks.
>>> >
>>> >
>>>
>>> It's extremely unlikely you'll be able to saturate a 10g link with a
>>> single instance cassandra.
>>>
>>> Faster Cassandra streaming is a work in progress - being able to send
>>> more than one file at a time is probably the most obvious area for
>>> improvement, and being able to better deal with the CPU / garbage generated
>>> on the receiving side is just behind that. You'll likely be able to stream
>>> 10-15 MB/s per sending server or cpu core, whichever is less (in a vnode
>>> setup, you'll be cpu bound - in a single-token setup, you'll be stream
>>> bound).
>>>
>>>
>>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Ben Slater
We’ve successfully used the rsync method you outline quite a few times in
situations where we’ve had clusters that take forever to add new nodes
(mainly due to secondary indexes) and need to do a quick replacement for
one reason or another. As you mention, the main disadvantage we ran into is
that the node doesn’t get cleaned up through the replacement process like a
newly streamed node does (plus the extra operational complexity).

Cheers
Ben

On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos 
wrote:

> Hello and thanks for your responses,
>
> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
> difference. Any ideas why streaming is limited to only two of the three
> nodes available?
>
> As an alternative to slow streaming I tried this:
>
>   - install C* on a new node, stop the service and delete
> /var/lib/cassandra/*
>  - rsync /etc/cassandra from old node to new node
>  - rsync /var/lib/cassandra from old node to new node
>  - stop C* on the old node
>  - rsync /var/lib/cassandra from old node to new node
>  - move the old node to a different IP
>  - move the new node to the old node's original IP
>  - start C* on the new node (no need for the replace_node option in
> cassandra-env.sh)
>
> This technique has been successful so far for a demo cluster with fewer
> data. The only disadvantage for us is that we were hoping that by streaming
> the SSTables to the new node, tombstones would be discarded (freeing a lot
> of disk space on our live cluster). This is exactly what happened for the
> one node we streamed so far; unfortunately, the slow streaming generates a
> lot of hints which makes recovery a very long process.
>
> Do you guys see any other problems with the rsync method that I've skipped?
>
> Regarding the tombstones issue (if we finally do what I described above),
> I'm thinking sstablsplit. Then compaction should deal with it (I think). I
> have not used sstablesplit in the past, so another thing I'd like to ask is
> if you guys find this a good/bad idea for what I'm trying to do.
>
> Many thanks,
> Vasilis
>
> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa  wrote:
>
>>
>>
>> On 2016-09-12 09:38 (-0700), daemeon reiydelle 
>> wrote:
>> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
>> >
>> >
>>
>> It's extremely unlikely you'll be able to saturate a 10g link with a
>> single instance cassandra.
>>
>> Faster Cassandra streaming is a work in progress - being able to send
>> more than one file at a time is probably the most obvious area for
>> improvement, and being able to better deal with the CPU / garbage generated
>> on the receiving side is just behind that. You'll likely be able to stream
>> 10-15 MB/s per sending server or cpu core, whichever is less (in a vnode
>> setup, you'll be cpu bound - in a single-token setup, you'll be stream
>> bound).
>>
>>
>>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Vasileios Vlachos
Hello and thanks for your responses,

OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
difference. Any ideas why streaming is limited to only two of the three
nodes available?
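
(For what it's worth, the cap can also be changed at runtime on each node involved in the stream, without a restart; the value below is only an example, in megabits per second:)

    # Raise the streaming throughput cap on this node (megabits/s, 0 = unthrottled).
    nodetool setstreamthroughput 400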

As an alternative to slow streaming I tried this:

  - install C* on a new node, stop the service and delete
/var/lib/cassandra/*
 - rsync /etc/cassandra from old node to new node
 - rsync /var/lib/cassandra from old node to new node
 - stop C* on the old node
 - rsync /var/lib/cassandra from old node to new node
 - move the old node to a different IP
 - move the new node to the old node's original IP
 - start C* on the new node (no need for the replace_node option in
cassandra-env.sh)
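
(A rough sketch of the copy steps above, run from the new node, assuming SSH access between the nodes and default install paths; the IP re-assignment is environment-specific and left out:)

    # On the new node: stop Cassandra and start from a clean data directory.
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/*

    # First pass while the old node is still up (moves the bulk of the data).
    rsync -aH old-node:/etc/cassandra/     /etc/cassandra/
    rsync -aH old-node:/var/lib/cassandra/ /var/lib/cassandra/

    # Stop Cassandra on the old node, then do a final catch-up pass.
    ssh old-node 'sudo service cassandra stop'
    rsync -aH --delete old-node:/var/lib/cassandra/ /var/lib/cassandra/

    # Swap the IPs as described above (environment-specific), then:
    sudo service cassandra start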

This technique has been successful so far for a demo cluster with fewer
data. The only disadvantage for us is that we were hoping that by streaming
the SSTables to the new node, tombstones would be discarded (freeing a lot
of disk space on our live cluster). This is exactly what happened for the
one node we streamed so far; unfortunately, the slow streaming generates a
lot of hints which makes recovery a very long process.

Do you guys see any other problems with the rsync method that I've skipped?

Regarding the tombstones issue (if we finally do what I described above),
I'm thinking sstablesplit. Then compaction should deal with it (I think). I
have not used sstablesplit in the past, so another thing I'd like to ask is
if you guys find this a good/bad idea for what I'm trying to do.
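
(If sstablesplit turns out to be the way to go, a hedged sketch of what running it would look like; the node has to be stopped first, and the target size and file path below are only examples:)

    # Cassandra must be stopped on the node before splitting SSTables in place.
    sudo service cassandra stop

    # Split an oversized SSTable into ~50 MB chunks (sstablesplit ships in tools/bin).
    sstablesplit --no-snapshot -s 50 \
        /var/lib/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-ka-1234-Data.db

    sudo service cassandra start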

Many thanks,
Vasilis

On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa  wrote:

>
>
> On 2016-09-12 09:38 (-0700), daemeon reiydelle  wrote:
> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
> >
> >
>
> It's extremely unlikely you'll be able to saturate a 10g link with a
> single instance cassandra.
>
> Faster Cassandra streaming is a work in progress - being able to send more
> than one file at a time is probably the most obvious area for improvement,
> and being able to better deal with the CPU / garbage generated on the
> receiving side is just behind that. You'll likely be able to stream 10-15
> MB/s per sending server or cpu core, whichever is less (in a vnode setup,
> you'll be cpu bound - in a single-token setup, you'll be stream bound).
>
>
>


Re: race condition for quorum consistency

2016-09-15 Thread Alexander Dejanovski
I haven't been very accurate in my first answer indeed, which was
misleading.
Apache Cassandra guarantees that if all queries are run at least at quorum,
a client writing successfully (as in the cluster acknowledged the write) and
then reading its previous write will see the correct value, unless another
client updated it between the write and the read (which would be a race
condition). The same goes for two different clients if the first issues a
successful write and only after that the second reads the value.
Quorum provides a consistency guarantee if queries are fired in sequence.

Without diving into complex scenarios where it may work because of read
repair and the fact that everything is async, Ken's use case was: C1
writes, it is not successful yet, and C2 and C3 read at approximately the same
time. Once again, in this case C2 and C3 could be reading different values, as
C1's mutation could be in a pending state on some nodes. Considering we have
nodes A, B and C:

   - Node A has received the write from C1, nodes B and C have not
   - C2 reads from A and B, there's a digest mismatch which triggers a
   foreground read repair (background read repairs are triggered at CL ONE) >
   it gets the up to date value that was written by C1
   - C3 reads from B and C, there's no digest mismatch and the value is not
   up to date with A > it does not get the value written by C1
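
(For readers who want to see what "all queries at least at quorum" looks like in driver terms, a minimal sketch with the DataStax Java driver; the keyspace, table and values are purely illustrative:)

    import com.datastax.driver.core.*;

    public class QuorumReadYourWrites {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");   // illustrative keyspace

            // Successful write at QUORUM: a majority of replicas acknowledged it.
            session.execute(new SimpleStatement(
                    "UPDATE my_table SET val = ? WHERE id = ?", "v1", 42)
                    .setConsistencyLevel(ConsistencyLevel.QUORUM));

            // Subsequent read at QUORUM: the two quorums overlap on at least one
            // replica, so this read sees the write above (barring a concurrent
            // update by another client, which would be the race discussed here).
            Row row = session.execute(new SimpleStatement(
                    "SELECT val FROM my_table WHERE id = ?", 42)
                    .setConsistencyLevel(ConsistencyLevel.QUORUM)).one();

            System.out.println(row == null ? "not found" : row.getString("val"));
            cluster.close();
        }
    }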


Cheers,


On Thu, Sep 15, 2016 at 12:10 AM Tyler Hobbs  wrote:

>
> On Wed, Sep 14, 2016 at 3:49 PM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>>
>>-
>>- during read requests, cassandra will ask to one node the data and
>>to the others involved in the CL a digest, and if all digests do not 
>> match,
>>will ask for them the entire data, handle the merge and finally will ask 
>> to
>>those nodes a background repair. Your write may have succeed during this
>>time.
>>
>>
> This is very good info, but as a minor correction, the repair here will
> happen in the foreground before the response is returned to the client.
> So, at least from a single client's perspective, you get monotonic reads.
>
>
> --
> Tyler Hobbs
> DataStax 
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com