Re: Question About Reaper

2018-05-21 Thread Alexander Dejanovski
You won't be able to have fewer segments than vnodes, so just use 256
segments per node, use parallel as the repair parallelism, and set intensity
to 1.

You apparently have more than 3TB per node, and that kind of density is
always challenging when it comes to running "fast" repairs.
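
For reference, those settings map onto a repair run roughly as in the sketch
below, created through Reaper's HTTP API. The /repair_run endpoint and its
parameter names (segmentCount, repairParallelism, intensity) are assumptions
based on the v1 REST API and should be verified against the Reaper version
you deploy; the host, cluster, keyspace and owner values are placeholders.

import java.net.HttpURLConnection;
import java.net.URL;

public class CreateRepairRun {
    public static void main(String[] args) throws Exception {
        // 256 segments per node * 144 nodes = 36864 total segments
        String params = "clusterName=prod&keyspace=my_keyspace&owner=ops"
                + "&segmentCount=36864"
                + "&repairParallelism=PARALLEL"
                + "&intensity=1.0";
        URL url = new URL("http://reaper-host:8080/repair_run?" + params);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST"); // creates the run; it must still be started afterwards
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}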

Cheers,

On Tue, 22 May 2018 at 07:28, Surbhi Gupta  wrote:

> We are on DSE 4.8.15, which is Cassandra 2.1.
> What is the best configuration to use for Reaper on 144 nodes with 256
> vnodes? It shows around 532TB of data when we start OpsCenter repairs.
>
> We need to finish repair soon.
>
> On Mon, May 21, 2018 at 10:53 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Surbhi,
>>
>> Reaper might indeed be your best chance to reduce the overhead of vnodes
>> there.
>> The latest betas include a new feature that groups vnodes sharing the
>> same replicas into the same segment. This allows having fewer segments
>> than vnodes, and is available with Cassandra 2.2 onwards (the
>> improvement is especially beneficial with Cassandra 3.0+, as such token
>> ranges will be repaired in a single session).
>>
>> We have a Gitter channel that you can join if you want to ask questions.
>>
>> Cheers,
>>
>> On Mon, 21 May 2018 at 15:29, Surbhi Gupta  wrote:
>>
>>> Thanks Abdul
>>>
>>> On Mon, May 21, 2018 at 6:28 AM Abdul Patel  wrote:
>>>
We have a parameter in the reaper yaml file called
repairManagerSchedulingIntervalSeconds; the default is 10 seconds. I tested
with 8, 6, and 5 seconds and found 5 seconds optimal for my environment. You
can go lower, but it will have cascading effects on CPU and memory
consumption.
So test well.
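
For reference, that corresponds to a line like the following in the Reaper
YAML config (a sketch; 5 is the value Abdul found optimal here, and 10 is
the default):

repairManagerSchedulingIntervalSeconds: 5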


 On Monday, May 21, 2018, Surbhi Gupta  wrote:

> Thanks a lot for your inputs.
> Abdul, how did you tune Reaper?
>
> On Sun, May 20, 2018 at 10:10 AM Jonathan Haddad 
> wrote:
>
>> FWIW the largest deployment I know about is a single reaper instance
>> managing 50 clusters and over 2000 nodes.
>>
>> There might be bigger, but I either don’t know about it or can’t
>> remember.
>>
>> On Sun, May 20, 2018 at 10:04 AM Abdul Patel 
>> wrote:
>>
>>> Hi,
>>>
>>> I recently tested Reaper and it actually helped us a lot. Even with
>>> our small footprint of 18 nodes, Reaper takes close to 6 hrs (earlier it
>>> took 13 hrs; I was able to tune it by 50%). But it really depends on the
>>> number of nodes. For example, if you have 4 nodes then it runs on
>>> 4*256 = 1024 segments, so for your env. it will be 256*144, close to
>>> 36k segments.
>>> Better to test on a POC box how much time it takes and then proceed
>>> further. I have tested so far in 1 DC only; we can actually have a
>>> separate Reaper instance handling a separate DC, but I haven't tested
>>> it yet.
>>>
>>>
>>> On Sunday, May 20, 2018, Surbhi Gupta 
>>> wrote:
>>>
 Hi,

We have a cluster with 144 nodes (3 datacenters) with 256 vnodes.
When we tried to start repairs from OpsCenter, it showed
1.9 million ranges to repair.
And even after setting compaction and stream throughput to 0,
OpsCenter is not able to help us much to finish repair in a 9-day
timeframe.

What are your thoughts on Reaper?
Do you think Reaper might be able to help us in this scenario?

 Thanks
 Surbhi


 --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>>
>>
>
>
>>>
>>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Question About Reaper

2018-05-21 Thread Surbhi Gupta
We are on DSE 4.8.15, which is Cassandra 2.1.
What is the best configuration to use for Reaper on 144 nodes with 256
vnodes? It shows around 532TB of data when we start OpsCenter repairs.

We need to finish repair soon.

On Mon, May 21, 2018 at 10:53 AM Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Surbhi,
>
> Reaper might indeed be your best chance to reduce the overhead of vnodes
> there.
> The latest betas include a new feature that groups vnodes sharing the
> same replicas into the same segment. This allows having fewer segments
> than vnodes, and is available with Cassandra 2.2 onwards (the
> improvement is especially beneficial with Cassandra 3.0+, as such token
> ranges will be repaired in a single session).
>
> We have a Gitter channel that you can join if you want to ask questions.
>
> Cheers,
>
> On Mon, 21 May 2018 at 15:29, Surbhi Gupta  wrote:
>
>> Thanks Abdul
>>
>> On Mon, May 21, 2018 at 6:28 AM Abdul Patel  wrote:
>>
>>> We have a parameter in the reaper yaml file called
>>> repairManagerSchedulingIntervalSeconds; the default is 10 seconds. I tested
>>> with 8, 6, and 5 seconds and found 5 seconds optimal for my environment.
>>> You can go lower, but it will have cascading effects on CPU and memory
>>> consumption.
>>> So test well.
>>>
>>>
>>> On Monday, May 21, 2018, Surbhi Gupta  wrote:
>>>
Thanks a lot for your inputs.
Abdul, how did you tune Reaper?

 On Sun, May 20, 2018 at 10:10 AM Jonathan Haddad 
 wrote:

> FWIW the largest deployment I know about is a single reaper instance
> managing 50 clusters and over 2000 nodes.
>
> There might be bigger, but I either don’t know about it or can’t
> remember.
>
> On Sun, May 20, 2018 at 10:04 AM Abdul Patel 
> wrote:
>
>> Hi,
>>
>> I recently tested Reaper and it actually helped us a lot. Even with
>> our small footprint of 18 nodes, Reaper takes close to 6 hrs (earlier it
>> took 13 hrs; I was able to tune it by 50%). But it really depends on the
>> number of nodes. For example, if you have 4 nodes then it runs on
>> 4*256 = 1024 segments, so for your env. it will be 256*144, close to
>> 36k segments.
>> Better to test on a POC box how much time it takes and then proceed
>> further. I have tested so far in 1 DC only; we can actually have a
>> separate Reaper instance handling a separate DC, but I haven't tested
>> it yet.
>>
>>
>> On Sunday, May 20, 2018, Surbhi Gupta 
>> wrote:
>>
>>> Hi,
>>>
>>> We have a cluster with 144 nodes (3 datacenters) with 256 vnodes.
>>> When we tried to start repairs from OpsCenter, it showed
>>> 1.9 million ranges to repair.
>>> And even after setting compaction and stream throughput to 0,
>>> OpsCenter is not able to help us much to finish repair in a 9-day
>>> timeframe.
>>>
>>> What are your thoughts on Reaper?
>>> Do you think Reaper might be able to help us in this scenario?
>>>
>>> Thanks
>>> Surbhi
>>>
>>>
>>> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>


>>
>> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>


RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
It seems that there is no way to do this using Cassandra, and even something
like Spark won't help, because I'm going to read from a big Cassandra partition
(the bottleneck is reading from Cassandra).

Sent using Zoho Mail






 On Tue, 22 May 2018 09:08:55 +0430 onmstester onmstester 
onmstes...@zoho.com wrote 




I tried that too, using select ALL_NON_Collection_Columns ..., and encountered this error:

IN restrictions are not supported on indexed columns



Sent using Zoho Mail






 On Mon, 21 May 2018 20:10:29 +0430 Durity, Sean R 
sean_r_dur...@homedepot.com wrote 











One of the columns you are selecting is a list or map or other kind of 
collection. You can’t do that with an IN clause against a clustering column. 
Either don’t select the collection column OR don’t use the IN clause. Cassandra 
is trying to protect itself (and you) from a query that won’t scale well. Honor 
that.

 

As a good practice, you shouldn’t do select * (as a production query) against 
any database. You want to list the columns you actually want to select. That 
way a later “alter table add column” (or similar) doesn’t cause unpredictable 
results to the application.

 

 

Sean Durity


From: onmstester onmstester onmstes...@zoho.com 
 Sent: Sunday, May 20, 2018 10:13 AM
 To: user user@cassandra.apache.org
 Subject: [EXTERNAL] IN clause of prepared statement


 

The table is something like:

Samples
...
primary key ((partition, resource), timestamp, metric_name)

Creating the prepared statement:

session.prepare("select * from samples where partition=:partition and
resource=:resource and timestamp>=:start and timestamp<=:end and
metric_name in :metric_names")


 


failed with exception:

can not restrict clustering columns by IN relations when a collection is
selected by the query

The query is OK using cqlsh; using column names in the select did not help.
Is there any way to achieve this in Cassandra? I'm aware of the performance
problems of this query, but they do not matter in my case!

I'm using DataStax driver 3.2 and Apache Cassandra 3.11.2

Sent using Zoho Mail


 



 












RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
I tried that too, using select ALL_NON_Collection_Columns ..., and encountered this error:

IN restrictions are not supported on indexed columns


Sent using Zoho Mail






 On Mon, 21 May 2018 20:10:29 +0430 Durity, Sean R 
sean_r_dur...@homedepot.com wrote 




One of the columns you are selecting is a list or map or other kind of 
collection. You can’t do that with an IN clause against a clustering column. 
Either don’t select the collection column OR don’t use the IN clause. Cassandra 
is trying to protect itself (and you) from a query that won’t scale well. Honor 
that.

 

As a good practice, you shouldn’t do select * (as a production query) against 
any database. You want to list the columns you actually want to select. That 
way a later “alter table add column” (or similar) doesn’t cause unpredictable 
results to the application.

 

 

Sean Durity


From: onmstester onmstester onmstes...@zoho.com 
 Sent: Sunday, May 20, 2018 10:13 AM
 To: user user@cassandra.apache.org
 Subject: [EXTERNAL] IN clause of prepared statement


 

The table is something like:

Samples
...
primary key ((partition, resource), timestamp, metric_name)

Creating the prepared statement:

session.prepare("select * from samples where partition=:partition and
resource=:resource and timestamp>=:start and timestamp<=:end and
metric_name in :metric_names")


 


failed with exception:

can not restrict clustering columns by IN relations when a collection is
selected by the query

The query is OK using cqlsh; using column names in the select did not help.
Is there any way to achieve this in Cassandra? I'm aware of the performance
problems of this query, but they do not matter in my case!

I'm using DataStax driver 3.2 and Apache Cassandra 3.11.2

Sent using Zoho Mail


 



 












How is Token function work in Cassandra

2018-05-21 Thread Goutham reddy
I would like to know how the token function works in Cassandra and in what
scenarios it is best used. Secondly, can a range query be performed with the
token function on a composite primary key? Any help is highly appreciated.
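
For illustration, here is a minimal sketch (DataStax Java driver 3.x; the
keyspace, table and column names are hypothetical) of a token range query.
token() applies to the partition key as a whole, so with a composite
partition key such as ((id, bucket)) you pass all of its columns; clustering
columns cannot appear inside token():

import com.datastax.driver.core.*;

public class TokenRangeScan {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        // Hypothetical table: PRIMARY KEY ((id, bucket), ts)
        PreparedStatement ps = session.prepare(
                "SELECT id, bucket, ts, value FROM metrics"
                + " WHERE token(id, bucket) > :start AND token(id, bucket) <= :end");
        // With Murmur3Partitioner, tokens are 64-bit longs; this scans half the ring
        for (Row row : session.execute(ps.bind()
                .setLong("start", Long.MIN_VALUE)
                .setLong("end", 0L)))
            System.out.println(row.getInt("id") + " " + row.getTimestamp("ts"));
        cluster.close();
    }
}

This kind of token paging is mainly used to split full-table scans across
workers; for a single partition you would filter on the key columns directly.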

Thanks and Regards,
Goutham Reddy Aenugu.
-- 
Regards
Goutham Reddy


Re: Client ID logging

2018-05-21 Thread Andy Tolbert
CASSANDRA-13665 adds a 'nodetool clientlist' command, which I think would be
helpful in this circumstance. That feature is targeted for C* 4.0, however.

You could use something like lsof to see what active TCP connections there
are to the host servers running your C* cluster, to capture the IP addresses
of the clients connected to your cluster.
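
For example, assuming the default native transport port 9042, something like

lsof -nP -iTCP:9042 -sTCP:ESTABLISHED

run on each node should list the established client connections along with
their source IPs.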

Thanks,
Andy

On Mon, May 21, 2018 at 1:42 PM, Hannu Kröger  wrote:

> Hmm, I think that by default it does not, but you can create a hook to log that.
> Create a wrapper for the PasswordAuthenticator class, for example, and use that.
> Or if you don’t use authentication, you can create your own query handler.
>
> Hannu
>
> James Lovato  wrote on 21.5.2018 at 21:37:
>
> Hi guys,
>
>
>
> Can standard OSS Cassandra 3 do logging of who connects to it?  We have a
> cluster in 3 DCs and our devs want to see if the client is crossing
> DCs (even though they have DCLOCAL set from their DS driver).
>
>
>
> Thanks,
> James
>
>


Re: Client ID logging

2018-05-21 Thread Hannu Kröger
Hmm, I think that by default it does not, but you can create a hook to log that.
Create a wrapper for the PasswordAuthenticator class, for example, and use that.
Or if you don’t use authentication, you can create your own query handler.
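
As a rough illustration of the wrapper idea, here is a minimal sketch
against the Cassandra 3.x auth SPI; the class and method names should be
checked against your exact Cassandra version, and the logging is only an
example:

import java.util.Map;
import org.apache.cassandra.auth.AuthenticatedUser;
import org.apache.cassandra.auth.PasswordAuthenticator;
import org.apache.cassandra.exceptions.AuthenticationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Put this on Cassandra's classpath and reference it from the
// "authenticator:" setting in cassandra.yaml.
public class LoggingPasswordAuthenticator extends PasswordAuthenticator {
    private static final Logger logger =
            LoggerFactory.getLogger(LoggingPasswordAuthenticator.class);

    @Override
    public AuthenticatedUser legacyAuthenticate(Map<String, String> credentials)
            throws AuthenticationException {
        AuthenticatedUser user = super.legacyAuthenticate(credentials);
        // Note: native-protocol clients authenticate through the SASL path
        // (newSaslNegotiator), so depending on your Cassandra version that
        // may be the method to wrap instead.
        logger.info("Client authenticated as {}", user.getName());
        return user;
    }
}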

Hannu

> James Lovato  wrote on 21.5.2018 at 21:37:
> 
> Hi guys,
>  
> Can standard OSS Cassandra 3 do logging of who connects to it?  We have a 
> cluster in 3 DCs and our devs want to see if the client is crossing DCs
> (even though they have DCLOCAL set from their DS driver).
>  
> Thanks,
> James


Client ID logging

2018-05-21 Thread James Lovato
Hi guys,

Can standard OSS Cassandra 3 do logging of who connects to it?  We have a 
cluster in 3 DCs and our devs want to see if the client is crossing DCs
(even though they have DCLOCAL set from their DS driver).

Thanks,
James


Re: Cassandra few nodes having high mem consumption

2018-05-21 Thread Abdul Patel
Additionally, cqlsh was taking a little time to log in, and immediately this
message popped up in the log:
PERIODIC-COMMIT-LOG-SYNCER.
Seems the commitlog isn't able to commit to disk. Any ideas?
I have run nodetool flush and restarted the nodes,
but wanted to know the root cause.

On Monday, May 21, 2018, Abdul Patel  wrote:

> Hi
>
> I have a few Cassandra nodes suddenly showing 80% memory usage; this
> happened 1 week after upgrading from 3.1.0 to 3.11.2, with no errors in the log.
> Is there a way I can find the high CPU or memory consuming processes in
> Cassandra?
>


Re: Question About Reaper

2018-05-21 Thread Alexander Dejanovski
Hi Surbhi,

Reaper might indeed be your best chance to reduce the overhead of vnodes
there.
The latest betas include a new feature that groups vnodes sharing the
same replicas into the same segment. This allows having fewer segments
than vnodes, and is available with Cassandra 2.2 onwards (the
improvement is especially beneficial with Cassandra 3.0+, as such token
ranges will be repaired in a single session).

We have a Gitter channel that you can join if you want to ask questions.

Cheers,

On Mon, 21 May 2018 at 15:29, Surbhi Gupta  wrote:

> Thanks Abdul
>
> On Mon, May 21, 2018 at 6:28 AM Abdul Patel  wrote:
>
>> We have a parameter in the reaper yaml file called
>> repairManagerSchedulingIntervalSeconds; the default is 10 seconds. I tested
>> with 8, 6, and 5 seconds and found 5 seconds optimal for my environment.
>> You can go lower, but it will have cascading effects on CPU and memory
>> consumption.
>> So test well.
>>
>>
>> On Monday, May 21, 2018, Surbhi Gupta  wrote:
>>
>>> Thanks a lot for your inputs.
>>> Abdul, how did you tune Reaper?
>>>
>>> On Sun, May 20, 2018 at 10:10 AM Jonathan Haddad 
>>> wrote:
>>>
 FWIW the largest deployment I know about is a single reaper instance
 managing 50 clusters and over 2000 nodes.

 There might be bigger, but I either don’t know about it or can’t
 remember.

 On Sun, May 20, 2018 at 10:04 AM Abdul Patel 
 wrote:

> Hi,
>
> I recently tested Reaper and it actually helped us a lot. Even with our
> small footprint of 18 nodes, Reaper takes close to 6 hrs (earlier it took
> 13 hrs; I was able to tune it by 50%). But it really depends on the number
> of nodes. For example, if you have 4 nodes then it runs on 4*256 = 1024
> segments, so for your env. it will be 256*144, close to 36k segments.
> Better to test on a POC box how much time it takes and then proceed further.
> I have tested so far in 1 DC only; we can actually have a separate Reaper
> instance handling a separate DC, but I haven't tested it yet.
>
>
> On Sunday, May 20, 2018, Surbhi Gupta 
> wrote:
>
>> Hi,
>>
>> We have a cluster with 144 nodes (3 datacenters) with 256 vnodes.
>> When we tried to start repairs from OpsCenter, it showed
>> 1.9 million ranges to repair.
>> And even after setting compaction and stream throughput to 0,
>> OpsCenter is not able to help us much to finish repair in a 9-day
>> timeframe.
>>
>> What are your thoughts on Reaper?
>> Do you think Reaper might be able to help us in this scenario?
>>
>> Thanks
>> Surbhi
>>
>>
>> --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade



>>>
>>> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread Durity, Sean R
One of the columns you are selecting is a list or map or other kind of 
collection. You can’t do that with an IN clause against a clustering column. 
Either don’t select the collection column OR don’t use the IN clause. Cassandra 
is trying to protect itself (and you) from a query that won’t scale well. Honor 
that.
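
A sketch of the second option with the DataStax Java driver 3.x (the driver
version mentioned in this thread; the table and column names are taken from
the thread, while the bind types are assumptions): issue one query per
metric_name instead of the IN, so the collection column can stay in the
select list.

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

public class PerMetricQueries {
    // One async query per metric replaces "metric_name in :metric_names"
    public static List<ResultSetFuture> query(Session session, String partition,
                                              String resource, Date start, Date end,
                                              List<String> metricNames) {
        PreparedStatement ps = session.prepare(
                "select * from samples where partition=:partition and resource=:resource"
                + " and timestamp>=:start and timestamp<=:end and metric_name=:metric_name");
        List<ResultSetFuture> futures = new ArrayList<>();
        for (String metric : metricNames) {
            BoundStatement bs = ps.bind()
                    .setString("partition", partition) // column types are assumptions
                    .setString("resource", resource)
                    .setTimestamp("start", start)
                    .setTimestamp("end", end)
                    .setString("metric_name", metric);
            futures.add(session.executeAsync(bs));
        }
        return futures;
    }
}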

As a good practice, you shouldn’t do select * (as a production query) against 
any database. You want to list the columns you actually want to select. That 
way a later “alter table add column” (or similar) doesn’t cause unpredictable 
results to the application.


Sean Durity
From: onmstester onmstester 
Sent: Sunday, May 20, 2018 10:13 AM
To: user 
Subject: [EXTERNAL] IN clause of prepared statement

The table is something like:

Samples
...
primary key ((partition, resource), timestamp, metric_name)

creating prepared statement :
session.prepare("select * from samples where partition=:partition and 
resource=:resource and timestamp>=:start and timestamp<=:end and metric_name in 
:metric_names")

failed with exception:

can not restrict clustering columns by IN relations when a collection is
selected by the query

The query is OK using cqlsh; using column names in the select did not help.
Is there any way to achieve this in Cassandra? I'm aware of the performance
problems of this query, but they do not matter in my case!

I'm using DataStax driver 3.2 and Apache Cassandra 3.11.2

Sent using Zoho Mail







Cassandra few nodes having high mem consumption

2018-05-21 Thread Abdul Patel
Hi

I have a few Cassandra nodes suddenly showing 80% memory usage; this
happened 1 week after upgrading from 3.1.0 to 3.11.2, with no errors in the log.
Is there a way I can find the high CPU or memory consuming processes in Cassandra?


Re: Question About Reaper

2018-05-21 Thread Surbhi Gupta
Thanks Abdul

On Mon, May 21, 2018 at 6:28 AM Abdul Patel  wrote:

> We have a parameter in the reaper yaml file called
> repairManagerSchedulingIntervalSeconds; the default is 10 seconds. I tested
> with 8, 6, and 5 seconds and found 5 seconds optimal for my environment.
> You can go lower, but it will have cascading effects on CPU and memory
> consumption.
> So test well.
>
>
> On Monday, May 21, 2018, Surbhi Gupta  wrote:
>
>> Thanks a lot for your inputs.
>> Abdul, how did you tune Reaper?
>>
>> On Sun, May 20, 2018 at 10:10 AM Jonathan Haddad 
>> wrote:
>>
>>> FWIW the largest deployment I know about is a single reaper instance
>>> managing 50 clusters and over 2000 nodes.
>>>
>>> There might be bigger, but I either don’t know about it or can’t
>>> remember.
>>>
>>> On Sun, May 20, 2018 at 10:04 AM Abdul Patel 
>>> wrote:
>>>
 Hi,

I recently tested Reaper and it actually helped us a lot. Even with our
small footprint of 18 nodes, Reaper takes close to 6 hrs (earlier it took
13 hrs; I was able to tune it by 50%). But it really depends on the number
of nodes. For example, if you have 4 nodes then it runs on 4*256 = 1024
segments, so for your env. it will be 256*144, close to 36k segments.
Better to test on a POC box how much time it takes and then proceed further.
I have tested so far in 1 DC only; we can actually have a separate Reaper
instance handling a separate DC, but I haven't tested it yet.


 On Sunday, May 20, 2018, Surbhi Gupta  wrote:

> Hi,
>
> We have a cluster with 144 nodes (3 datacenters) with 256 vnodes.
> When we tried to start repairs from OpsCenter, it showed
> 1.9 million ranges to repair.
> And even after setting compaction and stream throughput to 0, OpsCenter
> is not able to help us much to finish repair in a 9-day timeframe.
>
> What are your thoughts on Reaper?
> Do you think Reaper might be able to help us in this scenario?
>
> Thanks
> Surbhi
>
>
> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>>>
>>>
>>>
>>
>>


Re: Question About Reaper

2018-05-21 Thread Abdul Patel
We have a parameter in the reaper yaml file called
repairManagerSchedulingIntervalSeconds; the default is 10 seconds. I tested
with 8, 6, and 5 seconds and found 5 seconds optimal for my environment. You
can go lower, but it will have cascading effects on CPU and memory
consumption.
So test well.

On Monday, May 21, 2018, Surbhi Gupta  wrote:

> Thanks a lot for your inputs.
> Abdul, how did you tune Reaper?
>
> On Sun, May 20, 2018 at 10:10 AM Jonathan Haddad 
> wrote:
>
>> FWIW the largest deployment I know about is a single reaper instance
>> managing 50 clusters and over 2000 nodes.
>>
>> There might be bigger, but I either don’t know about it or can’t
>> remember.
>>
>> On Sun, May 20, 2018 at 10:04 AM Abdul Patel  wrote:
>>
>>> Hi,
>>>
>>> I recently tested Reaper and it actually helped us a lot. Even with our
>>> small footprint of 18 nodes, Reaper takes close to 6 hrs (earlier it took
>>> 13 hrs; I was able to tune it by 50%). But it really depends on the number
>>> of nodes. For example, if you have 4 nodes then it runs on 4*256 = 1024
>>> segments, so for your env. it will be 256*144, close to 36k segments.
>>> Better to test on a POC box how much time it takes and then proceed further.
>>> I have tested so far in 1 DC only; we can actually have a separate Reaper
>>> instance handling a separate DC, but I haven't tested it yet.
>>>
>>>
>>> On Sunday, May 20, 2018, Surbhi Gupta  wrote:
>>>
 Hi,

We have a cluster with 144 nodes (3 datacenters) with 256 vnodes.
When we tried to start repairs from OpsCenter, it showed 1.9 million
ranges to repair.
And even after setting compaction and stream throughput to 0, OpsCenter
is not able to help us much to finish repair in a 9-day timeframe.

What are your thoughts on Reaper?
Do you think Reaper might be able to help us in this scenario?

 Thanks
 Surbhi


 --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>>
>>


Cassandra insert from Spark slows down when running executors on the same node

2018-05-21 Thread Javier Pareja
 Hello,

I have a Spark Streaming job reading data from Kafka, processing it, and
inserting it into Cassandra. The job is running on a cluster with 3
machines. I use Mesos to submit the job with 3 executors using 1 core each.
The problem is that when all executors are running on the same node, the
insertion stage into Cassandra becomes 5x slower. When the executors
run one per machine, the insertion runs as expected.

I am using Datastax cassandra driver for the insertion of the stream.

For now, all I can do is kill the submission and try again. Because
I can't predict how Mesos assigns resources, I might have to submit it
several times until it works.
Does anyone know what could be wrong? Any idea of what I can look into?
Network, Cassandra Max Host Connections, shared VM objects...?

Best Regards,
Francisco


Re: performance on reading only the specific nonPk column

2018-05-21 Thread sujeet jog
Thanks Kurt, that answers my question.

@nandan, (id, timestamp) ensures a unique primary key.


On Mon, May 21, 2018 at 2:23 PM, kurt greaves  wrote:

> Every column that's populated will be retrieved from disk, and the
> requested column will then be sliced out in memory and sent back.
>
> On 21 May 2018 at 08:34, sujeet jog  wrote:
>
>> Folks,
>>
>> consider a table with 100 metrics with (id , timestamp ) as key,
>> if one wants to do a selective metric read
>>
>> select m1 from table where id = 10 and timestamp >= '2017-01-02 00:00:00'
>> and timestamp <= '2017-01-02 04:00:00'
>>
>> does the read on the specific node first bring in all the metrics
>> m1 - m100, with the requested metric then sliced out in memory, or does the
>> disk read happen only on the sliced data m1 without bringing in m1 - m100?
>>
>> Here the partition & clustering keys are provided in the query; the question
>> is more about the efficiency of reads on this schema.
>>
>> create table {
>> id : Int,
>> timestamp : timestamp ,
>> m1 : Int,
>> m2  : Int,
>> m3 : Int,
>> m4 : Int,
>> ..
>> ..
>> m100 : Int
>>
>> Primary Key ( id, timestamp )
>> }
>>
>> Thanks
>>
>
>


Re: performance on reading only the specific nonPk column

2018-05-21 Thread kurt greaves
Every column that's populated will be retrieved from disk, and the
requested column will then be sliced out in memory and sent back.
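
To make that concrete: with the wide table from this thread, a read of m1
still pays the disk cost of the whole row. If selective per-metric reads
dominate, one alternative (a sketch assuming the same data; the table name
is hypothetical) is to move the metric name into the partition key, so only
that metric's cells are read:

create table metrics_by_name (
    id int,
    metric text,
    ts timestamp,
    value int,
    primary key ((id, metric), ts)
);

The trade-off is that reading many metrics for one id now takes one query
per metric.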

On 21 May 2018 at 08:34, sujeet jog  wrote:

> Folks,
>
> consider a table with 100 metrics with (id , timestamp ) as key,
> if one wants to do a selective metric read
>
> select m1 from table where id = 10 and timestamp >= '2017-01-02 00:00:00'
> and timestamp <= '2017-01-02 04:00:00'
>
> does the read on the specific node first bring in all the metrics
> m1 - m100, with the requested metric then sliced out in memory, or does the
> disk read happen only on the sliced data m1 without bringing in m1 - m100?
>
> Here the partition & clustering keys are provided in the query; the question
> is more about the efficiency of reads on this schema.
>
> create table {
> id : Int,
> timestamp : timestamp ,
> m1 : Int,
> m2  : Int,
> m3 : Int,
> m4 : Int,
> ..
> ..
> m100 : Int
>
> Primary Key ( id, timestamp )
> }
>
> Thanks
>


Re: performance on reading only the specific nonPk column

2018-05-21 Thread @Nandan@
First question [just as a concern]:
How are you making sure that your PK gives uniqueness?
For example, if 10 users write data at the same time, how is your
schema going to handle that?

Now to your question:
does the read on the specific node first bring in all the metrics
m1 - m100, with the requested metric then sliced out in memory, or does the
disk read happen only on the sliced data m1 without bringing in m1 - m100?
In the case of this selection, the READ process takes place as follows:
first Cassandra will look for ID = 10, then it will look in your
clustering range based on the given timestamp.



On Mon, May 21, 2018 at 4:34 PM, sujeet jog  wrote:

> Folks,
>
> consider a table with 100 metrics with (id , timestamp ) as key,
> if one wants to do a selective metric read
>
> select m1 from table where id = 10 and timestamp >= '2017-01-02 00:00:00'
> and timestamp <= '2017-01-02 04:00:00'
>
> does the read on the specific node first bring in all the metrics
> m1 - m100, with the requested metric then sliced out in memory, or does the
> disk read happen only on the sliced data m1 without bringing in m1 - m100?
>
> Here the partition & clustering keys are provided in the query; the question
> is more about the efficiency of reads on this schema.
>
> create table {
> id : Int,
> timestamp : timestamp ,
> m1 : Int,
> m2  : Int,
> m3 : Int,
> m4 : Int,
> ..
> ..
> m100 : Int
>
> Primary Key ( id, timestamp )
> }
>
> Thanks
>


performance on reading only the specific nonPk column

2018-05-21 Thread sujeet jog
Folks,

consider a table with 100 metrics with (id , timestamp ) as key,
if one wants to do a selective metric read

select m1 from table where id = 10 and timestamp >= '2017-01-02 00:00:00'
and timestamp <= '2017-01-02 04:00:00'

does the read on the specific node first bring in all the metrics
m1 - m100, with the requested metric then sliced out in memory, or does the
disk read happen only on the sliced data m1 without bringing in m1 - m100?

Here the partition & clustering keys are provided in the query; the question
is more about the efficiency of reads on this schema.

create table {
id : Int,
timestamp : timestamp ,
m1 : Int,
m2  : Int,
m3 : Int,
m4 : Int,
..
..
m100 : Int

Primary Key ( id, timestamp )
}

Thanks