Re: Repair and NodeSync

2020-04-02 Thread J.B. Langston
NodeSync can place higher load on your cluster than traditional repair. I
have seen some clusters that were OK with OpsCenter repair service or
manual repairs get overloaded with NodeSync.  So I would recommend testing
NodeSync with a realistic workload to make sure your cluster can handle it,
and being ready to disable it quickly if there are signs of overload (high
latencies, timeouts, or high CPU or I/O utilization).  Currently
NodeSync is only available for DSE, not Apache Cassandra.

The primary advantage of NodeSync is that it is operationally simpler. It
uses the read repair mechanism, so there is less possibility of a hanging
or long-running repair; with traditional repair, you would need to restart
the whole process if a node that was streaming went down.
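As far as disabling it quickly goes: in DSE, NodeSync is toggled per table
through a CQL table option, so turning it off for a hot table is a single
statement. A minimal sketch (keyspace and table names here are hypothetical):

```sql
-- Enable continuous background validation for one table (DSE only).
ALTER TABLE my_ks.my_table WITH nodesync = {'enabled': 'true'};

-- Disable it quickly if you see signs of overload
-- (high latencies, timeouts, high CPU or I/O).
ALTER TABLE my_ks.my_table WITH nodesync = {'enabled': 'false'};
```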

On Thu, Apr 2, 2020 at 10:57 AM Aakash Pandhi
 wrote:

> Hi All,
>
> I am reviewing our data sync procedures to improve so need your input on
> NodeSync.
>
> Are there any cons of implementing NodeSync over Repair? Is NodeSync a
> future direction for cluster wide data sync?
>
> Thank You,
> Aakash
>


Repair and NodeSync

2020-04-02 Thread Aakash Pandhi
Hi All, 
I am reviewing our data sync procedures to improve so need your input on 
NodeSync. 
Are there any cons of implementing NodeSync over Repair? Is NodeSync a future 
direction for cluster wide data sync? 
Thank You,
Aakash

Re: Query data through python using IN clause

2020-04-02 Thread Nitan Kainth
Thanks Alex.

On Thu, Apr 2, 2020 at 1:39 AM Alex Ott  wrote:

> Hi
>
> Working code is below, but I want to warn you: prefer not to use IN with
> partition keys. Because you'll have different partition key values, the
> coordinator node will need to perform queries against the other hosts that
> hold those partitions, which slows down the operation and adds extra load
> to the coordinator node. If you execute queries in parallel (using async)
> for every combination of pk1 & pk2 and then consolidate the data
> application-side, this can be faster than a query with IN.
>
> Answer:
>
> You need to pass a list as the value of temp - IN expects a list there...
>
> query = session.prepare("select * from test.table1 where pk1 IN ? and
> pk2=0 and ck1 > ? AND ck1 < ?;")
> temp = [1,2,3]
>
> import dateutil.parser
>
> ck1 = dateutil.parser.parse('2020-01-01T00:00:00Z')
> ck2 = dateutil.parser.parse('2021-01-01T00:00:00Z')
>
> rows = session.execute(query, (temp, ck1, ck2))
> for row in rows:
>     print(row)
>
>
>
>
> Nitan Kainth  at "Wed, 1 Apr 2020 18:21:54 -0500" wrote:
>  NK> Hi There,
>
>  NK> I am trying to read data from table as below structure:
>
>  NK> table1(
>  NK> pk1 bigint,
>  NK> pk2 bigint,
>  NK> ck1 timestamp,
>  NK> value text,
>  NK> primary key((pk1,pk2),ck1);
>
>  NK> query = session.prepare("select * from table1 where pk IN ? and pk2=0
> and ck1 > ? AND ck1 < ?;")
>
>  NK> temp = 1,2,3
>
>  NK> runq = session.execute(query2, (temp,ck1, ck1))
>
>  NK> TypeError: Received an argument of invalid type for column
> "in(bam_user)". Expected: ,
> Got:
>  NK> ; (cannot convert argument
> to integer)
>
>  NK> I found examples for prepared statements for inserts but couldn't
> find any for select and not able to make it to work.
>
>  NK> Any suggestions?
>
>
>
> --
> With best wishes,
> Alex Ott
> Principal Architect, DataStax
> http://datastax.com/
>


Re: Query data through python using IN clause

2020-04-02 Thread Alex Ott
Hi

Working code is below, but I want to warn you: prefer not to use IN with
partition keys. Because you'll have different partition key values, the
coordinator node will need to perform queries against the other hosts that
hold those partitions, which slows down the operation and adds extra load
to the coordinator node. If you execute queries in parallel (using async)
for every combination of pk1 & pk2 and then consolidate the data
application-side, this can be faster than a query with IN.

Answer:

You need to pass a list as the value of temp - IN expects a list there...

query = session.prepare("select * from test.table1 where pk1 IN ? and pk2=0 and 
ck1 > ? AND ck1 < ?;")
temp = [1,2,3]

import dateutil.parser

ck1 = dateutil.parser.parse('2020-01-01T00:00:00Z')
ck2 = dateutil.parser.parse('2021-01-01T00:00:00Z')

rows = session.execute(query, (temp, ck1, ck2))
for row in rows:
    print(row)
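To illustrate the fan-out alternative mentioned above, here is a sketch of
issuing one single-partition query per (pk1, pk2) combination and
consolidating the rows application-side. The fetch_partition function is a
hypothetical stand-in for session.execute() so the pattern can be shown
without a live cluster; in real code the Python driver's
cassandra.concurrent.execute_concurrent_with_args helper performs this
fan-out for you.

```python
from itertools import product

# Hypothetical stand-in for session.execute() against a single partition,
# so the fan-out/merge pattern can be shown without a live cluster.
def fetch_partition(pk1, pk2):
    return [(pk1, pk2, value) for value in ("a", "b")]

def query_many(pk1_values, pk2_values):
    """Query each (pk1, pk2) partition separately and merge the rows."""
    rows = []
    # One single-partition query per combination, instead of one
    # multi-partition IN query routed through a single coordinator.
    for pk1, pk2 in product(pk1_values, pk2_values):
        rows.extend(fetch_partition(pk1, pk2))
    return rows

print(len(query_many([1, 2, 3], [0])))  # 3 partitions x 2 rows = 6
```

Each query here hits exactly one partition, so every replica serves only
its own data and no single coordinator has to gather rows from other hosts.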




Nitan Kainth  at "Wed, 1 Apr 2020 18:21:54 -0500" wrote:
 NK> Hi There,

 NK> I am trying to read data from table as below structure:

 NK> table1(
 NK> pk1 bigint,
 NK> pk2 bigint,
 NK> ck1 timestamp,
 NK> value text,
 NK> primary key((pk1,pk2),ck1);

 NK> query = session.prepare("select * from table1 where pk IN ? and pk2=0 and 
ck1 > ? AND ck1 < ?;")

 NK> temp = 1,2,3

 NK> runq = session.execute(query2, (temp,ck1, ck1))

 NK> TypeError: Received an argument of invalid type for column "in(bam_user)". 
Expected: , Got:
 NK> ; (cannot convert argument to 
integer)

 NK> I found examples for prepared statements for inserts but couldn't find any 
for select and not able to make it to work. 

 NK> Any suggestions?



-- 
With best wishes,
Alex Ott
Principal Architect, DataStax
http://datastax.com/
