Re: Repair and NodeSync
NodeSync can place higher load on your cluster than traditional repair. I have seen some clusters that were OK with the OpsCenter repair service or manual repairs get overloaded with NodeSync. So I would recommend testing NodeSync with a realistic workload to make sure your cluster can handle it, and being ready to disable it quickly if there are signs of overloading (high latencies, timeouts, or high CPU or I/O utilization).

Currently NodeSync is only available for DSE, not Apache Cassandra.

The primary advantage of NodeSync is that it is operationally simpler. It uses the read repair mechanism, so there is less chance of a hanging or long-running repair that you would need to restart from scratch if a node that was streaming goes down.

On Thu, Apr 2, 2020 at 10:57 AM Aakash Pandhi wrote:

> Hi All,
>
> I am reviewing our data sync procedures to improve so need your input on NodeSync.
>
> Are there any cons of implementing NodeSync over Repair? Is NodeSync a future direction for cluster wide data sync?
>
> Thank You,
> Aakash
Repair and NodeSync
Hi All,

I am reviewing our data sync procedures with a view to improving them, so I need your input on NodeSync.

Are there any cons of implementing NodeSync over Repair? Is NodeSync a future direction for cluster-wide data sync?

Thank You,
Aakash
Re: Query data through python using IN clause
Thanks Alex.

On Thu, Apr 2, 2020 at 1:39 AM Alex Ott wrote:

> Hi
>
> Working code is below, but I want to warn you - prefer not to use IN with partition keys - because you'll have different partition key values, coordinator node will need to perform queries to other hosts that hold these partition keys, and this slow downs the operation, and adds an additional load to the coordinating node. If you execute queries in parallel (using async) for every of combination of pk1 & pk2, and then consolidate data application side - this could be faster than query with IN.
>
> Answer:
>
> You need to pass list as value of temp - IN expects list there...
>
>     query = session.prepare("select * from test.table1 where pk1 IN ? and pk2=0 and ck1 > ? AND ck1 < ?;")
>     temp = [1,2,3]
>
>     import dateutil.parser
>
>     ck1 = dateutil.parser.parse('2020-01-01T00:00:00Z')
>     ck2 = dateutil.parser.parse('2021-01-01T00:00:00Z')
>
>     rows = session.execute(query, (temp, ck1, ck2))
>     for row in rows:
>         print row
>
> Nitan Kainth at "Wed, 1 Apr 2020 18:21:54 -0500" wrote:
> NK> Hi There,
> NK>
> NK> I am trying to read data from table as below structure:
> NK>
> NK> table1(
> NK> pk1 bigint,
> NK> pk2 bigint,
> NK> ck1 timestamp,
> NK> value text,
> NK> primary key((pk1,pk2),ck1);
> NK>
> NK> query = session.prepare("select * from table1 where pk IN ? and pk2=0 and ck1 > ? AND ck1 < ?;")
> NK>
> NK> temp = 1,2,3
> NK>
> NK> runq = session.execute(query2, (temp,ck1, ck1))
> NK>
> NK> TypeError: Received an argument of invalid type for column "in(bam_user)". Expected: , Got: ; (cannot convert argument to integer)
> NK>
> NK> I found examples for prepared statements for inserts but couldn't find any for select and not able to make it to work.
> NK>
> NK> Any suggestions?
>
> --
> With best wishes,
> Alex Ott
> Principal Architect, DataStax
> http://datastax.com/
Re: Query data through python using IN clause
Hi

Working code is below, but I want to warn you: prefer not to use IN with partition keys. Because the values map to different partitions, the coordinator node has to query the other hosts that hold those partitions, which slows down the operation and adds load to the coordinating node. If you execute the queries in parallel (using async) for every combination of pk1 & pk2 and then consolidate the data on the application side, this could be faster than a query with IN.

Answer:

You need to pass a list as the value of temp; IN expects a list there.

    import dateutil.parser

    # The IN placeholder is bound to a Python list, not to bare comma-separated values
    query = session.prepare("select * from test.table1 where pk1 IN ? and pk2=0 and ck1 > ? AND ck1 < ?;")
    temp = [1, 2, 3]

    # Lower and upper bounds for the clustering column ck1
    ck1 = dateutil.parser.parse('2020-01-01T00:00:00Z')
    ck2 = dateutil.parser.parse('2021-01-01T00:00:00Z')

    rows = session.execute(query, (temp, ck1, ck2))
    for row in rows:
        print(row)

Nitan Kainth at "Wed, 1 Apr 2020 18:21:54 -0500" wrote:
NK> Hi There,
NK>
NK> I am trying to read data from table as below structure:
NK>
NK> table1(
NK> pk1 bigint,
NK> pk2 bigint,
NK> ck1 timestamp,
NK> value text,
NK> primary key((pk1,pk2),ck1);
NK>
NK> query = session.prepare("select * from table1 where pk IN ? and pk2=0 and ck1 > ? AND ck1 < ?;")
NK>
NK> temp = 1,2,3
NK>
NK> runq = session.execute(query2, (temp,ck1, ck1))
NK>
NK> TypeError: Received an argument of invalid type for column "in(bam_user)". Expected: , Got: ; (cannot convert argument to integer)
NK>
NK> I found examples for prepared statements for inserts but couldn't find any for select and not able to make it to work.
NK>
NK> Any suggestions?

--
With best wishes,
Alex Ott
Principal Architect, DataStax
http://datastax.com/
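
For reference, the parallel alternative to IN suggested above could look roughly like the sketch below. It is only an illustration, not code from this thread: the helper name `query_partitions_async` is made up, and it assumes `session` is a connected DataStax Python driver session and `statement` is a prepared single-partition SELECT (with `pk1 = ?` rather than `pk1 IN ?`). It relies only on `session.execute_async()`, which returns a future whose `result()` call blocks until that query completes.

```python
def query_partitions_async(session, statement, pk1_values, pk2, start, end):
    """Run one single-partition query per pk1 value in parallel and merge the rows.

    Hypothetical helper: `statement` is assumed to take the parameters
    (pk1, pk2, ck1 lower bound, ck1 upper bound) in that order.
    """
    # Send every request up front; execute_async returns immediately with a future.
    futures = [
        session.execute_async(statement, (pk1, pk2, start, end))
        for pk1 in pk1_values
    ]

    rows = []
    for future in futures:
        # result() waits for this query to finish and raises if it failed.
        rows.extend(future.result())
    return rows


# Usage sketch (assumes an open `session` and the table from this thread):
# stmt = session.prepare(
#     "select * from test.table1 where pk1 = ? and pk2 = ? and ck1 > ? AND ck1 < ?;")
# rows = query_partitions_async(session, stmt, [1, 2, 3], 0, ck1, ck2)
```

Rows come back grouped by the order of `pk1_values`, not globally sorted by ck1, so sort on the application side if you need a single ordering. For a large number of partitions you would probably also want to cap how many requests are in flight at once.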