Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
se read documentation on differences between nosql and RDBMS. > > thanks. > > On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi All, >> >> Im planning to shift from SQL database to a columnar nosql database, we >> have streamline

Re: Requesting some details for my use case

2016-01-07 Thread Bhuvan Rawal
to meet latency requirements, and then to scale up load > capacity by adding nodes. > > -- Jack Krupansky > > On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> *Thanks Jack* *for the detailed advice*. >> >> Yes it i

Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
o specific requirement of immediate data consistency amongst nodes. Regards, Bhuvan Rawal SDE

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
gesting and querying data. It's a bit more sophisticated than > just a simple JDBC interface. Most of your queries will need to be > rewritten anyway even though the CQL syntax does indeed look a lot like > SQL, but much of that will be because your data model will need to be made > NoSQL-

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
alytics need, Cassandra is not > what you want. If you've just made the same mistake that 99% of people > make, well, now you know. Cassandra historically has been referred to as a > "Column Family" data store, which is easily mistaken for columnar. > > > On Tue, Jan 5, 2016 at 3

sstabledump failing for system keyspace tables

2016-06-11 Thread Bhuvan Rawal
I have been trying to obtain json dump of batches table using sstabledump but I get this exception: $ sstabledump /sstable/data/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-277-big-Data.db Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Cannot use abstract

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Jira CASSANDRA-12003 Has been created for the same. On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha wrote: > Hi Tyler, > > This issue is mainly visible for tables having static columns, still > investigating. > We

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Jira CASSANDRA-12003 >> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been created >> for the same. >> >> On Tue, Jun 14, 2016 at 11:54 PM, Atul S

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
>>> narrow the issue down farther. I've played around locally with similar >>> schemas (sans the stratio indices) and couldn't reproduce the issue. >>> >>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com> >>> wrote: >>> >>

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
I have verified this issue to be fixed in 3.6 and 3.7. And the issue mentioned on this thread is fixed as well. On Wed, Jun 15, 2016 at 12:43 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Joel, > > If we look at the schema carefully: > > CREATE TABLE test0 ( > pk i

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
DRA-12003 > as a duplicate of CASSANDRA-11513. > > On Tue, Jun 14, 2016 at 4:21 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Joel, >> >> Thanks for your reply, I have checked and found that the behavior is same >> in case of CASSANDRA-11513 >>

Re: Installing Cassandra from Tarball

2016-06-13 Thread Bhuvan Rawal
Hi Steve, Please find the responses in line: WARN 15:41:58 Unable to lock JVM memory (ENOMEM). This can result in part > of the JVM being swapped out, especially with mmapped I/O enabled. Increase > RLIMIT_MEMLOCK or run Cassandra as root. > You can edit -* /etc/security/limits.conf *and put

Re: Node Stuck while restarting

2016-05-30 Thread Bhuvan Rawal
. On Sun, May 29, 2016 at 7:12 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi Mike, > > PFA the details you asked for: and some others if that helps: > we are using jvm params > -Xms8G > -Xmx8G > > MAX_HEAP_SIZE: & HEAP_NEWSIZE: is not being set and possibly ca

Node Stuck while restarting

2016-05-29 Thread Bhuvan Rawal
Hi, We are running a 6-node cluster across 2 DCs on DSC 3.0.3, with 3 nodes each. One of the nodes was showing as UNREACHABLE on the other nodes in nodetool describecluster, and on that node all the others were showing as UNREACHABLE, so as a remedy we restarted the node. But on doing that it is stuck possibly at

Re: Node Stuck while restarting

2016-05-29 Thread Bhuvan Rawal
_mb > memtable_offheap_space_in_mb > > > Regards, > Mike Yeap > > > > On Sun, May 29, 2016 at 6:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi, >> >> We are running a 6 Node cluster in 2 DC on DSC 3.0.3, with 3 Node each. >> One

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
SSANDRA-11513 (or something else entirely) fixed the issue. > > On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Joel, >> >> If we look at the schema carefully: >> >> CREATE TABLE test0 ( >> pk int, >>

Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-22 Thread Bhuvan Rawal
ions) as well. >> You can verify that by setting column_index_cache_size_in_kb in c.yaml to >> a really high value like 1000 - if you see the same behaviour in 3.7 >> with that setting, there’s not much you can do except upgrading to 3.7 as >> that change went into 3.6 and not into 3.0.x. >

Slow nodetool response time

2016-06-22 Thread Bhuvan Rawal
Hi, We have been seeing slow responses from nodetool for all of its subcommands. On the same version on AWS it responds really fast, but on a local 1-node machine or a local DC cluster it is very slow. On Local DC : *$ time nodetool version* ReleaseVersion: 3.0.3 real 0m*17.582s*

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
Hi Vasu, Planet Cassandra has a documentation page for basic info about migrating to cassandra from MySQL. What to expect and what not to. It can be found here . I had a look at this slide

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
that feature. Depending on the use case, you can use 1 or 2 or both. On Fri, Jun 17, 2016 at 4:46 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > What kind of data are we talking here? > Is it time series data with infrequent updates and only inserts or > frequently updated data. How frequen

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
actices > > Thanks, > Vasu > > On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > > Hi Vasu, > > Planet Cassandra has a documentation page for basic info about migrating > to cassandra from MySQL. What to expect and what not to. It can be found

Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-22 Thread Bhuvan Rawal
sstables and vnodes > > On Wed, Jun 22, 2016 at 2:50 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Thanks for the info Paulo, Robert. I tried further testing with other >> parameters and it was prevalent. We could be either 11739, 11206. But I'm >> skeptical ab

Re: Slow nodetool response time

2016-06-22 Thread Bhuvan Rawal
d predictably scalable to any > size. With more than 500 customers in 45 countries, DataStax is the > database technology and transactional backbone of choice for the worlds > most innovative companies such as Netflix, Adobe, Intuit, and eBay. > > On Wed, Jun 22, 2016 at 9:03 AM, Bhuvan

Want inputs about super column family vs map/list

2016-02-04 Thread Bhuvan Rawal
Hi All, There are two ways to achieve this : 1. Using super column family: raman | atul | bhuvan --- 1234 | 5678 | 2345 OR Using single Collection column : Phone Number --- Map I would like to know which

Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
Hi Ted, Have you specified the listen_address and rpc_address? What addresses are there in the seed list? Have you started seed first and after waiting for 30 seconds started other nodes? On Tue, Feb 9, 2016 at 12:14 AM, Ted Yu wrote: > Hi, > I am trying to setup a

Re: specifying listen_address

2016-02-08 Thread Bhuvan Rawal
Hi Ted, Are you sure the path to yaml is correct? For me(DSE 4.8.4) it is /etc/dse/cassandra/cassandra.yaml On Mon, Feb 8, 2016 at 11:22 PM, Ted Yu wrote: > Hi, > I downloaded and expanded DSE 4.8.4 > > When I specify the following in resources/dse/conf/dse.yaml : > >

Re: specifying listen_address

2016-02-08 Thread Bhuvan Rawal
etProperty(PropertyUtils.java:132) > at > org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:121) > > Thanks > > On Mon, Feb 8, 2016 at 10:04 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi Ted, >> >> Are you sure the path to

Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
AM, Ted Yu <yuzhih...@gmail.com> wrote: > Here it is: > http://pastebin.com/QEdjtAj6 > > XX.YY is localhost in this case. > > On Mon, Feb 8, 2016 at 11:03 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> could you paste your cassandra.yaml here, except for c

Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
bq. What addresses are there in the seed list? > > The IP of the seed node. > > I haven't come to starting non-seed node(s) yet. > > Thanks for the quick response. > > On Mon, Feb 8, 2016 at 10:50 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi Ted, >> >

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
nology and transactional backbone of choice for the worlds > most innovative companies such as Netflix, Adobe, Intuit, and eBay. > > On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi Sebastian, >> >> I had attached nodetool status

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
ilt to be agile, always-on, and predictably scalable to any > size. With more than 500 customers in 45 countries, DataStax is the > database technology and transactional backbone of choice for the worlds > most innovative companies such as Netflix, Adobe, Intuit, and eBay. > > On Fri,

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
; > ----- > Alain > > The Last Pickle > http://www.thelastpickle.com > > > 2016-01-22 9:57 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>: > >> Hi, >> >> i have created a POC cluster with 2

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99 rack1 Regards, Bhuvan Rawal On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez < sebastian.este...@datastax.com> wrote: > The output of `nodetool status` would help us diagnose. > > All the bes

Re: Getting error while issuing Cassandra stress

2016-01-23 Thread Bhuvan Rawal
drant-odbms> >>> >>> DataStax is the fastest, most scalable distributed database technology, >>> delivering Apache Cassandra to the world’s most innovative enterprises. >>> Datastax is built to be agile, always-on, and predictably scalable to any >>> size.

Re: Getting error while issuing Cassandra stress

2016-01-23 Thread Bhuvan Rawal
is easy enough to test, so it might be worth > it. Maybe the first thing I would try, just out of curiosity. > > C*heers, > > - > Alain > > The Last Pickle > http://www.thelastpickle.com > > 2016-01-23 11:13 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>: &g

Need Feedback about cassandra-stress tests

2016-01-23 Thread Bhuvan Rawal
timeline 55336 785 785 6094 threadCount 406 total 55336 785 785 6094 threadCount 609 timeline 69326 831 831 6449 threadCount 609 total 69326 831 831 6449 threadCount 913 timeline 94283 837 837 6482 threadCount 913 total 94283 837 837 6482 Am I missing something here? Thanks & Regards, Bhuvan Rawal

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
technology and transactional backbone of choice for the worlds > most innovative companies such as Netflix, Adobe, Intuit, and eBay. > > On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Yes im specifying -node parameter to stress, otherwise it th

Re: installing DSE

2016-02-12 Thread Bhuvan Rawal
I believe you missed this note : 1. Attention: Depending on your environment, you might need to replace @ in your email address with %40 and escape any character in your password that is used in your operating system's command line. Examples: \! and \| . On Sat, Feb 13, 2016 at 3:15

Re: Which version of Cassandra 3.x is production ready.

2016-03-16 Thread Bhuvan Rawal
This has been discussed in the past: https://www.mail-archive.com/user@cassandra.apache.org/msg45990.html This link should be useful for your case: https://www.eventbrite.com/engineering/what-version-of-cassandra-should-i-run/ 3.0.4 comes with a ton of features from 2.1.x which is considered

Creation of Async Datacenter for Monitoring purposes and Engineering Services purpose

2016-04-13 Thread Bhuvan Rawal
Hi All, We have 2 running datacenters in physically separate DCs with 3 nodes each. There is a requirement for an audit DC for issuing queries that are not tied to live application traffic. A delay of 1-2 hours in live data is acceptable. It is essential that replication to this DC to not

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Thanks Sean and Nirmallaya. >> >> @Jack, We are going with DSC right now and plan to use spark and later >> solr over the analytics DC. The use case is to have olap and oltp >>

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
>> Separation of workloads is one of the key powers of a Cassandra cluster. >> >> >> >> You may want to look at different configurations for the analytics cluster – smaller replication factor, more memory per node, more disk per node, perhaps less vnodes. Oth

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
. > > On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Well thats certainly true, there are these points worth discussing here : >> >> 1. Scatter Gather queries - Especially if the cluster size is large. Say >> we have a 20 no

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Krupansky > > On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Yes Jack, we are rolling out with Stratio right now, we will assess the >> performance benefit it yields and can go for ElasticSearch/Solr later. >> >> As per your exp

Re: Can we lengthy big column names in cassandra 3.0.3

2016-03-30 Thread Bhuvan Rawal
It has been discussed in past in https://issues.apache.org/jira/browse/CASSANDRA-4175. I believe it is fixed in https://issues.apache.org/jira/browse/CASSANDRA-8099, though we have not evaluated the performance. Will be glad if someone can reply with benchmarks. On Wed, Mar 30, 2016 at 4:49 PM,

Re: Balancing tokens over 2 datacenter

2016-04-13 Thread Bhuvan Rawal
This could be because of the way you have configured the policy, have a look at the below links for configuring the policy https://datastax.github.io/python-driver/api/cassandra/policies.html http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node Regards,

Re: Hi Memory consumption with Copy command

2016-04-23 Thread Bhuvan Rawal
already transferred in ~5 Minutes. Just a final question before we close this thread, at this performance level would you recommend sstable loader or copy command? On Sat, Apr 23, 2016 at 2:00 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Thanks Stefania for the informative answer. The

Re: Hi Memory consumption with Copy command

2016-04-23 Thread Bhuvan Rawal
ERATION is not making progress. > > Do let us know if you still have problems, as this is new functionality. > > With best regards, > Stefania > > > On Sat, Apr 23, 2016 at 6:34 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi, >> >> Im tryin

Blocking read repair giving consistent data but not repairing existing data

2016-05-12 Thread Bhuvan Rawal
Hi, We are using dsc 3.0.3 on a total of *6 Nodes*,* 2 DC's, 3 Node Each, RF-3* so every node has complete data. Now we are facing a situation on a table with 1 partition key, 2 clustering columns and 4 normal columns. Out of the 6, 5 nodes have a single value and Partition key, 2 clustering key for

Re: A question to 'paging' support in DataStax java driver

2016-05-09 Thread Bhuvan Rawal
Hi Doan, What does this have to do with eventual consistency? Let's assume a scenario with complete consistency where we are at page X, and at the same time some inserts/updates happen at page X-2 and we jump to it. The user will see an inconsistent page in that case as well, right? Also in such cases

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
Hi Luke, You mentioned that the replication factor was increased from 1 to 2. In that case, was the node with IP 10.128.0.20 carrying around 3GB of data earlier? You can run nodetool repair with the -local option to initiate a repair of the local datacenter for gce-us-central1. Also you may suspect that if a lot

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
> is almost a GB lower and then of course 10.128.0.20 which is missing over > 5 GB of data. I tried running nodetool -local on both DCs and it didn't > fix either one. > > Am I running into a bug of some kind? > > On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.

Performance impact of wide rows on read heavy workload

2016-07-21 Thread Bhuvan Rawal
Hi, We are trying to evaluate read performance impact of having a wide row by pushing a partition out into clustering column. From all the information I could gather[1] [2]

Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread Bhuvan Rawal
Hi Abhishek, nodetool status output can be misleading at times. In order to ensure data is in sync, schedule a repair for the impacted keyspaces. Regards, On Mon, Jan 30, 2017 at 10:13 AM, Abhishek Kumar Maheshwari < abhishek.maheshw...@timesinternet.in> wrote: > But how I will tell rebuild

Re: High disk io read load

2017-02-20 Thread Bhuvan Rawal
fault of 64kb (https://cl.ly/2w0V3U1q1I1Y). I guess > setting a read ahead of 8kb is totally pointless if CS reads 64kb if it > only has to fetch a single row, right? Are there recommendations for that > setting? > > 2017-02-19 19:15 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
16 >> /dev/sda >> rw 16 512 4096 2048 800164151296 /dev/sda1 >> rw 16 512 4096 0 800166076416 >> /dev/sdb >> rw 16 512 4096 2048 800165027840 /dev/sdb1 >> rw

Re: High disk io read load

2017-02-19 Thread Bhuvan Rawal
Hi Edward, This could have been a valid case here, but if hotspots indeed existed then, along with really high disk IO, the node should have been doing proportionately high network IO as well, and serving more queries per second. But from the output shared by Benjamin that doesn't appear to be the

Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
.in> wrote: > Hi Bhuvan, > Thanks a lot! > > Any idea if something can be done for C* 2.X? > > Best, > Igor > > 2017-02-18 16:41 GMT-03:00 Bhuvan Rawal <bhu1ra...@gmail.com>: > >> Hi Igor, >> >> If you are using java driver, you can lo

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Benjamin, What is the disk read ahead on both nodes? Regards, Bhuvan On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth wrote: > This is status of the largest KS of these both nodes: > UN 10.23.71.10 437.91 GiB 512 49.1% >

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
node. Regards, On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote: > cat /sys/block/sda/queue/read_ahead_kb > => 8 > > On all CS nodes. Is that what you mean? > > 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>: >
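
The read-ahead check quoted in this thread (`cat /sys/block/sda/queue/read_ahead_kb`) can be repeated for every block device on a node. The following is only a minimal sketch, assuming a Linux host that exposes the standard sysfs layout; the function name `read_ahead_settings` is made up for illustration and is not part of any Cassandra tooling.

```python
import os


def read_ahead_settings(sys_block="/sys/block"):
    """Return {device: read_ahead_kb} for each block device exposing the setting."""
    settings = {}
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "read_ahead_kb")
        try:
            with open(path) as fh:
                settings[dev] = int(fh.read().strip())
        except (OSError, ValueError):
            # Device has no queue/read_ahead_kb entry or it is unreadable: skip it.
            continue
    return settings
```

Calling `read_ahead_settings()` on each node and comparing the dictionaries is one way to confirm that all nodes really use the same value.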

Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
Hi Igor, If you are using java driver, you can log slow queries on client side using QueryLogger. https://docs.datastax.com/en/developer/java-driver/2.1/manual/logging/ Slow Query logger for server was introduced in C* 3.10 version. Details: https://issues.apache.org/jira/browse/CASSANDRA-12403

Isolation in case of Single Partition Writes and Batching with LWT

2016-09-06 Thread Bhuvan Rawal
Hi, We are working on a multi-threaded distributed design in which a thread reads the current state from Cassandra (single partition, ~20 rows), does some computation and saves it back. But it needs to be ensured that in between reading and writing by that thread any other thread

Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread Bhuvan Rawal
Hi, We are running Cassandra 3.6 and want to bump up Cassandra nodes in an existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to leverage more memory instead of m4.2xlarge). Bootstrapping a node would take 7-8 hours. If this activity is performed serially then it will take 5-6

Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-09 Thread Bhuvan Rawal
artition and we are updating all using lwt the clients should read either all of them old or all of them during batch update. Will be glad if someone can clarify the above doubt. On Tue, Sep 6, 2016 at 11:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi, > > We are working

High CPU usage by cqlsh when network is disconnected on client

2016-09-30 Thread Bhuvan Rawal
Hi, We are using Cassandra 3.6 and I have been facing this issue for a while. When I connect to a cassandra cluster using cqlsh and disconnect the network keeping cqlsh on, I get really high cpu utilization on client by cqlsh python process. On network reconnect things return back to normal.

Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Hi, Is it possible to have secondary indices (SASI or native ones) defined on a table restricted to a particular DC? For instance it is very much possible in mysql to have a parent server on which writes are being done without any indices (other than the required ones), and to have indices on

Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
ta + indexes) and in dc2 as > cassandra (having only data). > > On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Hi, >> >> Is it possible to have secondary indices (SASI or native ones) defined on >> a table restricted to a partic

Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
or it, it's a good idea (if it > doesn't exist already). Post back to the ML with the issue #. > > On Sun, Sep 18, 2016 at 12:26 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > >> Can it be possible with change log feature implemented in CASSANDRA-8844 >> <https://

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
; >> You can customize the options, setting fetch size, consistency level, >> degree of parallelism(number of threads) according to your need. >> >> You can visit https://github.com/siddv29/cfs to go through the code, see >> the logic behind it, or try it in your program. >

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
Hi Jonathan, If full scan is a regular requirement then setting up a spark cluster in locality with Cassandra nodes makes perfect sense. But supposing that it is a one off requirement, say a weekly or a fortnightly task, a spark cluster could be an added overhead with additional capacity,

Iterating over a table with multiple producers [Python]

2016-09-25 Thread Bhuvan Rawal
Hi, It's common to need a full scan of a Cassandra table. One of the most common requirements is to get the count of rows in a table. As Cassandra doesn't keep count information stored anywhere (a node may not have any clue about writes happening on other nodes), when we aggregate
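
One common way to parallelise such a full scan is to split the partitioner's token ring into sub-ranges and issue one query per range. This is only a sketch of that idea, not necessarily the approach taken in the thread: it assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), and the keyspace, table and key names are hypothetical.

```python
# Token bounds of Murmur3Partitioner, Cassandra's default partitioner
# (assumption: the cluster has not been configured with a different one).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(n_splits):
    """Split the full token ring into n_splits contiguous, inclusive (start, end) ranges."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(n_splits):
        # The last range absorbs the remainder so the whole ring is covered.
        end = MAX_TOKEN if i == n_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_query(keyspace, table, partition_key, start, end):
    """Build a CQL count restricted to one token range (all names are hypothetical)."""
    return (
        "SELECT COUNT(*) FROM {ks}.{tbl} "
        "WHERE token({pk}) >= {start} AND token({pk}) <= {end}"
    ).format(ks=keyspace, tbl=table, pk=partition_key, start=start, end=end)


if __name__ == "__main__":
    for start, end in split_token_ring(4):
        print(count_query("my_keyspace", "my_table", "id", start, end))
```

Each generated statement can then be handed to a separate worker, and the per-range counts summed on the client.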

Cassandra Read / Write Benchmarks with Stress - Public listing

2016-10-19 Thread Bhuvan Rawal
Hi, Is there any public listing of cassandra performance test results with cstar or cassandra-stress for read and write, with mention of configurations modified from default and cassandra version. It would be useful to not redo and do optimisations for cassandra wrt Threadpools / JVM tuning /

Re: High system CPU during high write workload

2016-11-15 Thread Bhuvan Rawal
Hi Ben, Thanks for your reply. We tested the same workload on kernel version 4.6.4-1.el7.elrepo.x86_64 and found that the issue is not present there. It had resulted in really high CPU usage in write workloads, an area in which Cassandra excels, degrading performance by at least 5x! I suggest this

Re: Incremental Repair Migration

2017-01-10 Thread Bhuvan Rawal
Hi Amit, You can try reaper, it makes repairs effortless. There are a host of other benefits but most importantly it offers a Single portal to manage & track ongoing as well as past repairs. For incremental repairs it breaks it into single segment per node, if you find that it's indeed the

Re: Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Bhuvan Rawal
Hi Ajay, Have you had a look at cron logs? - mine is in path /var/log/cron Thanks & Regards, On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg wrote: > Hi All. > > Facing a very weird issue, wherein the command > > */etc/init.d/cassandra start* > > causes cassandra to start

Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-12-03 Thread Bhuvan Rawal
e such expansion multiple times and can really recommend >>> bootstrapping a new DC and pointing your clients to it. The process is so >>> much faster and the documentation you referred to has worked out fine for >>> me. >>> >>> Cheers, >>> Jens >

Re: Reaper repair seems to "hang"

2017-01-03 Thread Bhuvan Rawal
the mailing list that I've created a GitHub issue at: https://github.com/thelastpickle/cassandra-reaper/issues/39 Kind regards, Daniel On Wed, Jan 4, 2017 at 6:31 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi Daniel, > > We faced a similar issue during repair with reaper. We ran re

Re: Reaper repair seems to "hang"

2017-01-03 Thread Bhuvan Rawal
Hi Daniel, We faced a similar issue during repair with reaper. We ran repair with more repair threads than number of cassandra nodes. But on and off repair was getting stuck and we had to do rolling restart of cluster or wait for lock time to expire (~1hr). We had a look at the stuck repair,

Re: why dose it still have to seach in SSTable when getting data in memtable in the read flow?

2017-03-27 Thread Bhuvan Rawal
Also, Cassandra's working unit is the cell, so within a partition some cells of a row may be present in the memtable while others are located in sstables, hence the need to reconcile partition data. @Jason's point is valid too - User defined timestamp may put sstable cells

Re: [Cassandra 3.0.9 ] Disable “delete/Truncate/Drop”

2017-04-04 Thread Bhuvan Rawal
Hi Abhishek, You can restrict commands a user can issue by enabling authentication & authorization, then authorizing concerned user with appropriate privileges. For reference : http://cassandra.apache.org/doc/latest/cql/security.html Thanks, Bhuvan On Tue, Apr 4, 2017 at 1:58 PM, Abhishek

Re: nodetool status high load info

2017-04-12 Thread Bhuvan Rawal
Try nodetool tpstats - it can lead you to where your threads are stuck. There could be various reasons for load factor to go high like disk/cpu getting choked, you'll probably need to check dstat & iostat output along with Cassandra Threadpool stats to get a decent idea. On Wed, Apr 12, 2017 at

Re: how to recover a dead node using commit log when memtable is lost

2017-04-05 Thread Bhuvan Rawal
I beg to differ with @Matija here, IMO by default cassandra syncs data into commit log in a periodic fashion with a fsync period of 10 sec (Ref - https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L361). If a write is not written to disk and RF is 1 else CL is Local One & node goes

Re: scylladb

2017-03-09 Thread Bhuvan Rawal
I'd say the benchmark would be complete only when at the point of inflexion the necessary system benchmarks are provided. Looking at the scylladb report, it is unclear which system parameter was the bottleneck. Also an observation - it's mentioned in the report that they are using 1KB ro and

Re: scylladb

2017-03-11 Thread Bhuvan Rawal
une. > > Avi > > [1] https://www.infoq.com/presentations/scylladb > [2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/ > > > On 03/10/2017 06:58 PM, Bhuvan Rawal wrote: > > Agreed C++ gives an added advantage to talk to underlying hardwa

Re: scylladb

2017-03-12 Thread Bhuvan Rawal
​ On Sun, Mar 12, 2017 at 2:42 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Looking at the costs of cloud instances, it clearly appears the cost of > CPU dictates the overall cost of the instance. Having 2X more cores > increases cost by nearly 2X keeping other things same

Re: scylladb

2017-03-12 Thread Bhuvan Rawal
Looking at the costs of cloud instances, it clearly appears the cost of CPU dictates the overall cost of the instance. Having 2X more cores increases cost by nearly 2X keeping other things same as can be seen below as an example: (C3 may have slightly better processor but not more than 10-15%

Re: scylladb

2017-03-10 Thread Bhuvan Rawal
Agreed, C++ gives an added advantage of talking to the underlying hardware with better efficiency. It sounds good, but can a piece of code written in C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more performant than the SEDA arch? And if C/C++ is indeed that fast how can Aerospike (which is

Re: tolerate how many nodes down in the cluster

2017-07-24 Thread Bhuvan Rawal
Hi Peng, This really depends on how you have configured your topology. Say you have segregated your DC into 3 racks with 10 servers each. With an RF of 3 you can safely assume your data will be available if one rack goes down. But if different servers amongst the racks fail then I guess you are
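
The rack example can be checked with a little arithmetic. The sketch below assumes NetworkTopologyStrategy with at least as many racks as the replication factor, so that each rack holds one replica of every partition and losing a rack costs exactly one replica; the helper names are illustrative only.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor, required_replicas):
    """How many replicas of a partition can be down while requests still succeed."""
    return replication_factor - required_replicas


if __name__ == "__main__":
    rf = 3
    for name, required in (("ONE", 1), ("QUORUM", quorum(rf)), ("ALL", rf)):
        print(name, tolerated_replica_failures(rf, required))
```

Under these assumptions, RF 3 at QUORUM tolerates one lost replica per partition, which matches the one-rack-down case in the reply. Failures spread across different racks can remove two replicas of the same partition, and then only consistency level ONE would still be satisfiable for it.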

Re: Migrating a cluster

2017-05-01 Thread Bhuvan Rawal
+1 to Justin's answer! As an additional step it's always good to run a full repair before deleting data on existing nodes, as there is a possibility of ioexceptions during rebuild. (Things like https://issues.apache.org/jira/browse/CASSANDRA-12830) Also if you are on 3.8+ , you may go for CDC

Re: EC2 instance recommendations

2017-05-23 Thread Bhuvan Rawal
i3 instances will undoubtedly give you more bang for the buck - easily 40K+ IOPS, whereas EBS maxes out at 20K PIOPS, which is highly expensive (at times it can cost you significantly more than the instance itself). But i3 instances have ephemeral local storage and data is lost once instance is