Re: tolerate how many nodes down in the cluster

2017-07-24 Thread Bhuvan Rawal
Hi Peng ,

This really depends on how you have configured your topology. Say you have
segregated your DC into 3 racks with 10 servers each: with an RF of 3 and one
replica placed per rack, you can safely assume your data stays available if
one whole rack goes down.

But if servers fail across different racks, that guarantee no longer holds -
with an RF of 3 you can lose at most 2 servers and still be sure every row has
a live replica. The best idea is to plan your failure modes appropriately and
let Cassandra know about them via your rack and DC configuration.
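
For example, something along these lines (keyspace, DC and rack names are made
up here, and GossipingPropertyFileSnitch is assumed):

# cassandra-rackdc.properties on each node:
#   dc=DC1
#   rack=RACK1        # RACK2 / RACK3 on nodes in the other racks
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"

NetworkTopologyStrategy will then spread the 3 replicas across the 3 racks,
which is what makes the "lose a whole rack" scenario safe.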

Regards,
Bhuvan

On Mon, Jul 24, 2017 at 3:28 PM, Peng Xiao <2535...@qq.com> wrote:

> Hi,
>
> Suppose we have a 30 nodes cluster in one DC with RF=3,
> how many nodes can be down?can we tolerate 10 nodes down?
> it seems that we are not able to avoid  the data distribution 3 replicas
> in the 10 nodes?,
> then we can only tolerate 1 node down even we have 30 nodes?
> Could anyone please advise?
>
> Thanks
>


Re: EC2 instance recommendations

2017-05-23 Thread Bhuvan Rawal
i3 instances will undoubtedly give you more bang for the buck - easily 40K+
IOPS, whereas EBS maxes out at 20K PIOPS, which is highly expensive (at times
provisioned IOPS can cost you significantly more than the instance itself).
But i3s have ephemeral local storage and the data is lost once the instance is
stopped, so you need to be prudent with the i series; it is generally used
for large persistent caches.

Regards,
Bhuvan

On Tue, May 23, 2017 at 4:55 AM, Gopal, Dhruva 
wrote:

> Hi –
>
>   We’ve been running M4.2xlarge EC2 instances with 2-3 TB of storage and
> have been comparing this to I-3.2xlarge, which seems more cost effective
> when dealing with this amount of storage and from an IOPS perspective. Does
> anyone have any recommendations/ on the I-3s and how it performs overall,
> compared to the M4 equivalent? On the surface, without us having taken it
> through its paces performance-wise, it does seem to be pretty powerful. We
> just ran through an exercise with a RAIDed 200 TB volume (as opposed to a
> non RAIDed 3 TB volume) and were seeing a 20-30% improvement with the
> RAIDed setup, on a 6 node Cassandra ring. Just looking for any
> feedback/experience folks may have had with the I-3s.
>
>
>
> Regards,
>
> *DHRUVA GOPAL*
>
> *sr. MANAGER, ENGINEERING*
>
> *REPORTING, ANALYTICS AND BIG DATA*
>
> *+1 408.325.2011* *WORK*
>
> *+1 408.219.1094* *MOBILE*
>
> *UNITED STATES*
>
> *dhruva.go...@aspect.com  *
>
> *aspect.com *
>
> [image: escription: http://webapp2.aspect.com/EmailSigLogo-rev.jpg]
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>


Re: Migrating a cluster

2017-05-01 Thread Bhuvan Rawal
+1 to Justin's answer!

As an additional step it's always good to run a full repair before deleting
data on existing nodes, as there is a possibility of IOExceptions during
rebuild. (Things like https://issues.apache.org/jira/browse/CASSANDRA-12830)

Also, if you are on 3.8+, you may go for the CDC approach and, instead of
adding a DC, create a new cluster altogether - though this will involve some
downtime. Probable steps in that case:
1. Create a cluster on the new hardware
2. Migrate the existing sstables (e.g. with sstableloader; see the sketch below)
3. Bring the app down & load CDC data into the new cluster for the time elapsed
during step 2
4. Bring the app up
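
A rough sketch for step 2 (host and paths are placeholders; run it per table
from a node of the old cluster, after a flush):

nodetool flush my_ks
sstableloader -d <new_cluster_node_ip> /var/lib/cassandra/data/my_ks/my_table-<table_id>/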


On 02-May-2017 7:08 AM, "Justin Cameron"  wrote:

Yes - this is the recommended way to migrate to another DC.

Before you start the migration you'll need to ensure
1. that the replication strategy of all your keyspaces is
NetworkTopologyStrategy (if not, change it to this using ALTER KEYSPACE),
and
2. that each of your clients is using the DCAwareRoundRobinPolicy load
balancing policy, and that the localDc parameter is set to the name of your
existing data centre. https://github.com/datastax/java-driver/tree/3.x/
manual/load_balancing#dcawareroundrobinpolicy

In addition to points 1&2, in order to ensure that your clients do not
contact nodes in the new data centre, you will also need to use a LOCAL
consistency level for all your queries (e.g. LOCAL_QUORUM instead of QUORUM)

Cheers,
Justin


On Tue, 2 May 2017 at 11:02 Voytek Jarnot  wrote:

> Have a scenario where it's necessary to migrate a cluster to a different
> set of hardware with minimal downtime. Setup is:
>
> Current cluster: 4 nodes, RF 3
> New cluster: 6 nodes, RF 3
>
> My initial inclination is to follow this writeup on setting up the 6 new
> nodes as a new DC: https://docs.datastax.com/en/cassandra/3.0/cassandra/
> operations/opsAddDCToCluster.html
>
> Basically, set up new DC, nodetool rebuild on new nodes to instruct
> Cassandra to migrate data, change client to hit new DC, kill original DC.
>
> First question - is this the recommended way to migrate an in-use cluster
> to new hardware?
>
> Secondly, on the assumption that it is: That link gives the impression
> that DC-aware clients will not hit the "remote" DC - is that the case for
> the Java driver? We don't currently explicitly set PoolingOptions
> ConnectionsPerHost for HostDistance.REMOTE to 0 - seems like that would be
> an important thing to do?
>
> Thank you.
>
-- 


*Justin Cameron*Senior Software Engineer





This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: nodetool status high load info

2017-04-12 Thread Bhuvan Rawal
Try nodetool tpstats - it can point you to where your threads are stuck.
There could be various reasons for the load to go high, like the disk or CPU
getting choked; you'll probably need to check dstat & iostat output along
with the Cassandra thread pool stats to get a decent idea.
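
Roughly, the first things to look at (the flag combinations are just examples):

nodetool tpstats
iostat -x 5          # per-device utilisation and wait times
dstat -cdngy 5       # cpu, disk, network, paging and system stats in one view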

On Wed, Apr 12, 2017 at 1:48 PM, Osman YOZGATLIOGLU <
osman.yozgatlio...@krontech.com> wrote:

> Hello,
>
> Nodetool status shows much more than actual data size.
> When I restart node, it shows normal a while and increase load in time.
> Where should I look?
>
> Cassandra 3.0.8, jdk 1.8.121
>
> Regards,
> Osman
>
>
> This e-mail message, including any attachments, is for the sole use of the
> person to whom it has been sent, and may contain information that is
> confidential or legally protected. If you are not the intended recipient or
> have received this message in error, you are not authorized to copy,
> distribute, or otherwise use this message or its attachments. Please notify
> the sender immediately by return e-mail and permanently delete this message
> and any attachments. KRON makes no warranty that this e-mail is error or
> virus free.
>


Re: how to recover a dead node using commit log when memtable is lost

2017-04-05 Thread Bhuvan Rawal
I beg to differ with @Matija here. IMO, by default Cassandra syncs data to the
commit log in a periodic fashion with an fsync period of 10 sec (Ref -
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L361).
If a write has not yet been fsynced to disk, RF is 1 (or CL is LOCAL_ONE) and
the node goes down, then there could be potential data loss even though the
client would expect the data to be present.

Therefore a good strategy would be either to have RF 3 and write with QUORUM,
or, if that's not a feasible option, to use batch mode for commitlog sync. That
can lead to much higher disk IO overhead: say you fsync every 10 ms in batch
mode - write latency will be up to 10 ms, as write threads are blocked until
the sync, and assuming continuous writes you would be issuing 1000/10 = 100
write IOs per second to the commit log. If the window is kept at 1 ms to reduce
write latency to 1 ms, that becomes 1000 IOPS.

So in batch mode it gets tricky to balance latency & disk utilisation. Testing
this setting thoroughly on a dev environment is recommended, as it can
adversely affect performance. We had done some benchmarks and found 50 ms to be
ideal for our use case, but that's subjective, since it leads to write
latencies in excess of 50 ms, which could be really high for some use cases.
Though with modern-day SSDs the batch option can be worthwhile to experiment
with.
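
For reference, the knobs involved live in cassandra.yaml (the values below are
the usual defaults as far as I remember - do verify against your version):

# periodic mode (the default):
#   commitlog_sync: periodic
#   commitlog_sync_period_in_ms: 10000
# batch mode discussed above - the window bounds both the fsync rate and the
# worst-case write latency:
#   commitlog_sync: batch
#   commitlog_sync_batch_window_in_ms: 2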

A good description is also given here -
http://stackoverflow.com/a/31033900/3646120

On Thu, Apr 6, 2017 at 12:30 AM, Matija Gobec  wrote:

> Flushes have nothing to do with data persistence and node failure. Each
> write is acknowledged only when data has been written to the commit log AND
> memtable. That solves the issues of node failures and data consistency.
> When the node boots back up it replays commit log files and you don't loose
> data that was already written to that node.
>
> On Wed, Apr 5, 2017 at 6:22 PM, preetika tyagi 
> wrote:
>
>> Hi,
>>
>> I read in Cassandra architecture documentation that if a node dies and
>> there is some data in memtable which hasn't been written to the sstable,
>> the commit log replay happens (assuming the commit log had been flushed to
>> disk) when the node restarts and hence the data can be recovered.
>>
>> However, I was wondering if a node is fully dead for some reason with
>> consistency level 1 (replication factor 3 but let's say it dies right after
>> it finishes a request and hence the data hasn't been replicated to other
>> nodes yet) and the data is only available in commit log on that node. Is
>> there a way to recover data from this node which includes both sstable and
>> commit log so that we can use it to replace it with a new node where we
>> could replay commit log to recover the data?
>>
>> Thanks,
>> Preetika
>>
>
>


Re: [Cassandra 3.0.9 ] Disable “delete/Truncate/Drop”

2017-04-04 Thread Bhuvan Rawal
Hi Abhishek,

You can restrict the commands a user can issue by enabling authentication &
authorization, then granting the concerned user only the appropriate privileges.

For reference : http://cassandra.apache.org/doc/latest/cql/security.html
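
A rough sketch of what that could look like (role and keyspace names are made
up; note that DROP can simply be withheld, while DELETE and TRUNCATE fall under
the MODIFY permission, so they cannot be separated from INSERT/UPDATE by grants
alone):

# cassandra.yaml (restart required):
#   authenticator: PasswordAuthenticator
#   authorizer: CassandraAuthorizer
cqlsh -u cassandra -p cassandra -e "CREATE ROLE app_user WITH PASSWORD = 'changeme' AND LOGIN = true;"
cqlsh -u cassandra -p cassandra -e "GRANT SELECT ON KEYSPACE my_ks TO app_user;"
cqlsh -u cassandra -p cassandra -e "GRANT MODIFY ON KEYSPACE my_ks TO app_user;"
# no GRANT DROP / GRANT ALTER here, so app_user cannot drop or alter tables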

Thanks,
Bhuvan

On Tue, Apr 4, 2017 at 1:58 PM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi all,
>
>
>
> There is any way to disable “delete/Truncate/Drop” command on Cassandra?
>
>
>
> If yes then how we can implement this?
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
> The “Times Cartoonist Hunt” is now your chance to be the next legendary
> cartoonist. Send us 2 original cartoons, one on current affairs and the
> second on any subject of your choice. All entries must be uploaded on
> www.toicartoonisthunt.com by 5th April 2017. Alternatively, you can email
> your entries at toicarto...@gmail.com with your Name, Age, City and
> Mobile number. Gear up, the Hunt has begun!
>


Re: why does it still have to search in SSTable when getting data in memtable in the read flow?

2017-03-27 Thread Bhuvan Rawal
Also, Cassandra's working unit is the cell, so within a partition some cells of
a row may be present in the memtable while others are located in sstables -
hence the need to reconcile the partition data.

@Jason's point is valid too - user-defined timestamps may put sstable cells
ahead of memtable ones.
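
A contrived illustration of the user-defined-timestamp case (keyspace, table
and values are made up):

cqlsh -e "INSERT INTO my_ks.t (pk, v) VALUES (1, 'winner') USING TIMESTAMP 2000;"
nodetool flush my_ks t     # this cell now lives in an sstable
cqlsh -e "INSERT INTO my_ks.t (pk, v) VALUES (1, 'loser') USING TIMESTAMP 1000;"   # this one stays in the memtable
# a read of pk = 1 has to reconcile both sources; the sstable cell wins because 2000 > 1000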

Thanks,
Bhuvan

On Mon, Mar 27, 2017 at 5:29 PM, jason zhao yang <
zhaoyangsingap...@gmail.com> wrote:

> Hi,
>
> Cassandra uses last-writetime-win strategy.
>
> In memory data doesn't mean it is the latest data due to custom write
> time, if data is also in Sstable, Cassandra has to read it and reconcile.
>
> Jasonstack
>
> On Mon, 27 Mar 2017 at 7:53 PM, 赵豫峰  wrote:
>
>> hello, I get the message that "If the memtable has the desired partition
>> data, then the data is read and then merged with the data from the
>> SSTables. The SSTable data is accessed as shown in the following steps."
>> in "how is data read?" chapter  in http://docs.datastax.com/en/
>> archived/cassandra/2.2/cassandra/dml/dmlAboutReads.html.
>>
>> I do not understand that why have to read SSTable when it has got target
>> data in memtable. If the data is in memtable, it means that data is lastest
>> one, is there any other reason that it still has to seach in SSTable?
>>
>> Thanks!
>>
>>
>> --
>> 赵豫峰
>>
>> 环信即时通讯云/研发
>>
>>
>


Re: scylladb

2017-03-12 Thread Bhuvan Rawal

On Sun, Mar 12, 2017 at 2:42 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Looking at the costs of cloud instances, it clearly appears the cost of
> CPU dictates the overall cost of the instance. Having 2X more cores
> increases cost by nearly 2X keeping other things same as can be seen below
> as an example:
>
> (C3 may have slightly better processor but not more than 10-15% peformance
> increase)
>
> Optimising for fewer CPU cycles will invariably reduce costs by a large
> factor. On a modern day machine with SSD's where data density on node can
> be high more requests can be assumed to be served from single node, things
> get CPU bound. Perhaps its because it was invented at a time when SSD's did
> not exist. If we observe closely, many of cassandra defaults are assuming
> disk is rotational - number of flush writers, concurrent compactors, etc.
> The design suggest that too (Using Sequential io as far as possible. Infact
> thats the underlying philosophy for sequential sstable flushes and
> sequential commitlog files to avoid random io). Perhaps if it was designed
> currently things may look radically different.
>
> Comparing an average hard disk - ~200 iops  vs ~40K for ssd thats approx
> 200 times increase effectively increasing expectation from processor to
> serve significantly higher ops per second.
>
> In order to extract best from a modern day node it may need significant
> changes such like below :
> https://issues.apache.org/jira/browse/CASSANDRA-10989
> Possibly going forward the number of cores per node is only going to
> increase as it has been seen for last 5-6 years. In a way thats suggesting
> a significant change in design and possibly thats what scylladb is upto.
>
> "We found that we need a cpu scheduler which takes into account the
> priority of different tasks, such as repair, compaction, streaming, read
> operations and write operations."
> From my understanding in Cassandra as well compaction threads run on low
> nice priority - not sure about repair/streaming.
> http://grokbase.com/t/cassandra/user/14a85xpce7/significant-nice-cpu-usage
>
> Regards,
>
> On Sun, Mar 12, 2017 at 2:35 PM, Avi Kivity <a...@scylladb.com> wrote:
>
>> btw, for an example of how user-level tasks can be scheduled in a way
>> that cannot be done with kernel threads, see this pair of blog posts:
>>
>>
>>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>>
>>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>>
>>
>> There's simply no way to get this kind of control when you rely on the
>> kernel for scheduling and page cache management.  As a result you have to
>> overprovision your node and then you mostly underutilize it.
>>
>> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>>
>>
>>
>> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>>
>> My response is inline.
>>
>> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <a...@scylladb.com> wrote:
>>
>>> There are several issues at play here.
>>>
>>> First, a database runs a large number of concurrent operations, each of
>>> which only consumes a small amount of CPU. The high concurrency is need to
>>> hide latency: disk latency, or the latency of contacting a remote node.
>>>
>>
>> *Ok so you are talking about hiding I/O latency.  If all these I/O are
>> non-blocking system calls then a thread per core and callback mechanism
>> should suffice isn't it?*
>>
>>
>>
>> Scylla uses a mix of user-level threads and callbacks. Most of the code
>> uses callbacks (fronted by a future/promise API). SSTable writers
>> (memtable flush, compaction) use a user-level thread (internally
>> implemented using callbacks).  The important bit is multiplexing many
>> concurrent operations onto a single kernel thread.
>>
>>
>> This means that the scheduler will need to switch contexts very often. A
>>> kernel thread scheduler knows very little about the application, so it has
>>> to switch a lot of context.  A user level scheduler is tightly bound to the
>>> application, so it can perform the switching faster.
>>>
>>
>> *sure but this applies in other direction as well. A user level scheduler
>> has no idea about kernel level scheduler either.  There is literally no
>> coordination between kernel level scheduler and user level scheduler in
>> linux or any major OS. It may be possible with OS's that support scheduler
>> activation(LWP's) and upcall mechanism. *
>>
>>
>> There is no need for coordination, because the kernel scheduler has no
>> scheduling decisio

Re: scylladb

2017-03-12 Thread Bhuvan Rawal
Looking at the costs of cloud instances, it clearly appears that the cost of
CPU dictates the overall cost of the instance. Having 2X more cores increases
the cost by nearly 2X, keeping other things the same, as can be seen below as
an example:

(C3 may have a slightly better processor, but not more than a 10-15%
performance increase)

Optimising for fewer CPU cycles will invariably reduce costs by a large factor.
On a modern machine with SSDs, where data density per node can be high and more
requests can be served from a single node, things get CPU bound. Perhaps that's
because Cassandra was invented at a time when SSDs did not exist. If we observe
closely, many of Cassandra's defaults assume the disk is rotational - number of
flush writers, concurrent compactors, etc. The design suggests that too (using
sequential IO as far as possible; in fact that's the underlying philosophy
behind sequential sstable flushes and sequential commitlog files, to avoid
random IO). Perhaps if it were designed today things might look radically
different.

Comparing an average hard disk - ~200 IOPS vs ~40K for an SSD - that's roughly
a 200x increase, effectively raising the expectation on the processor to serve
significantly more ops per second.

In order to extract the best from a modern node it may need significant changes
such as the one below:
https://issues.apache.org/jira/browse/CASSANDRA-10989
Possibly, going forward, the number of cores per node is only going to
increase, as has been the trend for the last 5-6 years. In a way that suggests
a significant change in design, and possibly that's what ScyllaDB is up to.

"We found that we need a cpu scheduler which takes into account the
priority of different tasks, such as repair, compaction, streaming, read
operations and write operations."
>From my understanding in Cassandra as well compaction threads run on low
nice priority - not sure about repair/streaming.
http://grokbase.com/t/cassandra/user/14a85xpce7/significant-nice-cpu-usage

Regards,

On Sun, Mar 12, 2017 at 2:35 PM, Avi Kivity  wrote:

> btw, for an example of how user-level tasks can be scheduled in a way that
> cannot be done with kernel threads, see this pair of blog posts:
>
>
>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>
>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>
>
> There's simply no way to get this kind of control when you rely on the
> kernel for scheduling and page cache management.  As a result you have to
> overprovision your node and then you mostly underutilize it.
>
> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>
>
>
> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity  wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is need to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are
> non-blocking system calls then a thread per core and callback mechanism
> should suffice isn't it?*
>
>
>
> Scylla uses a mix of user-level threads and callbacks. Most of the code
> uses callbacks (fronted by a future/promise API). SSTable writers
> (memtable flush, compaction) use a user-level thread (internally
> implemented using callbacks).  The important bit is multiplexing many
> concurrent operations onto a single kernel thread.
>
>
> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's that support scheduler
> activation(LWP's) and upcall mechanism. *
>
>
> There is no need for coordination, because the kernel scheduler has no
> scheduling decisions to make.  With one thread per core, bound to its core,
> the kernel scheduler can't make the wrong decision because it has just one
> choice.
>
>
> *Even then it is hard to say if it is all worth it (The research shows
> performance may not outweigh the complexity). Golang problem is exactly
> this if one creates 1000 go routines/green threads where each of them is
> making a blocking system call then it would create 1000 kernel threads
> underneath because it has no way to know that the kernel thread is blocked
> (no upcall). *
>
>
> All of the significant system calls we issue are through the main thread,
> either asynchronous or non-blocking.
>
> *And in non-blocking case I still don't even see a significant performance
> when 

Re: scylladb

2017-03-11 Thread Bhuvan Rawal
"Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
there's nothing to tune."
 - The details are indeed compelling to have a go ahead and test it for
specific use case.

If it works out good it can lead to good cost cut in infra costs as well as
having to manage less servers plus probably less time to bootstrap &
decommission nodes!

It will also be interesting to have a benchmark with Cassandra 3 version as
well, as the new storage engine is said to have better performance:
https://www.datastax.com/2015/12/storage-engine-30

Regards,
Bhuvan

On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity <a...@scylladb.com> wrote:

> There is no magic 10X bullet.  It's a mix of multiple factors, which can
> come up to less than 10X in some circumstances and more than 10X in others,
> as has been reported on this thread by others.
>
> TPC doesn't give _any_ advantage when you have just one core, and can give
> more than 10X on a machine with a large number of cores.  These are
> becoming more and more common, think of the recent AMD Naples announcement;
> with 32 cores per socket you can have 128 logical cores in a two-socket
> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>
> You're welcome to browse our site to learn more about the architecture, or
> watch this technical talk [1] I gave in QConSF that highlights some of the
> techniques we use.
>
> Of course it's possible to mistune Cassandra to give bad results, that is
> why we spent a lot more time tuning Cassandra and documenting everything
> than we spent on Scylla.  You can read the report in [2], it is very
> detailed, and provides a wealth of metrics like you'd expect.
>
> I'm not going to comment about the Aerospike numbers, I haven't studied
> them in detail.  And no, you can't multiply results like that unless they
> were done with very similar configurations and test harnesses.
>
> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune.
>
> Avi
>
> [1] https://www.infoq.com/presentations/scylladb
> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-
> benchmark-cluster-1/
>
>
> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>
> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <a...@scylladb.com> wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>

Re: scylladb

2017-03-10 Thread Bhuvan Rawal
Agreed, C++ gives an added advantage in talking to the underlying hardware with
better efficiency. It sounds good, but can a piece of code written in C++
really give 1000% of the throughput of a Java app? Is the TPC design 10X more
performant than the SEDA architecture?

And if C/C++ is indeed that fast, how can Aerospike (which is itself written in
C) claim to be 10X faster than Scylla here
http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining yours and
Aerospike's benchmarks it would appear that Aerospike is 100X more performant
than C* - I highly doubt that!!)

For a moment let's forget about evaluating 2 different databases: one can
observe a 10X performance difference between a mistuned Cassandra cluster and
one that's tuned as per the data model - there are so many tunables in the yaml
as well as in table configs.

The idea is - in order to strengthen your claim, you need to provide complete
system metrics (disk, CPU, network), the point where the OPS increase starts to
decay, along with the configs used. Plain ops per second and 99p latency alone
are a black box.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:

> ScyllaDB engineer here.
>
> C++ is really an enabling technology here. It is directly responsible for
> a small fraction of the gain by executing faster than Java.  But it is
> indirectly responsible for the gain by allowing us direct control over
> memory and threading.  Just as an example, Scylla starts by taking over
> almost all of the machine's memory, and dynamically assigning it to
> memtables, cache, and working memory needed to handle requests in flight.
> Memory is statically partitioned across cores, allowing us to exploit NUMA
> fully.  You can't do these things in Java.
>
> I would say the major contributors to Scylla performance are:
>  - thread-per-core design
>  - replacement of the page cache with a row cache
>  - careful attention to many small details, each contributing a little,
> but with a large overall impact
>
> While I'm here I can say that performance is not the only goal here, it is
> stable and predictable performance over varying loads and during
> maintenance operations like repair, without any special tuning.  We measure
> the amount of CPU and I/O spent on foreground (user) and background
> (maintenance) tasks and divide them fairly.  This work is not complete but
> already makes operating Scylla a lot simpler.
>
>
> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>
> I dont think ScyllaDB performance is because of C++. The design decisions
> in scylladb are indeed different from Cassandra such as getting rid of SEDA
> and moving to TPC and so on.
>
> If someone thinks it is because of C++ then just show the benchmarks that
> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
> claims instead of stating it.
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III 
> wrote:
>
>> They spend an enormous amount of time focusing on performance. You can
>> expect them to continue on with their optimization and keep crushing it.
>>
>> P.S., I don't work for ScyllaDB.
>>
>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar 
>> wrote:
>>
>>> In all of their presentation they keep harping on the fact that scylladb
>>> is written in C++ and does not carry the overhead of Java.  Still the
>>> difference looks staggering.
>>> 
>>> From: daemeon reiydelle 
>>> Sent: Thursday, March 9, 2017 14:21
>>> To: user@cassandra.apache.org
>>> Subject: Re: scylladb
>>>
>>> The comparison is fair, and conservative. Did substantial performance
>>> comparisons for two clients, both results returned throughputs that were
>>> faster than the published comparisons (15x as I recall). At that time the
>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>> for OLA compliance.
>>>
>>>
>>> ...
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198 <%28%2B1%29%20415.501.0198>
>>> London (+44) (0) 20 8144 9872 <%28%2B44%29%20%280%29%2020%208144%209872>
>>>
>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen > ro...@us2.nl>> wrote:
>>> I was wondering how people feel about the comparison that's made here
>>> between Cassandra and ScyllaDB : http://www.scylladb.com/techno
>>> logy/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-
>>> cassandra-nodes
>>>
>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>> pros/cons known?
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> Chief Data Architect
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this 

Re: scylladb

2017-03-09 Thread Bhuvan Rawal
I'd say the benchmark would be complete only if the necessary system metrics at
the point of inflexion were provided.

Looking at the ScyllaDB report it is unclear which system parameter was the
bottleneck. Also an observation - it's mentioned in the report that they are
using 1KB rows and probably default compression settings, so this could be a
possible bottleneck (every time a 64K chunk would be read and decompressed even
though the record is 1/64th of that size):
https://groups.google.com/forum/#!topic/nosql-databases/9pett319cgs
This would really cripple performance if that's the case.
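
If that is what's happening, dropping the compression chunk size for the table
is a cheap thing to try (keyspace/table names are only illustrative; 3.x
syntax):

cqlsh -e "ALTER TABLE ycsb.usertable WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"
nodetool upgradesstables -a ycsb usertable    # rewrite existing sstables with the smaller chunks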

Tuning the 99%ile is tricky in the case of Java because of background GC - that
really comes down to how the GC parameters are tuned for the specific workload.

I believe it's pertinent to evaluate the Cassandra defaults - the recommended
100 MB of new-gen heap per core, and the compression chunk size, which can
cause trouble.

On Fri, Mar 10, 2017 at 5:12 AM, Kant Kodali  wrote:

> I dont think ScyllaDB performance is because of C++. The design decisions
> in scylladb are indeed different from Cassandra such as getting rid of SEDA
> and moving to TPC and so on.
>
> If someone thinks it is because of C++ then just show the benchmarks that
> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
> claims instead of stating it.
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III 
> wrote:
>
>> They spend an enormous amount of time focusing on performance. You can
>> expect them to continue on with their optimization and keep crushing it.
>>
>> P.S., I don't work for ScyllaDB.
>>
>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar 
>> wrote:
>>
>>> In all of their presentation they keep harping on the fact that scylladb
>>> is written in C++ and does not carry the overhead of Java.  Still the
>>> difference looks staggering.
>>> 
>>> From: daemeon reiydelle 
>>> Sent: Thursday, March 9, 2017 14:21
>>> To: user@cassandra.apache.org
>>> Subject: Re: scylladb
>>>
>>> The comparison is fair, and conservative. Did substantial performance
>>> comparisons for two clients, both results returned throughputs that were
>>> faster than the published comparisons (15x as I recall). At that time the
>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>> for OLA compliance.
>>>
>>>
>>> ...
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen > ro...@us2.nl>> wrote:
>>> I was wondering how people feel about the comparison that's made here
>>> between Cassandra and ScyllaDB : http://www.scylladb.com/techno
>>> logy/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-
>>> cassandra-nodes
>>>
>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>> pros/cons known?
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> Chief Data Architect
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo > r...@pythian.com>> wrote:
>>> No rain at all! But I almost had it running last weekend, but stopped
>>> short of installing it. Let's see if this one is for real!
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>> linkedin.com/in/carlosjuzarterolo>> losjuzarterolo>
>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565
>>> 8696 x1649
>>> www.pythian.com
>>>
>>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>>> dani.trapha...@datastax.com> wrote:
>>> You'll be the first Carlos.
>>>
>>> [Inline image 1]
>>>
>>> Had any rain lately? Curious how this went, if so.
>>>
>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>> I just did a Twitter search on scylladb and did not see any tweets about
>>> actual use, so far.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso >> > wrote:
>>> Any update about this?
>>>
>>> @Carlos Rolo, did you tried it? Thoughts?
>>>

Re: High disk io read load

2017-02-20 Thread Bhuvan Rawal
Hi Benjamin,

Yes, a read ahead of 8 would imply a higher IO count from disk, but it should
not cause more data to be read off the disk, as is happening in your case.

One probable reason for the high disk IO is that the 512-vnode node has a lower
page-cache-to-data ratio of 22% (100G buff / 437G data) compared to 46%
(100G / 237G) on the other node. And since your average record size is on the
order of bytes, every disk IO fetches a complete 64K compression chunk just to
get a row.

Perhaps you can balance the nodes by adding equivalent RAM?

Regards,
Bhuvan

On Mon, Feb 20, 2017 at 12:11 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> This is the output of sar: https://gist.github.com/anonymous/
> 9545fb69fbb28a20dc99b2ea5e14f4cd
> <https://www.google.com/url?q=https%3A%2F%2Fgist.github.com%2Fanonymous%2F9545fb69fbb28a20dc99b2ea5e14f4cd=D=1=AFQjCNH6r_GCSN0ZxmDx1f8xGRJPweV-EQ>
>
> It seems to me that there es not enough page cache to handle all data in a
> reasonable way.
> As pointed out yesterday, the read rate with empty page cache is ~800MB/s.
> Thats really (!!!) much for 4-5MB/s network output.
>
> I stumbled across the compression chunk size, which I always left
> untouched from the default of 64kb (https://cl.ly/2w0V3U1q1I1Y). I guess
> setting a read ahead of 8kb is totally pointless if CS reads 64kb if it
> only has to fetch a single row, right? Are there recommendations for that
> setting?
>
> 2017-02-19 19:15 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Hi Edward,
>>
>> This could have been a valid case here but if hotspots indeed existed
>> then along with really high disk io , the node should have been doing
>> proportionate high network io as well. -  higher queries per second as well.
>>
>> But from the output shared by Benjamin that doesnt appear to be the case
>> and things look balanced.
>>
>> Regards,
>>
>> On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> We are talking about a read IO increase of over 2000% with 512 tokens
>>>> compared to 256 tokens. 100% increase would be linear which would be
>>>> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
>>>> account. But > 20x the read IO is really incredible.
>>>> The nodes are configured with puppet, they share the same roles and no
>>>> manual "optimizations" are applied. So I can't imagine, a different
>>>> configuration is responsible for it.
>>>>
>>>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>>>>
>>>>> This is status of the largest KS of these both nodes:
>>>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>>
>>>>> So roughly as expected.
>>>>>
>>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>>
>>>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Benjamin Roth
>>>>> Prokurist
>>>>>
>>>>> Jaumo GmbH · www.jaumo.com
>>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>>>> <07161%203048801>
>>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>>>> <+49%207161%203048801>
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>> When I read articles like this:
>>>
>>> http://www.doanduyhai.com/blog/?p=1930
>>>
>>> And see the word hot-spot.
>>>
>>> "Another performance consideration worth mentioning is hot-spot.
>>> Similar to manual denormalization, if your view partition key is chosen
>>> poorly, you’ll end up with hot spots in your cluster. A simple example with
>>> our *user* table is to create a materialized
>>>
>>> *view user_by_gender"It leads me to ask a question back: What can you
>>> say about hotspots in your data? Even if your nodes had the identical
>>> number of tokens this autho seems to suggesting that you still could have
>>> hotspots. Maybe the issue is you have a hotspot 2x hotspots, or your
>>> application has a hotspot that would be present even with perfect token
>>> balancing.*
>>>
>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-19 Thread Bhuvan Rawal
Hi Edward,

This could have been a valid case here, but if hotspots indeed existed then,
along with the really high disk IO, the node should have been doing
proportionately high network IO as well - and serving more queries per second.

But from the output shared by Benjamin that doesn't appear to be the case, and
things look balanced.

Regards,

On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo 
wrote:

>
>
> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
> wrote:
>
>> We are talking about a read IO increase of over 2000% with 512 tokens
>> compared to 256 tokens. 100% increase would be linear which would be
>> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
>> account. But > 20x the read IO is really incredible.
>> The nodes are configured with puppet, they share the same roles and no
>> manual "optimizations" are applied. So I can't imagine, a different
>> configuration is responsible for it.
>>
>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth :
>>
>>> This is status of the largest KS of these both nodes:
>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>
>>> So roughly as expected.
>>>
>>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>>
 what's the Owns % for the relevant keyspace from nodetool status?

>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>> <07161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
> When I read articles like this:
>
> http://www.doanduyhai.com/blog/?p=1930
>
> And see the word hot-spot.
>
> "Another performance consideration worth mentioning is hot-spot. Similar
> to manual denormalization, if your view partition key is chosen poorly,
> you’ll end up with hot spots in your cluster. A simple example with our
> *user* table is to create a materialized
>
> *view user_by_gender"It leads me to ask a question back: What can you say
> about hotspots in your data? Even if your nodes had the identical number of
> tokens this autho seems to suggesting that you still could have hotspots.
> Maybe the issue is you have a hotspot 2x hotspots, or your application has
> a hotspot that would be present even with perfect token balancing.*
>
>


Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
108864   /dev/ram6
>> rw   256   512  4096  067108864   /dev/ram7
>> rw   256   512  4096  067108864   /dev/ram8
>> rw   256   512  4096  067108864   /dev/ram9
>> rw   256   512  4096  067108864   /dev/ram10
>> rw   256   512  4096  067108864   /dev/ram11
>> rw   256   512  4096  067108864   /dev/ram12
>> rw   256   512  4096  067108864   /dev/ram13
>> rw   256   512  4096  067108864   /dev/ram14
>> rw   256   512  4096  067108864   /dev/ram15
>> rw16   512  4096  0800166076416 <0800%20166076416>
>> /dev/sda
>> rw16   512  4096   2048800164151296   /dev/sda1
>> rw16   512  4096  0800166076416 <0800%20166076416>
>> /dev/sdb
>> rw16   512  4096   2048800165027840   /dev/sdb1
>> rw16   512  4096  0   1073741824000   /dev/dm-0
>> rw16   512  4096  0  2046820352   /dev/dm-1
>> rw16   512  4096  0  1023410176   /dev/dm-2
>>
>> 2017-02-18 21:41 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> If its same on both machines then something else could be the issue. We
>>> faced high disk io due to misconfigured read ahead which resulted in high
>>> amount of disk io for comparatively insignificant network transfer.
>>>
>>> Can you post output of blockdev --report for a normal node and 512 token
>>> node.
>>>
>>> Regards,
>>>
>>> On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> cat /sys/block/sda/queue/read_ahead_kb
>>>> => 8
>>>>
>>>> On all CS nodes. Is that what you mean?
>>>>
>>>> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> What is the disk read ahead on both nodes?
>>>>>
>>>>> Regards,
>>>>> Bhuvan
>>>>>
>>>>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <
>>>>> benjamin.r...@jaumo.com> wrote:
>>>>>
>>>>>> This is status of the largest KS of these both nodes:
>>>>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>>>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>>>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>>>
>>>>>> So roughly as expected.
>>>>>>
>>>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>>>
>>>>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Benjamin Roth
>>>>>> Prokurist
>>>>>>
>>>>>> Jaumo GmbH · www.jaumo.com
>>>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>>>>> <07161%203048801>
>>>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>>> <07161%203048801>
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>> <07161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Ben,

If it's the same on both machines then something else could be the issue. We
once faced high disk IO due to a misconfigured read ahead, which resulted in a
large amount of disk IO for a comparatively insignificant network transfer.

Can you post the output of blockdev --report for a normal node and for the
512-token node?
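
Something along these lines (the device name is only an example - use whatever
device backs the Cassandra data directory):

blockdev --report /dev/sda
# read ahead is reported and set in 512-byte sectors; e.g. to set it to 8 KB:
blockdev --setra 16 /dev/sda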

Regards,

On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> cat /sys/block/sda/queue/read_ahead_kb
> => 8
>
> On all CS nodes. Is that what you mean?
>
> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Hi Benjamin,
>>
>> What is the disk read ahead on both nodes?
>>
>> Regards,
>> Bhuvan
>>
>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> This is status of the largest KS of these both nodes:
>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>
>>> So roughly as expected.
>>>
>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>
>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>> <07161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Benjamin,

What is the disk read ahead on both nodes?

Regards,
Bhuvan

On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth 
wrote:

> This is status of the largest KS of these both nodes:
> UN  10.23.71.10  437.91 GiB  512  49.1%
> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
> UN  10.23.71.9   246.99 GiB  256  28.3%
> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>
> So roughly as expected.
>
> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>
>> what's the Owns % for the relevant keyspace from nodetool status?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
I'm not sure you can create an index on the system_traces keyspace for this
use case.

If the performance issue you are trying to troubleshoot is consistent, then you
can switch on tracing for a while and dump the system_traces.events table, say
using COPY into a csv. You can then analyse that to find the problematic query.

copy system_traces.events TO 'traces_dump.csv';

Also, make sure you don't set the trace probability to a high number when
working on a production database, as it can adversely impact performance.
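
The whole sequence would look roughly like this (the 0.001 probability is just
an example - keep it low in production):

nodetool settraceprobability 0.001
# ...let it collect traces for a while, then dump and switch tracing back off:
cqlsh -e "COPY system_traces.sessions TO 'sessions_dump.csv';"
cqlsh -e "COPY system_traces.events TO 'traces_dump.csv';"
nodetool settraceprobability 0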

Regards,

On Sun, Feb 19, 2017 at 1:28 AM, Igor Leão <igor.l...@ubee.in> wrote:

> Hi Bhuvan,
> Thanks a lot!
>
> Any idea if something can be done for C* 2.X?
>
> Best,
> Igor
>
> 2017-02-18 16:41 GMT-03:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Hi Igor,
>>
>> If you are using java driver, you can log slow queries on client side
>> using QueryLogger.
>> https://docs.datastax.com/en/developer/java-driver/2.1/manual/logging/
>>
>> Slow Query logger for server was introduced in C* 3.10 version. Details:
>> https://issues.apache.org/jira/browse/CASSANDRA-12403
>>
>> Regards,
>> Bhuvan
>>
>> On Sun, Feb 19, 2017 at 12:59 AM, Igor Leão <igor.l...@ubee.in> wrote:
>>
>>> Hi there,
>>>
>>> I'm wondering how to log queries from Cassandra. These queries can be
>>> either slow queries or all queries. The only constraint is that I should do
>>> this on server side.
>>>
>>> I tried using `nodetool settraceprobability`, which writes all queries
>>> to the keyspace `system_traces`. When I try to see which queries are slower
>>> than a given number, I get:
>>>
>>> Result: ```InvalidRequest: code=2200 [Invalid query] message="No
>>> secondary indexes on the restricted columns support the provided operators:
>>> "```
>>> Query: `select * from events where source_elapsed >= 1000;`
>>>
>>> My goal is to debug performance issues in a production database. I want
>>> to know which queries are degrading the performance of the db.
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Igor Leão  Site Reliability Engineer
>
> Mobile: +55 81 99727-1083 
> Skype: *igorvpcleao*
> Office: +55 81 4042-9757 
> Website: inlocomedia.com <http://www.inlocomedia.com/>
> [image: inlocomedia]
> <http://t.sidekickopen29.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XX45w6CwnN7dSpvzQZpw8W8pTc_456dVQFdQm8LT02?t=http%3A%2F%2Fwww.inlocomedia.com%2F=4991638468296704=9266b53b-57c9-4b38-d81a-d2f8f01ed355>
>  [image: LinkedIn]
> <http://t.sidekickopen29.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XX45w6CwnN7dSpvzQZpw8W8pTc_456dVQFdQm8LT02?t=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fin-loco-media=4991638468296704=9266b53b-57c9-4b38-d81a-d2f8f01ed355>
>  [image: Facebook] <https://www.facebook.com/inlocomedia> [image: Twitter]
> <http://t.sidekickopen29.com/e1t/c/5/f18dQhb0S7lC8dDMPbW2n0x6l2B9nMJW7t5XX45w6CwnN7dSpvzQZpw8W8pTc_456dVQFdQm8LT02?t=https%3A%2F%2Ftwitter.com%2Finlocomedia=4991638468296704=9266b53b-57c9-4b38-d81a-d2f8f01ed355>
>
>
>
>
>
>
>


Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
Hi Igor,

If you are using the Java driver, you can log slow queries on the client side
using QueryLogger.
https://docs.datastax.com/en/developer/java-driver/2.1/manual/logging/

A slow query logger on the server side was introduced in C* 3.10. Details:
https://issues.apache.org/jira/browse/CASSANDRA-12403

Regards,
Bhuvan

On Sun, Feb 19, 2017 at 12:59 AM, Igor Leão  wrote:

> Hi there,
>
> I'm wondering how to log queries from Cassandra. These queries can be
> either slow queries or all queries. The only constraint is that I should do
> this on server side.
>
> I tried using `nodetool settraceprobability`, which writes all queries to
> the keyspace `system_traces`. When I try to see which queries are slower
> than a given number, I get:
>
> Result: ```InvalidRequest: code=2200 [Invalid query] message="No secondary
> indexes on the restricted columns support the provided operators: "```
> Query: `select * from events where source_elapsed >= 1000;`
>
> My goal is to debug performance issues in a production database. I want to
> know which queries are degrading the performance of the db.
>
> Thanks in advance!
>
>
>
>
>
>
>


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread Bhuvan Rawal
Hi Abhishek,

nodetool status output can be misleading at times.
In order to ensure the data is in sync, schedule a repair for the impacted
keyspaces.
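
For example, on each node of the new DC (keyspace names taken from this thread;
add -full on 2.2+/3.x if you want to force a non-incremental repair):

nodetool repair wls
nodetool repair adlog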

Regards,

On Mon, Jan 30, 2017 at 10:13 AM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> But how I will tell rebuild command source DC if I have more than 2 Dc?
>
>
>
> @dinking, yes I run the command, and it did some strange thing now:
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.XX.XXX  140.16 GB  256  ?   
> badf985b-37da-4735-b468-8d3a058d4b60
> 01
>
> UN  172.29. XX.XXX  82.04 GB   256  ?
> 317061b2-c19f-44ba-a776-bcd91c70bbdd  03
>
> UN  172.29. XX.XXX  85.29 GB   256  ?
> 9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.26. XX.XXX   79.09 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> UN  172.26. XX.XXX   79.39 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
>
>
> In source DC (dc_india) we have near about 79 GB data. But in new DC each
> node has more than 79 GB data and Seed IP have near about 2 times data.
> Below is replication:
>
> Data Key Space:
>
> alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DRPOCcluster': '3','dc_india':'2'}  AND durable_writes = true;
>
> alter KEYSPACE adlog WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'2'}  AND
> durable_writes = true;
>
>
>
> New DC('DRPOCcluster') system Key Space:
>
>
>
> alter KEYSPACE system_distributed WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_auth WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_traces WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE "OpsCenter" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
>
>
> Old  DC(‘dc_india’) system Key Space:
>
>
>
> alter KEYSPACE system_distributed WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_auth WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_traces WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE "OpsCenter" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
>
>
> why is this happening? Did I do something wrong?
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* kurt greaves [mailto:k...@instaclustr.com]
> *Sent:* Saturday, January 28, 2017 3:27 AM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
> new Cluster
>
>
>
> What Dikang said, in your original email you are passing -dc to rebuild.
> This is incorrect. Simply run nodetool rebuild <source_dc> from each of the
> nodes in the new dc.
>
>
>
> On 28 Jan 2017 07:50, "Dikang Gu"  wrote:
>
> Have you run 'nodetool rebuild dc_india' on the new nodes?
>
>
>
> On Tue, Jan 24, 2017 at 7:51 AM, Benjamin Roth 
> wrote:
>
> Have you also altered RF of system_distributed as stated in the tutorial?
>
>
>
> 2017-01-24 16:45 GMT+01:00 Abhishek Kumar Maheshwari  timesinternet.in>:
>
> My Mistake,
>
>
>
> Both clusters are up and running.
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.XX.XX  1.65 GB   256  ?   
> badf985b-37da-4735-b468-8d3a058d4b60
> 01
>
> UN  172.29.XX.XX  1.64 GB   256  ?   
> 317061b2-c19f-44ba-a776-bcd91c70bbdd
> 03
>
> UN  172.29.XX.XX  1.64 GB   256  ?   
> 9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c
> 02
>
> Datacenter: dc_india
>
> 
>
> 

Re: Incremental Repair Migration

2017-01-10 Thread Bhuvan Rawal
Hi Amit,

You can try Reaper; it makes repairs effortless. There are a host of other
benefits, but most importantly it offers a single portal to manage & track
ongoing as well as past repairs.

For incremental repairs it breaks the job into a single segment per node. If you
find that is indeed the case, you may have to increase the segment timeout
when you run it for the first time, as it repairs the whole set of sstables.

Regards,
Bhuvan

On Jan 10, 2017 8:44 PM, "Jonathan Haddad"  wrote:

Reaper supports incremental repair.
On Mon, Jan 9, 2017 at 11:27 PM Amit Singh F 
wrote:

> Hi Jonathan,
>
>
>
> Really appreciate your response.
>
>
>
> It will not be possible for us to move to Reaper as of now; we are in the
> process of migrating to incremental repair.
>
>
>
> Also, running repair constantly will be a costly affair in our case. Migrating
> to incremental repair with a large dataset will take hours
> to finish if we go ahead with the procedure shared by DataStax.
>
>
>
> So any quick method to reduce that ?
>
>
>
> Regards
>
> Amit Singh
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Tuesday, January 10, 2017 11:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Incremental Repair Migration
>
>
>
> Your best bet is to just run repair constantly. We maintain an updated
> fork of Spotify's reaper tool to help manage it: https://github.com/
> thelastpickle/cassandra-reaper
>
> On Mon, Jan 9, 2017 at 10:04 PM Amit Singh F 
> wrote:
>
> Hi All,
>
>
>
> We are thinking of migrating from primary range repair (-pr) to
> incremental repair.
>
>
>
> Environment :
>
>
>
> · Cassandra 2.1.16
>
> • 25 Node cluster ,
>
> • RF 3
>
> • Data size up to 450 GB per nodes
>
>
>
> We found that running a full repair will take around 8 hrs per node,
> which *means 200-odd hrs* for migrating the entire cluster to
> incremental repair. Even though there is zero downtime, it is quite
> unreasonable to ask for a 200 hr maintenance window for migrating repairs.
>
>
>
> Just want to know how Cassandra users in community optimize the procedure
> to reduce migration time ?
>
>
>
> Thanks & Regards
>
> Amit Singh
>
>


Re: Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Bhuvan Rawal
Hi Ajay,

Have you had a look at cron logs? - mine is in path /var/log/cron

Thanks & Regards,

On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg  wrote:

> Hi All.
>
> Facing a very weird issue, wherein the command
>
> */etc/init.d/cassandra start*
>
> causes cassandra to start when the command is run from command-line.
>
>
> However, if I put the above as a cron job
>
>
>
> ** * * * * /etc/init.d/cassandra start*
> cassandra never starts.
>
>
> I have checked, and "cron" service is running.
>
>
> Any ideas what might be wrong?
> I am pasting the cassandra script for brevity.
>
>
> Thanks and Regards,
> Ajay
>
>
> 
> 
> #! /bin/sh
> ### BEGIN INIT INFO
> # Provides:  cassandra
> # Required-Start:$remote_fs $network $named $time
> # Required-Stop: $remote_fs $network $named $time
> # Should-Start:  ntp mdadm
> # Should-Stop:   ntp mdadm
> # Default-Start: 2 3 4 5
> # Default-Stop:  0 1 6
> # Short-Description: distributed storage system for structured data
> # Description:   Cassandra is a distributed (peer-to-peer) system for
> #the management and storage of structured data.
> ### END INIT INFO
>
> # Author: Eric Evans 
>
> DESC="Cassandra"
> NAME=cassandra
> PIDFILE=/var/run/$NAME/$NAME.pid
> SCRIPTNAME=/etc/init.d/$NAME
> CONFDIR=/etc/cassandra
> WAIT_FOR_START=10
> CASSANDRA_HOME=/usr/share/cassandra
> FD_LIMIT=10
>
> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>
> # Read configuration variable file if it is present
> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>
> # Read Cassandra environment file.
> . /etc/cassandra/cassandra-env.sh
>
> if [ -z "$JVM_OPTS" ]; then
> echo "Initialization failed; \$JVM_OPTS not set!" >&2
> exit 3
> fi
>
> export JVM_OPTS
>
> # Export JAVA_HOME, if set.
> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>
> # Load the VERBOSE setting and other rcS variables
> . /lib/init/vars.sh
>
> # Define LSB log_* functions.
> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
> . /lib/lsb/init-functions
>
> #
> # Function that returns 0 if process is running, or nonzero if not.
> #
> # The nonzero value is 3 if the process is simply not running, and 1 if the
> # process is not running but the pidfile exists (to match the exit codes
> for
> # the "status" command; see LSB core spec 3.1, section 20.2)
> #
> CMD_PATT="cassandra.+CassandraDaemon"
> is_running()
> {
> if [ -f $PIDFILE ]; then
> pid=`cat $PIDFILE`
> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
> return 1
> fi
> return 3
> }
> #
> # Function that starts the daemon/service
> #
> do_start()
> {
> # Return
> #   0 if daemon has been started
> #   1 if daemon was already running
> #   2 if daemon could not be started
>
> ulimit -l unlimited
> ulimit -n "$FD_LIMIT"
>
> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
>
> [ -e `dirname "$PIDFILE"` ] || \
> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
>
>
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q -p
> "$PIDFILE" -t >/dev/null || return 1
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p
> "$PIDFILE" -- \
> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null ||
> return 2
>
> }
>
> #
> # Function that stops the daemon/service
> #
> do_stop()
> {
> # Return
> #   0 if daemon has been stopped
> #   1 if daemon was already stopped
> #   2 if daemon could not be stopped
> #   other if a failure occurred
> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
> RET=$?
> rm -f "$PIDFILE"
> return $RET
> }
>
> case "$1" in
>   start)
> [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
> do_start
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   stop)
> [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
> do_stop
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   restart|force-reload)
> log_daemon_msg "Restarting $DESC" "$NAME"
> do_stop
> case "$?" in
>   0|1)
> do_start
> case "$?" in
>   0|1)
> do_start
> case "$?" in
> 0) log_end_msg 0 ;;
>   

Re: Reaper repair seems to "hang"

2017-01-03 Thread Bhuvan Rawal
Hi Daniel,

Looks like yours is a different case. If you're running incremental repair
for the first time it may take a long time, especially if the table is large. And
repair may seem stuck even when things are working.

You can try nodetool compactionstats when repair appears stuck; you'll find
a validation compaction happening if that's indeed the case.

For the first incremental repair you can follow this doc; in subsequent
repairs incremental repair should encounter very few sstables:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html

Regards,
Bhuvan



On Jan 4, 2017 3:52 AM, "Daniel Kleviansky" <dan...@kleviansky.com> wrote:

Hi Bhuvan,

Thank you so very much for your detailed reply.
Just to ensure everyone is across the same information, and responses are
not duplicated across two different forums, I thought I'd share with the
mailing list that I've created a GitHub issue at: https://github.com/
thelastpickle/cassandra-reaper/issues/39

Kind regards,
Daniel

On Wed, Jan 4, 2017 at 6:31 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi Daniel,
>
> We faced a similar issue during repair with reaper. We ran repair with
> more repair threads than number of cassandra nodes. But on and off repair
> was getting stuck and we had to do rolling restart of cluster or wait for
> lock time to expire (~1hr).
>
> We had a look at the stuck repair, threadpools were getting stuck at
> AntiEntropy stage. From the synchronized block in repair code it appeared
> that per node at max 1 concurrent repair session per node is possible.
>
> According to https://medium.com/@mlowicki/cassandra-reaper-introductio
> n-ed73410492bf#.f0erygqpk :
>
> Segment runner has protection mechanism to avoid overloading nodes using
> two simple rules to postpone repair if:
>
> 1. Number of pending compactions is greater than *MAX_PENDING_COMPACTIONS*
>  (20 by default)
> *2. Node is already running repair job*
>
> We tried running reaper with number of threads less than number of nodes
> (assuming reaper will not submit multiple segments to single cassandra
> node) but still it was observed that multiple repair segments were going to
> same node concurrently and threfore chances of nodes getting stuck in that
> state was possible. Finally we settled with single repair thread in reaper
> settings. Although takes a slightly more time but has completed
> successfully numerous times.
>
> Thread Dump of cassandra server when repair was getting stuck:
>
> "*AntiEntropyStage:1" #159 daemon prio=5 os_prio=0 tid=0x7f0fa16226a0
> nid=0x3c82 waiting for monitor entry [0x7ee9eabaf000*]
>java.lang.Thread.State: BLOCKED (*on object monitor*)
> at org.apache.cassandra.service.ActiveRepairService.removeParen
> tRepairSession(ActiveRepairService.java:392)
> - waiting to lock <0x00067c083308> (a
> org.apache.cassandra.service.ActiveRepairService)
> at org.apache.cassandra.service.ActiveRepairService.doAntiCompa
> ction(ActiveRepairService.java:417)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(
> RepairMessageVerbHandler.java:145)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeli
> veryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executor
> s.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> Executor.java:1142)
>
> Hope it helps!
>
> Regards,
> Bhuvan
>
>
>
> On Tue, Jan 3, 2017 at 11:16 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Daniel,
>>
>> could you file a bug in the issue tracker ? https://github.com/thelastpi
>> ckle/cassandra-reaper/issues
>>
>> We'll figure out what's wrong and get your repairs running.
>>
>> Thanks !
>>
>> On Tue, Jan 3, 2017 at 12:35 AM Daniel Kleviansky <dan...@kleviansky.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Using The Last Pickle's fork of Reaper, and unfortunately running into a
>>> bit of an issue. I'll try break it down below.
>>>
>>> # Problem Description:
>>> * After starting repair via the GUI, progress remains at 0/x.
>>> * Cassandra nodes calculate their respective token ranges, and 

Re: Reaper repair seems to "hang"

2017-01-03 Thread Bhuvan Rawal
Hi Daniel,

We faced a similar issue during repair with Reaper. We ran repair with more
repair threads than the number of Cassandra nodes. But on and off the repair was
getting stuck and we had to do a rolling restart of the cluster or wait for the lock
time to expire (~1hr).

We had a look at the stuck repair; threadpools were getting stuck at the
AntiEntropy stage. From the synchronized block in the repair code it appeared
that at most 1 concurrent repair session per node is possible.

According to https://medium.com/@mlowicki/cassandra-reaper-introduction-
ed73410492bf#.f0erygqpk :

Segment runner has a protection mechanism to avoid overloading nodes, using
two simple rules to postpone repair if:

1. Number of pending compactions is greater than *MAX_PENDING_COMPACTIONS* (20
by default)
*2. Node is already running repair job*

We tried running Reaper with the number of threads less than the number of nodes
(assuming Reaper would not submit multiple segments to a single Cassandra
node), but it was still observed that multiple repair segments were going to the
same node concurrently, and therefore nodes could still get stuck in that
state. Finally we settled on a single repair thread in the Reaper
settings. Although it takes slightly more time, it has completed
successfully numerous times.

Thread Dump of cassandra server when repair was getting stuck:

"*AntiEntropyStage:1" #159 daemon prio=5 os_prio=0 tid=0x7f0fa16226a0
nid=0x3c82 waiting for monitor entry [0x7ee9eabaf000*]
   java.lang.Thread.State: BLOCKED (*on object monitor*)
at org.apache.cassandra.service.ActiveRepairService.
removeParentRepairSession(ActiveRepairService.java:392)
- waiting to lock <0x00067c083308> (a
org.apache.cassandra.service.ActiveRepairService)
at org.apache.cassandra.service.ActiveRepairService.
doAntiCompaction(ActiveRepairService.java:417)
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(
RepairMessageVerbHandler.java:145)
at org.apache.cassandra.net.MessageDeliveryTask.run(
MessageDeliveryTask.java:67)
at java.util.concurrent.Executors$RunnableAdapter.
call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1142)

Hope it helps!

Regards,
Bhuvan



On Tue, Jan 3, 2017 at 11:16 AM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Daniel,
>
> could you file a bug in the issue tracker ? https://github.com/
> thelastpickle/cassandra-reaper/issues
>
> We'll figure out what's wrong and get your repairs running.
>
> Thanks !
>
> On Tue, Jan 3, 2017 at 12:35 AM Daniel Kleviansky 
> wrote:
>
>> Hi everyone,
>>
>> Using The Last Pickle's fork of Reaper, and unfortunately running into a
>> bit of an issue. I'll try break it down below.
>>
>> # Problem Description:
>> * After starting repair via the GUI, progress remains at 0/x.
>> * Cassandra nodes calculate their respective token ranges, and then
>> nothing happens.
>> * There were no errors in the Reaper or Cassandra logs. Only a message of
>> acknowledgement that a repair had initiated.
>> * Performing stack trace on the running JVM, once can see that the thread
>> spawning the repair process was waiting on a lock that was never being
>> released.
>> * This occurred on all nodes, and prevented any manually initiated repair
>> process from running. A rolling restart of each node was required, after
>> which one could run a `nodetool repair` successfully.
>>
>> # Cassandra Cluster Details:
>> * Cassandra 2.2.5 running on Windows Server 2008 R2
>> * 6 node cluster, split across 2 DCs, with RF = 3:3.
>>
>> # Reaper Details:
>> * Reaper 0.3.3 running on Windows Server 2008 R2, utilising a PostgreSQL
>> database.
>>
>> ## Reaper settings:
>> * Parallism: DC-Aware
>> * Repair Intensity: 0.9
>> * Incremental: true
>>
>> Don't want to swamp you with more details or unnecessary logs, especially
>> as I'd have to sanitize them before sending them out, so please let me know
>> if there is anything else I can provide, and I'll do my best to get it to
>> you.
>>
>> ​Kind regards,
>> Daniel
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-12-03 Thread Bhuvan Rawal
Thanks Jens! This was helpful. Also, to avoid pending compaction buildup,
compaction throughput can be throttled higher. In our case however the
batchlog throttle property (batchlog_replay_throttle_in_kb) was the
bottleneck; increasing it to 10240k from the default of 1024k reduced node
addition time by at least a factor of 3x (from this it could be inferred that
the batchlog is used during node addition).

An interesting hack for people on public cloud could be to get a
higher CPU capacity machine during bootstrap and then downgrade it once
data is in place, as CPU essentially becomes the bottleneck during node
addition.

I found a property in the docs - consistent.rangemovement; setting it to
false allows multiple nodes to be added simultaneously.


*JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"*

I tested with this property in a test cluster and things appeared fine.
This constraint seems to have been introduced in CASSANDRA-2434
<https://issues.apache.org/jira/browse/CASSANDRA-2434>. As of
today, what could be the possible implications of adding multiple nodes
simultaneously, assuming the 2 min rule is taken into account and a repair
that works?

Regards,
Bhuvan

On Mon, Sep 12, 2016 at 2:56 AM, Jens Rantil <jens.ran...@tink.se> wrote:

> Yes. `nodetool setstreamthroughput` is your friend.
>
>
> On Sunday, September 11, 2016, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> Make sure there is no spike in the load-avg on the existing nodes, as
>> that might affect your application read request latencies.
>>
>> On Sun, Sep 11, 2016, 17:10 Jens Rantil <jens.ran...@tink.se> wrote:
>>
>>> Hi Bhuvan,
>>>
>>> I have done such expansion multiple times and can really recommend
>>> bootstrapping a new DC and pointing your clients to it. The process is so
>>> much faster and the documentation you referred to has worked out fine for
>>> me.
>>>
>>> Cheers,
>>> Jens
>>>
>>>
>>> On Sunday, September 11, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
>>>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
>>>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would
>>>> take 7-8 hours.
>>>>
>>>> If this activity is performed serially then it will take 5-6 days. I
>>>> had a look at CASSANDRA-7069
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-7069> and a bit of
>>>> discussion in the past at - http://grokbase.com/t/cassan
>>>> dra/user/147gcqvybg/adding-more-nodes-into-the-cluster. Wanted to know
>>>> if the limitation is still applicable and race condition could occur in 3.6
>>>> version.
>>>>
>>>> If this is not the case can we add a new datacenter as mentioned here
>>>> opsAddDCToCluster
>>>> <https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAddDCToCluster.html>
>>>>  and
>>>> bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in
>>>> cassandra.yaml and rebuilding nodes simultaneously in the new dc?
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Bhuvan
>>>>
>>>
>>>
>>> --
>>> Jens Rantil
>>> Backend engineer
>>> Tink AB
>>>
>>> Email: jens.ran...@tink.se
>>> Phone: +46 708 84 18 32
>>> Web: www.tink.se
>>>
>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>  Twitter <https://twitter.com/tink>
>>>
>>>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>  Twitter <https://twitter.com/tink>
>
>


Re: High system CPU during high write workload

2016-11-15 Thread Bhuvan Rawal
Hi Ben,

Thanks for your reply. We tested the same workload on kernel
version 4.6.4-1.el7.elrepo.x86_64 and found the issue is not present
there.

This had resulted in really high CPU in write workloads (an area in which
Cassandra excels), degrading performance by at least 5x! I suggest a
mention of this could be included in the Cassandra community wiki as it could impact a
large audience.

Thanks & Regards,
Bhuvan

On Tue, Nov 15, 2016 at 12:33 PM, Ben Bromhead  wrote:

> Hi Abhishek
>
> The article with the futex bug description lists the solution, which is to
> upgrade to a version of RHEL or CentOS that have the specified patch.
>
> What help do you specifically need? If you need help upgrading the OS I
> would look at the documentation for RHEL or CentOS.
>
> Ben
>
> On Mon, 14 Nov 2016 at 22:48 Abhishek Gupta 
> wrote:
>
> Hi,
>
> We are seeing an issue where the system CPU is shooting off to a figure of
> > 90% when the cluster is subjected to a relatively high write workload i.e
> 4k wreq/secs.
>
> 2016-11-14T13:27:47.900+0530 Process summary
>   process cpu=695.61%
>   application cpu=676.11% (*user=200.63% sys=475.49%) **<== Very High
> System CPU *
>   other: cpu=19.49%
>   heap allocation rate *403mb*/s
> [000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
> [000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
> [000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
> [000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
> [000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
> [000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41
>
> On doing strace it was found that the following system call is consuming
> all the system CPU
>  timeout 10s strace -f -p 5954 -c -q
> % time seconds  usecs/call callserrors syscall
> -- --- --- - - 
>
> *88.33 1712.798399   16674102723 22191 futex* 3.98
> 77.0987304356 17700   read
>  3.27   63.474795  394253   16129 restart_syscall
>  3.23   62.601530   29768  2103   epoll_wait
>
> On searching we found the following bug with the RHEL 6.6, CentOS 6.6
> kernel seems to be a probable cause for the issue:
>
> https://docs.datastax.com/en/landing_page/doc/landing_page/
> troubleshooting/cassandra/fetuxWaitBug.html
>
> The patch fix mentioned in the doc is also not present in our kernel.
>
> sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
> - [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347]
> {CVE-2010-0623}
>
> Can some who has faced and resolved this issue help us here.
>
> Thanks,
> Abhishek
>
>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Cassandra Read / Write Benchmarks with Stress - Public listing

2016-10-19 Thread Bhuvan Rawal
Hi,

Is there any public listing of Cassandra performance test results with
cstar or cassandra-stress for read and write, with a mention of the
configurations modified from the defaults and the Cassandra version?

It would be useful to not redo Cassandra optimisations w.r.t.
threadpools / JVM tuning / caching which have already been tried before and
can be observed from past results, for a faster approach to the best tuning.

In case such a thing does not exist currently, I propose a listing,
possibly on the Cassandra wiki, with a format such as:

# CPU | # RAM | # Nodes | Latencies | Throughput | Con. Reads | Con. Writes |
Con. Compactors | JVM Strategy | Heap Size | New Gen | Other JVM Params |
Link to Images / Detailed Blog

Kindly suggest.

Thanks & Regards,
Bhuvan


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
Hi Jonathan,

If a full scan is a regular requirement then setting up a Spark cluster in
locality with the Cassandra nodes makes perfect sense. But supposing that it is
a one-off requirement, say a weekly or a fortnightly task, a Spark cluster
could be an added overhead in terms of additional capacity and resource planning as
far as operations / maintenance is concerned.

So this could be thought of as a simple substitute for a single-threaded scan,
without the additional effort to set up and maintain another technology.

Regards,
Bhuvan
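
For comparison, this is roughly what the plain single-threaded paged scan being discussed looks like with the Python driver; keyspace and table names are placeholders:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')   # placeholder keyspace

# the driver pages through the result set transparently, 5000 rows at a time
stmt = SimpleStatement("SELECT * FROM my_table", fetch_size=5000)
count = 0
for _ in session.execute(stmt):
    count += 1
print("rows scanned:", count)

cluster.shutdown()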

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma  wrote:

> Hi Jon,
> It wan't allowed.
> Moreover, if someone who isn't familiar with spark, and might be new to
> map filter reduce etc. operations, could also use the utility for some
> simple operations assuming a sequential scan of the cassandra table.
>
> Regards
> Siddharth Verma
>
> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad  wrote:
>
>> Couldn't set up as couldn't get it working, or its not allowed?
>>
>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>> verma.siddha...@snapdeal.com> wrote:
>>
>>> Hi Jon,
>>> We couldn't setup a spark cluster.
>>>
>>> For some use case, a spark cluster was required, but for some reason we
>>> couldn't create spark cluster. Hence, one may use this utility to iterate
>>> through the entire table at very high speed.
>>>
>>> Had to find a work around, that would be faster than paging on result
>>> set.
>>>
>>> Regards
>>>
>>> Siddharth Verma
>>> *Software Engineer I - CaMS*
>>> *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
>>> CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
>>> Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
>>>
>>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
>>> wrote:
>>>
>>> It almost sounds like you're duplicating all the work of both spark and
>>> the connector. May I ask why you decided to not use the existing tools?
>>>
>>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>>> sidd.verma29.l...@gmail.com> wrote:
>>>
>>> Hi DuyHai,
>>> Thanks for your reply.
>>> A few more features planned in the next one(if there is one) like,
>>> custom policy keeping in mind the replication of token range on specific
>>> nodes,
>>> fine graining the token range(for more speedup),
>>> and a few more.
>>>
>>> I think, as fine graining a token range,
>>> If one token range is split further in say, 2-3 parts, divided among
>>> threads, this would exploit the possible parallelism on a large scaled out
>>> cluster.
>>>
>>> And, as you mentioned the JIRA, streaming of request, that would of huge
>>> help with further splitting the range.
>>>
>>> Thanks once again for your valuable comments. :-)
>>>
>>> Regards,
>>> Siddharth Verma
>>>
>>>
>>>
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
It will be interesting to have a comparison with Spark here for basic use
cases.

From a naive observation it appears that this could be slower than Spark, as
a lot of data is streamed over the network.

On the other hand, in this approach we have seen that Young GC takes nearly
full CPU (possibly because a lot of data is moved on and off heap; the
Young Gen keeps getting empty and full, sometimes multiple
times a second). That should apply to Spark as well, since it will be
calling the Cassandra driver, and on top of that the Spark cluster will be sharing the same
compute resources where it does filtering and other operations on the data. If we
have an appropriately sized client machine with enough network bandwidth
this could potentially work faster, of course for basic scanning use cases.

Which of these assumptions seems to be more appropriate?

On Mon, Oct 3, 2016 at 11:40 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello Siddarth
>
> I just throw an eye over the architecture diagram. The idea of using
> multiple threads, one for each token range is great. It help maxing out
> parallelism.
>
> With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be
> even faster.
>
> On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi,
>> I was working on a utility which can be used for cassandra full table
>> scan, at a tremendously high velocity, cassandra fast full table scan.
>> How fast?
>> The script dumped ~ 229 million rows in 116 seconds, with a cluster of
>> size 6 nodes.
>> Data transfer rates were upto 25MBps was observed on cassandra nodes.
>>
>> For some use case, a spark cluster was required, but for some reason we
>> couldn't create spark cluster. Hence, one may use this utility to iterate
>> through the entire table at very high speed.
>>
>> But now for any full scan, I use it freely for my adhoc java programs to
>> manipulate or aggregate cassandra data.
>>
>> You can customize the options, setting fetch size, consistency level,
>> degree of parallelism(number of threads) according to your need.
>>
>> You can visit https://github.com/siddv29/cfs to go through the code, see
>> the logic behind it, or try it in your program.
>> A sample program is also provided.
>>
>> I coded this utility in java.
>>
>> Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
>> For python you may visit his blog(http://casualreflections.
>> io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python) and
>> github(https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47
>> e0981d36d2c0785)
>>
>> Looking forward to your suggestions and comments.
>>
>> P.S. Give it a try. Trust me, the iteration speed is awesome!!
>> It is a bare application, built asap. If you would like to contribute to
>> the java utility, add or build up on it, do reach out
>> sidd.verma29.li...@gmail.com
>>
>> Thanks and Regards,
>> Siddharth Verma
>> (previous email id on this mailing list : verma.siddha...@snapdeal.com)
>>
>
>


High CPU usage by cqlsh when network is disconnected on client

2016-09-30 Thread Bhuvan Rawal
Hi,

We are using Cassandra 3.6 and I have been facing this issue for a while.
When I connect to a Cassandra cluster using cqlsh and disconnect the
network while keeping cqlsh on, I get really high CPU utilization on the client from the
cqlsh Python process. On network reconnect things return to normal.


On debugging a particular process with strace I get a lot of lines like:
[pid  8449] connect(4, {sa_family=AF_INET, sin_port=htons(9042),
sin_addr=inet_addr("10.20.34.11")}, 16) = -1 ENETUNREACH (Network is
unreachable)
[pid  8449] close(4)= 0
[pid  8449] futex(0x7f39a8001360, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  5734] <... futex resumed> )   = 0
[pid  5734] futex(0x1956fb0,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 

[pid  8449] futex(0x1956fb0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  5734] <... futex resumed> )   = 0
[pid  5734] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid  5734] fcntl(4, F_GETFL)   = 0x2 (flags O_RDWR)
[pid  5734] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  5734] connect(4, {sa_family=AF_INET, sin_port=htons(9042),
sin_addr=inet_addr("10.20.34.11")}, 16) = -1 ENETUNREACH (Network is
unreachable)
[pid  5734] close(4)= 0
[pid  5734] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP 
[pid  8449] futex(0x7f39a40aa390,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 



Shall I create a jira for the same?

Thanks & Regards,
Bhuvan
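
Not a fix for cqlsh itself, but for scripts built on the same Python driver the reconnect churn can at least be bounded with a backoff policy; the delays below are just assumptions:

from cassandra.cluster import Cluster
from cassandra.policies import ExponentialReconnectionPolicy

cluster = Cluster(
    ['10.20.34.11'],
    # back off from 1 second up to 5 minutes between reconnection attempts
    reconnection_policy=ExponentialReconnectionPolicy(base_delay=1.0, max_delay=300.0),
)
session = cluster.connect()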


Iterating over a table with multiple producers [Python]

2016-09-25 Thread Bhuvan Rawal
Hi,

It's a common occurrence that a full scan of a Cassandra table is required. One
of the most common requirements is to get the count of rows in a table. As
Cassandra doesn't keep count information stored anywhere (a node may not
have any clue about writes happening on other nodes), when we aggregate
using count(*) essentially all rows are sent to the coordinator by the other
nodes, which is not really recommended and adds pressure on the
coordinator's heap.

Another common use case may be to filter by a cell value on which a secondary
index isn't created. It may not be efficient to create a secondary index for
just a one-off requirement; a single full scan can again resolve it.

I worked on scans using a single producer but it became pretty time consuming
as the size of the table grew. With motivation from the Java driver test cases
(DataStax Java Driver)
I worked on a multi token range scan.

This approach gave pretty interesting results (of course depending on the
client machine and cluster size) and I thought of sharing it with @users. We
have achieved in excess of 1.5 million rows per second scanned by using 50
workers on a 6 node cluster, which was pretty cool. It can be made faster on
larger clusters and a better client machine. This approach has been tested on
a 710 million row table and the scan took 473 seconds without overwhelming the
Cassandra nodes.

This has been discussed in detail on my blog
,
sample code at github
.
Feel free to reach out if I can help / there could be a better way out.

Regards,
Bhuvan

# Note - 1. The paging state reinjection feature has been used in case of
exceptions; it is new in the 3.7 driver and makes failover pretty easy.
# 2. This may not beat Spark, but if you don't have Spark infra set up in
locality with Cassandra this could be a pretty good way to get things done
quickly.
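
For illustration, a condensed sketch of the token-range splitting idea described above (not the code from the linked gist); the keyspace, table and partition key column pk are placeholders:

from concurrent.futures import ThreadPoolExecutor
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

# sorted ring tokens; each adjacent pair is one scan segment, plus the wraparound segment
tokens = sorted(t.value for t in cluster.metadata.token_map.ring)
segments = list(zip(tokens, tokens[1:])) + [(tokens[-1], tokens[0])]

def scan_segment(seg):
    start, end = seg
    if start < end:
        q = SimpleStatement(
            "SELECT * FROM my_table WHERE token(pk) > %s AND token(pk) <= %s",
            fetch_size=5000)
        return sum(1 for _ in session.execute(q, (start, end)))
    # the wrapping segment needs two queries
    q1 = SimpleStatement("SELECT * FROM my_table WHERE token(pk) > %s", fetch_size=5000)
    q2 = SimpleStatement("SELECT * FROM my_table WHERE token(pk) <= %s", fetch_size=5000)
    return (sum(1 for _ in session.execute(q1, (start,))) +
            sum(1 for _ in session.execute(q2, (end,))))

with ThreadPoolExecutor(max_workers=8) as pool:   # the Session object is thread-safe
    total = sum(pool.map(scan_segment, segments))
print("rows scanned:", total)

cluster.shutdown()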


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Created CASSANDRA-12663
<https://issues.apache.org/jira/browse/CASSANDRA-12663>, please feel free to
make edits. From a bird's eye view it seems a bit inefficient to keep doing
computations and generating data which may not be put to use. (A user may
never read via secondary indices on the primary transactional DC, but he/she is
currently forced to create them on every DC in the cluster.)

On Mon, Sep 19, 2016 at 1:05 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> I don't see why having per DC indexes would be an issue, from a technical
> standpoint.  I suggest putting in a JIRA for it, it's a good idea (if it
> doesn't exist already).  Post back to the ML with the issue #.
>
> On Sun, Sep 18, 2016 at 12:26 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Can it be possible with change log feature implemented in CASSANDRA-8844
>> <https://issues.apache.org/jira/browse/CASSANDRA-8844>?  i.e. to have
>> two clusters (With different schema definitions for secondary indices) and
>> segregating analytics workload on the other cluster with CDC log shipper
>> enabled on parent DC which is taking care of transactional workload?
>>
>> On Sun, Sep 18, 2016 at 9:30 PM, Dorian Hoxha <dorian.ho...@gmail.com>
>> wrote:
>>
>>> Only way I know is in elassandra <https://github.com/vroyer/elassandra>.
>>> You spin nodes in dc1 as elassandra (having data + indexes) and in dc2 as
>>> cassandra (having only data).
>>>
>>> On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is it possible to have secondary indices (SASI or native ones) defined
>>>> on a table restricted to a particular DC? For instance it is very much
>>>> possible in mysql to have a parent server on which writes are being done
>>>> without any indices (other than the required ones), and to have indices on
>>>> replica db's, this helps the parent database to be lightweight and free
>>>> from building secondary index on every write.
>>>>
>>>> For analytics & auditing purposes it is essential to serve different
>>>> access patterns than that modeled from a partition key fetch perspective,
>>>> although a limited reads are needed by users but if enabled cluster wide it
>>>> will require index write for every row written on that table on every
>>>> single node on every DC even the one which may be serving read operations.
>>>>
>>>> What could be the potential means to solve this problem inside of
>>>> cassandra (Not having to ship off the data into elasticsearch etc).
>>>>
>>>> Best Regards,
>>>> Bhuvan
>>>>
>>>
>>>
>>


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Could it be possible with the change log feature implemented in CASSANDRA-8844
<https://issues.apache.org/jira/browse/CASSANDRA-8844>? i.e. to have two
clusters (with different schema definitions for secondary indices) and
segregate the analytics workload onto the other cluster, with a CDC log shipper
enabled on the parent DC which is taking care of the transactional workload?

On Sun, Sep 18, 2016 at 9:30 PM, Dorian Hoxha <dorian.ho...@gmail.com>
wrote:

> Only way I know is in elassandra <https://github.com/vroyer/elassandra>.
> You spin nodes in dc1 as elassandra (having data + indexes) and in dc2 as
> cassandra (having only data).
>
> On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi,
>>
>> Is it possible to have secondary indices (SASI or native ones) defined on
>> a table restricted to a particular DC? For instance it is very much
>> possible in mysql to have a parent server on which writes are being done
>> without any indices (other than the required ones), and to have indices on
>> replica db's, this helps the parent database to be lightweight and free
>> from building secondary index on every write.
>>
>> For analytics & auditing purposes it is essential to serve different
>> access patterns than that modeled from a partition key fetch perspective,
>> although a limited reads are needed by users but if enabled cluster wide it
>> will require index write for every row written on that table on every
>> single node on every DC even the one which may be serving read operations.
>>
>> What could be the potential means to solve this problem inside of
>> cassandra (Not having to ship off the data into elasticsearch etc).
>>
>> Best Regards,
>> Bhuvan
>>
>
>


Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Hi,

Is it possible to have secondary indices (SASI or native ones) defined on a
table restricted to a particular DC? For instance, it is very much possible
in MySQL to have a parent server on which writes are done without any
indices (other than the required ones), and to have indices on the replica
DBs; this helps the parent database to be lightweight and free from
building a secondary index on every write.

For analytics & auditing purposes it is essential to serve different access
patterns than those modeled from a partition-key-fetch perspective. Although
only limited reads are needed by users, if enabled cluster wide it will
require an index write for every row written to that table on every single
node in every DC, even the ones which may be serving read operations.

What could be the potential means to solve this problem inside of Cassandra
(not having to ship the data off into Elasticsearch etc.)?

Best Regards,
Bhuvan


Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread Bhuvan Rawal
Hi,

We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
leverage more memory instead of m4.2xlarge). Bootstrapping a node would
take 7-8 hours.

If this activity is performed serially then it will take 5-6 days. I had a
look at CASSANDRA-7069
 and a bit of
discussion in the past at -
http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
Wanted to know if the limitation is still applicable and race condition
could occur in 3.6 version.

If this is not the case can we add a new datacenter as mentioned here
opsAddDCToCluster

and
bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in
cassandra.yaml and rebuilding nodes simultaneously in the new dc?


Thanks & Regards,
Bhuvan


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-09 Thread Bhuvan Rawal
As per this
<https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html> doc,
conditional batches can contain queries belonging only to a single partition.
On trying it in 3.6 I got this exception as expected:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Batch
with conditions cannot span multiple partitions"

On trying a single-partition batch with multiple LWT statements, Cassandra
accepted them at times and rejected the complete batch statement based on
the other LWT. I mean, in the batch below
BEGIN BATCH
Statement 1 IF SOME CONDITION;
Statement 2 IF SOME CONDITION2;
Statement 3;
APPLY BATCH;

The LWT of either of Statement 1/2 was observed to make the batch succeed
or fail, and as per this doc
<https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html>: "If
one statement in a batch is a conditional update, the conditional logic
must return true, or the entire batch fails." That's what must be
essentially happening, and therefore having more than one LWT may not make a
lot of sense.

One query still remains though: can a single-partition batch be considered to be
isolated per replica? Say there are 5 rows in a partition and we are
updating all of them using LWT; clients should read either all of them old or
all of them new during the batch update.

Will be glad if someone can  clarify the above doubt.
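
For what it's worth, a sketch of the write_time / optimistic-lock scheme described above with the Python driver; the jobs table, its columns and the computation are placeholders, write_time is assumed to be a static timeuuid, and the partition is assumed to already exist:

import uuid
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

read = SimpleStatement("SELECT ck, val, write_time FROM jobs WHERE pk = %s",
                       consistency_level=ConsistencyLevel.LOCAL_QUORUM)
rows = list(session.execute(read, (42,)))
read_time = rows[0].write_time            # static column, same value on every row

batch = BatchStatement(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
# a single LWT on the static column guards the whole single-partition batch
batch.add(SimpleStatement(
    "UPDATE jobs SET write_time = %s WHERE pk = %s IF write_time = %s"),
    (uuid.uuid1(), 42, read_time))
for row in rows:
    new_val = row.val                     # placeholder for the recomputed value
    batch.add(SimpleStatement("UPDATE jobs SET val = %s WHERE pk = %s AND ck = %s"),
              (new_val, 42, row.ck))

result = session.execute(batch)
if not result.was_applied:
    pass                                  # someone else wrote in between: re-read and retry

cluster.shutdown()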



On Tue, Sep 6, 2016 at 11:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi,
>
> We are working to solve on a multi threaded distributed design which in
> which a thread reads current state from Cassandra (Single partition ~ 20
> Rows), does some computation and saves it back in. But it needs to be
> ensured that in between reading and writing by that thread any other thread
> should not have saved any operation on that partition.
>
> We have thought of a solution for the same - *having a write_time column*
> in the schema and making it static. Every time the thread picks up a job
> read will be performed with LOCAL_QUORUM. While writing into Cassandra
> batch will contain a LWT (IF write_time is read time) otherwise read will
> be performed and computation will be done again and so on. This will ensure
> that while saving partition is in a state it was read from.
>
> In order to avoid race condition we need to ensure couple of things:
>
> 1. While saving data in a batch with a single partition (*Rows may be
> Updates, Deletes, Inserts)* are they Isolated per replica node. (Not
> necessarily on a cluster as a whole). Is there a possibility of client
> reading partial rows?
>
> 2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes in this case could
> there a chance of inconsistency in this case (When LWT is being used in
> batches).
>
> 3. Is it possible to use multiple LWT in a single Batch? In general how
> does LWT performs with Batch and is Paxos acted on before batch execution?
>
> Can someone help us with this?
>
> Thanks & Regards,
> Bhuvan
>
>


Isolation in case of Single Partition Writes and Batching with LWT

2016-09-06 Thread Bhuvan Rawal
Hi,

We are working on a multi-threaded distributed design in
which a thread reads the current state from Cassandra (single partition, ~20
rows), does some computation and saves it back. But it needs to be
ensured that in between the reading and writing by that thread, no other thread
should have saved any operation on that partition.

We have thought of a solution for the same - *having a write_time column*
in the schema and making it static. Every time the thread picks up a job, a
read will be performed with LOCAL_QUORUM. While writing into Cassandra, the
batch will contain an LWT (IF write_time = read time); otherwise the read will
be performed and the computation done again, and so on. This will ensure
that while saving, the partition is in the state it was read from.

In order to avoid a race condition we need to ensure a couple of things:

1. While saving data in a batch to a single partition (*rows may be
updates, deletes, inserts*), are they isolated per replica node (not
necessarily on the cluster as a whole)? Is there a possibility of a client
reading partial rows?

2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes, could
there be a chance of inconsistency in this case (when LWT is being used in
batches)?

3. Is it possible to use multiple LWTs in a single batch? In general how
does LWT perform with batches, and is Paxos acted on before batch execution?

Can someone help us with this?

Thanks & Regards,
Bhuvan


Performance impact of wide rows on read heavy workload

2016-07-21 Thread Bhuvan Rawal
Hi,

We are trying to evaluate the read performance impact of having a wide row by
pushing a partition key out into a clustering column. From all the information I
could gather [1] [2] [3], the key cache as well
as the partition index point to the block location of the partition on the disk.

In case if we have a schema like below which would result in a wide table
if pk is of high cardinality (Say Month in a time series data):

CREATE TABLE ks.wide_row_table (
pk int,
ck1 bigint,
ck2 text,
v1 text,
v2 text,
v3 bigint,
PRIMARY KEY (pk, ck1, ck2)
);

Suppose that there is only one SSTable for this table at this instant
and a specific partition has reached 100MB: will reading the first row (by
specifying the clustering key of the 0th row in the partition) perform the
same as reading the last row in the partition (at 100 MB)?

In other words, is there any heuristic to determine the disk offset by
clustering column once the partition key is specified, so as to locate the block
on the disk, or in the 2nd case will the complete 100MB partition have to be
scanned in order to find the relevant row? For simplicity's sake let's
assume that the row cache & OS page cache are disabled and all reads are hitting
disk.

Thanks & Regards,
Bhuvan
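
One empirical way to probe this, for what it's worth: time a narrow clustering-key read near the start of the partition versus near its end, using the schema above. The pk and ck1 values below are assumptions, and repeated runs will of course hit the page cache:

import time
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')

query = "SELECT v1 FROM wide_row_table WHERE pk = %s AND ck1 = %s LIMIT 1"

for ck1 in (0, 9999999):                  # assumed lowest / highest clustering values
    t0 = time.time()
    session.execute(query, (1, ck1))
    print(ck1, "->", round((time.time() - t0) * 1000, 2), "ms")

cluster.shutdown()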


Re: Slow nodetool response time

2016-06-22 Thread Bhuvan Rawal
Thanks for your reply Sebastian. We have a 3 node dev cluster set up on
local servers. I tried nodetool commands on all 3 of them and the response time
was the same (in excess of 17 seconds), mostly in the blocked state as you found out.
I tried fiddling with the /etc/hosts file, and on adding the line:

 

This change made the wait nearly minimal and has possibly resolved the
problem. I can share the response on the same machine for which I had
earlier shared timings (it was greater than 17 secs earlier):
$ time nodetool version
ReleaseVersion: 3.0.3

real 0m*1.197s*
user 0m1.963s
sys 0m0.284s

This is a bit puzzling, as I checked and ping works in sub-millisecond time
from the host to itself and the other two servers; I am wondering where that time was
being spent earlier.

Thanks & Regards,
Bhuvan

On Wed, Jun 22, 2016 at 7:10 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Sounds like your process is spending a lot of time in blocked state (real
> - user - sys). Check your os subsystems, maybe your machine is bogged down
> by other work.
>
> FWIW, my time in an idle system is about 2 seconds but can go up to ~13
> seconds on a busy system with 70% cpu utilized. No difference between 1 and
> 3 node setups.
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Wed, Jun 22, 2016 at 9:03 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi,
>>
>> We have been facing slowness in getting response from nodetool for any of
>> its subcommand. On the same version on AWS it responds really fast but on
>> local 1 node machine or local DC cluster it performs very slow.
>>
>> On Local DC :
>> *$ time nodetool version*
>> ReleaseVersion: 3.0.3
>>
>> real 0m*17.582s*
>> user 0m2.334s
>> sys 0m0.470s
>>
>> On AWS:
>> *$ time nodetool version*
>> ReleaseVersion: 3.0.3
>>
>> real 0m*1.084s*
>> user 0m1.772s
>> sys 0m0.363s
>>
>> Any way by which its speed can be increased?
>>
>> Thanks & Regards,
>> Bhuvan
>>
>
>


Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-22 Thread Bhuvan Rawal
Thanks Marcus, that may have helped the cause as well. Certainly repair
works great beyond 3.0.3 and we have tested it on 3.5+ as well as on 3.0.7.

On that note, it is evident that there have been a number of optimizations
on various fronts post 3.0.3; I would like to know the general opinion
about a stable version for production deployment.

I understand that it has been asked multiple times, but while upgrading to
3.5 we encountered an issue, as mentioned by Atul, which was very critical
for us and which we could not have compromised on, as it would have required
a lot of code changes. How far can we consider 3.6+ versions for production
deployment?

On a number of occasions it was suggested that Cassandra versions x.y.6+
can be assumed to be production stable, as most of the major fixes are in
place by then, citing past history. How far can we apply that logic to the 3.x
series? As fixes in future versions aren't ported back to previous
versions in the tick-tock series, it becomes a bit tricky: if a feature in 3.x
is being used and a bug is encountered, then the option is to either go to:

1. 3.(x-1) or 3.(x-2) or
2. go higher up in the series (if new version has been released).

In case of 1, if the feature isn't present in the earlier 3.x series then a lot of
code change may be required in the application using Cassandra.

(Say for example someone deploys 3.5 and starts using SASI in production
and encounters an issue which can't be compromised on, and changing to a
Cassandra version without that issue is the only option.) Then
possibly 3.3 is an option, but that will require them to use an alternative
to SASI if they had used it. In the case of a surprise situation in production,
changing app code and working out alternatives may not be so brisk.

Although the chances are low, one can never be sure what issues new features
in 3.(x+1) may bring, so in that case going forward to a new version is also
a bit dicey.

If a similar situation had arisen under the earlier release strategy, one could near
blindly go ahead with a new release in the same series because it would be a bug
fix. I'm sure a lot of thought must have been put into adopting the tick-tock
strategy (possibly to roll out a number of features which were
pending).

But from a user's point of view, using a feature in a new version, even
after testing that feature, exposes a risk of issues that may arise
because of other features developed, fixes for which will not be
backported, and going forward or backward may both not be options.

Will be glad if we can be helped mitigate this apprehension.


On Wed, Jun 22, 2016 at 6:33 PM, Marcus Eriksson <krum...@gmail.com> wrote:

> it could also be CASSANDRA-11412 if you have many sstables and vnodes
>
> On Wed, Jun 22, 2016 at 2:50 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Thanks for the info Paulo, Robert. I tried further testing with other
>> parameters and it was prevalent. We could be either 11739, 11206. But im
>> spektical about 11739 because repair works well in 3.5 and 11739 seems to
>> be fixed for 3.7/3.0.7.
>>
>> We may possibly resolve this by increasing heap size thereby reducing
>> some page cache bandwidth before upgrading to higher versions.
>>
>> On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> You could also be hitting CASSANDRA-11739, which was fixed on 3.0.7 and
>>> could potentially cause OOMs for long-running repairs.
>>>
>>>
>>> 2016-06-20 13:26 GMT-03:00 Robert Stupp <sn...@snazy.de>:
>>>
>>>> One possibility might be CASSANDRA-11206 (Support large partitions on
>>>> the 3.0 sstable format), which reduces heap usage for other operations
>>>> (like repair, compactions) as well.
>>>> You can verify that by setting column_index_cache_size_in_kb in c.yaml
>>>> to a really high value like 1000 - if you see the same behaviour in 3.7
>>>> with that setting, there’s not much you can do except upgrading to 3.7 as
>>>> that change went into 3.6 and not into 3.0.x.
>>>>
>>>> —
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>> On 20 Jun 2016, at 18:13, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We are running Cassandra 3.0.3 on Production with Max Heap Size of 8GB.
>>>> There has been a consistent issue with nodetool repair for a while and
>>>> we have tried issuing it with multiple options --pr, --local as well,
>>>> sometimes node went down with Out of Memory error and at times nodes did
>>>> stopped connecting any connection, even jmx nodetool commands.
>>>>
>>>> On trying with same data on 3.7 Repair Ran successfully without
>&

Slow nodetool response time

2016-06-22 Thread Bhuvan Rawal
Hi,

We have been facing slowness in getting a response from nodetool for any of
its subcommands. On the same version on AWS it responds really fast, but on a
local 1 node machine or a local DC cluster it performs very slowly.

On Local DC :
*$ time nodetool version*
ReleaseVersion: 3.0.3

real 0m*17.582s*
user 0m2.334s
sys 0m0.470s

On AWS:
*$ time nodetool version*
ReleaseVersion: 3.0.3

real 0m*1.084s*
user 0m1.772s
sys 0m0.363s

Any way by which its speed can be increased?

Thanks & Regards,
Bhuvan


Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-22 Thread Bhuvan Rawal
Thanks for the info Paulo, Robert. I tried further testing with other
parameters and it was still prevalent. We could be hitting either 11739 or 11206. But I'm
skeptical about 11739 because repair works well in 3.5 and 11739 seems to
be fixed for 3.7/3.0.7.

We may possibly resolve this by increasing heap size thereby reducing some
page cache bandwidth before upgrading to higher versions.

On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> You could also be hitting CASSANDRA-11739, which was fixed on 3.0.7 and
> could potentially cause OOMs for long-running repairs.
>
>
> 2016-06-20 13:26 GMT-03:00 Robert Stupp <sn...@snazy.de>:
>
>> One possibility might be CASSANDRA-11206 (Support large partitions on the
>> 3.0 sstable format), which reduces heap usage for other operations (like
>> repair, compactions) as well.
>> You can verify that by setting column_index_cache_size_in_kb in c.yaml to
>> a really high value like 1000 - if you see the same behaviour in 3.7
>> with that setting, there’s not much you can do except upgrading to 3.7 as
>> that change went into 3.6 and not into 3.0.x.
>>
>> —
>> Robert Stupp
>> @snazy
>>
>> On 20 Jun 2016, at 18:13, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>
>> Hi All,
>>
>> We are running Cassandra 3.0.3 on Production with Max Heap Size of 8GB.
>> There has been a consistent issue with nodetool repair for a while and
>> we have tried issuing it with multiple options --pr, --local as well,
>> sometimes node went down with Out of Memory error and at times nodes did
>> stopped connecting any connection, even jmx nodetool commands.
>>
>> On trying with same data on 3.7 Repair Ran successfully without
>> encountering any of the above mentioned issues. I then tried increasing
>> heap to 16GB on 3.0.3 and repair ran successfully.
>>
>> I then analyzed memory usage during nodetool repair for 3.0.3(16GB heap)
>> vs 3.7 (8GB Heap) and 3.0.3 occupied 11-14 GB at all times, whereas 3.7
>> spiked between 1-4.5 GB while repair runs. As they ran on same dataset
>> and unrepaired data with full repair.
>>
>> We would like to know if it is a known bug that was fixed post 3.0.3 and
>> there could be a possible way by which we can run repair on 3.0.3 without
>> increasing heap size as for all other activities 8GB works for us.
>>
>> PFA the visualvm snapshots.
>>
>> 
>> ​3.0.3 VisualVM Snapshot, consistent heap usage of greater than 12 GB.
>>
>>
>> 
>> ​3.7 VisualVM Snapshot, 8GB Max Heap and max heap usage till about 5GB.
>>
>> Thanks & Regards,
>> Bhuvan Rawal
>>
>>
>> PS: In case if the snapshots are not visible, they can be viewed from the
>> following links:
>> 3.0.3:
>> https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png
>> 3.7:
>> https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png
>>
>>
>>
>


Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
Also if we talk about backup strategy for Cassandra Data then essentially
there are couple of strategies that are adopted:

1. Incremental backups: sstables are kept inside a backups directory and can
be shipped to a storage location like AWS Glacier, etc.
2. Snapshotting: hardlinks of the sstables get created. This is a very fast
process; memtables are flushed first so the latest data is captured into
sstables, and the snapshot is created in the snapshots directory. A snapshot
by itself does not let you go back to an arbitrary point in time, whereas
incremental backups give you that feature.

Depending on the use case, you can use 1 or 2 or both.
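
As a rough illustration of the two options (the keyspace name, snapshot tag
and paths are placeholders):

# 1. incremental backups - enabled in cassandra.yaml on every node:
#    incremental_backups: true
# newly flushed sstables are then hardlinked under
# <data_dir>/<keyspace>/<table>/backups/ and can be shipped off-node
# (e.g. to AWS Glacier) by an external job

# 2. point-in-time snapshot - flushes memtables, then hardlinks sstables
# under <data_dir>/<keyspace>/<table>/snapshots/<tag>/
$ nodetool snapshot -t backup_20160617 my_keyspace
$ nodetool clearsnapshot -t backup_20160617 my_keyspace   # remove once shipped/expired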

On Fri, Jun 17, 2016 at 4:46 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> What kind of data are we talking here?
> Is it time series data with infrequent updates and only inserts or
> frequently updated data. How frequently is old data read. I ask this
> because your Node size planning and Compaction Strategy will essentially
> depend on these.
>
> I have known people go upto 3-5 TB per node if data is not updated
> frequently.
>
> Regards,
> Bhuvan
>
> On Fri, Jun 17, 2016 at 4:31 AM, <vasu.no...@gmail.com> wrote:
>
>> Bhuvan,
>>
>> Thanks for the info but actually I'm not looking for migration strategy.
>> just want to backup strategy and retention policy best practices
>>
>> Thanks,
>> Vasu
>>
>> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>
>> Hi Vasu,
>>
>> Planet Cassandra has a documentation page for basic info about migrating
>> to cassandra from MySQL. What to expect and what not to. It can be found
>> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>>
>> I had a look at this slide
>> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch>
>>  a
>> while back. It provides a pretty reliable 4 Phase Sync strategy, starting
>> from Slide 31. Also the QA session of the talk is informative too -
>> http://www.doanduyhai.com/blog/?p=1757.
>>
>> Best Regards,
>> Bhuvan
>>
>> On Fri, Jun 17, 2016 at 4:03 AM, <vasu.no...@gmail.com> wrote:
>>
>>> Hi ,
>>>
>>> I'm from relational world recently started working on Cassandra. I'm
>>> just wondering what is backup best practices for DB around 100 Tb with
>>> multi DC setup.
>>>
>>>
>>> Thanks,
>>> Vasu
>>
>>
>>
>


Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
What kind of data are we talking about here?
Is it time-series data with only inserts and infrequent updates, or
frequently updated data? How frequently is old data read? I ask this
because your node size planning and compaction strategy will essentially
depend on these.

I have known people to go up to 3-5 TB per node if the data is not updated
frequently.

Regards,
Bhuvan

On Fri, Jun 17, 2016 at 4:31 AM, <vasu.no...@gmail.com> wrote:

> Bhuvan,
>
> Thanks for the info but actually I'm not looking for migration strategy.
> just want to backup strategy and retention policy best practices
>
> Thanks,
> Vasu
>
> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
> Hi Vasu,
>
> Planet Cassandra has a documentation page for basic info about migrating
> to cassandra from MySQL. What to expect and what not to. It can be found
> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>
> I had a look at this slide
> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch>
>  a
> while back. It provides a pretty reliable 4 Phase Sync strategy, starting
> from Slide 31. Also the QA session of the talk is informative too -
> http://www.doanduyhai.com/blog/?p=1757.
>
> Best Regards,
> Bhuvan
>
> On Fri, Jun 17, 2016 at 4:03 AM, <vasu.no...@gmail.com> wrote:
>
>> Hi ,
>>
>> I'm from relational world recently started working on Cassandra. I'm just
>> wondering what is backup best practices for DB around 100 Tb with multi DC
>> setup.
>>
>>
>> Thanks,
>> Vasu
>
>
>


Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
Hi Vasu,

Planet Cassandra has a documentation page with basic info about migrating to
Cassandra from MySQL - what to expect and what not to. It can be found here
<http://planetcassandra.org/mysql-to-cassandra-migration/>.

I had a look at this slide deck
<http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch>
a while back. It provides a pretty reliable 4 Phase Sync strategy, starting
from Slide 31. The QA session of the talk is informative too -
http://www.doanduyhai.com/blog/?p=1757.

Best Regards,
Bhuvan

On Fri, Jun 17, 2016 at 4:03 AM,  wrote:

> Hi ,
>
> I'm from relational world recently started working on Cassandra. I'm just
> wondering what is backup best practices for DB around 100 Tb with multi DC
> setup.
>
>
> Thanks,
> Vasu


Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel,

I'd rather thank you for naming 11513 earlier in the mail; I would have been
lost in the code for a much longer time otherwise.

Repeating what Tianshi mentioned in 11513 - "*Cassandra community is
awesome! Should buy you a beer, Joel."* :)

On Wed, Jun 15, 2016 at 6:01 AM, Joel Knighton <joel.knigh...@datastax.com>
wrote:

> Great work, Bhuvan - I sat down after work to look at this more carefully.
>
> For a short summary, you are correct.
>
> For a longer summary, I initially thought the reproduction you provided
> would not run into the issue from 3.4/3.5 because it didn't select any
> static columns, which meant that it wouldn't have statics in its
> ColumnFilter (basically, the filter we apply when deciding if we need to
> look for the requested data in more SSTables). This was an incorrect
> understanding - in order to preserve the CQL semantic (see CASSANDRA-6588
> for details), we are including all columns, including the static columns,
> in the fetched columns, which means they are part of the ColumnFilter. I
> believe there may be an opportunity for an optimization here, but that's a
> whole different discussion. I now agree that these are the same issue.
>
> You are correct in your analysis that 3.4/3.5 are the only affected
> versions. It has been patched in release 3.6 forward and was not introduced
> until 3.4
>
> Thanks for sticking with me on this - I'm going to resolve CASSANDRA-12003
> as a duplicate of CASSANDRA-11513.
>
> On Tue, Jun 14, 2016 at 4:21 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Joel,
>>
>> Thanks for your reply, I have checked and found that the behavior is same
>> in case of CASSANDRA-11513
>> <https://issues.apache.org/jira/browse/CASSANDRA-11513>. I have verified
>> this behavior (for both 11513 & 12003) to occur in case of 3.4 & 3.5. They
>> both don't occur in 3.0.4, 3.6 & 3.7.
>>
>> Please find below the results of selecting only pk and clustering key
>> from 11513. It has also been verified that both issues occur while
>> selecting all / filtered rows therefore selection criteria is not an issue
>> filtering by WHERE is:
>>
>> cqlsh:ks> select pk,a from test0 where pk=0 and a=2;
>>
>>  pk | a
>> +---
>>   0 | 1
>>   0 | 2
>>   0 | 3
>>
>> We can verify this claim by applying 11513 Patch to 3.5 Tag and build &
>> test for 12003. If it is fixed then we can guarantee the claim. Let me
>> know if any further input may possibly be required here.
>>
>> On Wed, Jun 15, 2016 at 2:23 AM, Joel Knighton <
>> joel.knigh...@datastax.com> wrote:
>>
>>> The important part of that query is that it's selecting a static column
>>> (with select *), not whether it is filtering on one. In CASSANDRA-12003 and
>>> this thread, it looks like you're only selecting the primary and clustering
>>> columns. I'd be cautious about concluding that CASSANDRA-12003 and
>>> CASSANDRA-11513 are the same issue and that CASSANDRA-12003 is fixed.
>>>
>>> If you have a reproduction path for CASSANDRA-12003, I'd recommend
>>> attaching it to a ticket, and someone can investigate internals to see if
>>> CASSANDRA-11513 (or something else entirely) fixed the issue.
>>>
>>> On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Joel,
>>>>
>>>> If we look at the schema carefully:
>>>>
>>>> CREATE TABLE test0 (
>>>> pk int,
>>>> a int,
>>>> b text,
>>>> s text static,
>>>> PRIMARY KEY (*pk, a)*
>>>> );
>>>>
>>>> and filtering is performed on clustering column a and its not a static
>>>> column:
>>>>
>>>> select * from test0 where pk=0 and a=2;
>>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <
>>>> joel.knigh...@datastax.com> wrote:
>>>>
>>>>> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on
>>>>> you selecting a static column, which you weren't doing in the reported
>>>>> issue. That said, I haven't looked too closely.
>>>>>
>>>>> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I can reproduce CASSANDRA-11513
>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on
>>>>>> 3.5, possible duplicate.

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel,

Thanks for your reply. I have checked and found that the behavior is the same
in the case of CASSANDRA-11513
<https://issues.apache.org/jira/browse/CASSANDRA-11513>. I have verified
this behavior (for both 11513 & 12003) to occur on 3.4 & 3.5. Neither occurs
on 3.0.4, 3.6 & 3.7.

Please find below the results of selecting only the pk and clustering key
from 11513.
It has also been verified that both issues occur whether selecting all or
filtered rows, therefore the selection list is not the issue; the filtering
by WHERE is:

cqlsh:ks> select pk,a from test0 where pk=0 and a=2;

 pk | a
----+---
  0 | 1
  0 | 2
  0 | 3

We can verify this claim by applying the 11513 patch to the 3.5 tag and
building & testing for 12003. If it is fixed then we can confirm the claim.
Let me know if any further input is required here.

On Wed, Jun 15, 2016 at 2:23 AM, Joel Knighton <joel.knigh...@datastax.com>
wrote:

> The important part of that query is that it's selecting a static column
> (with select *), not whether it is filtering on one. In CASSANDRA-12003 and
> this thread, it looks like you're only selecting the primary and clustering
> columns. I'd be cautious about concluding that CASSANDRA-12003 and
> CASSANDRA-11513 are the same issue and that CASSANDRA-12003 is fixed.
>
> If you have a reproduction path for CASSANDRA-12003, I'd recommend
> attaching it to a ticket, and someone can investigate internals to see if
> CASSANDRA-11513 (or something else entirely) fixed the issue.
>
> On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Joel,
>>
>> If we look at the schema carefully:
>>
>> CREATE TABLE test0 (
>> pk int,
>> a int,
>> b text,
>> s text static,
>> PRIMARY KEY (*pk, a)*
>> );
>>
>> and filtering is performed on clustering column a and its not a static
>> column:
>>
>> select * from test0 where pk=0 and a=2;
>>
>>
>>
>> On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <
>> joel.knigh...@datastax.com> wrote:
>>
>>> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you
>>> selecting a static column, which you weren't doing in the reported issue.
>>> That said, I haven't looked too closely.
>>>
>>> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> I can reproduce CASSANDRA-11513
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on
>>>> 3.5, possible duplicate.
>>>>
>>>> On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <
>>>> joel.knigh...@datastax.com> wrote:
>>>>
>>>>> There's some precedent for similar issues with static columns in 3.5
>>>>> with https://issues.apache.org/jira/browse/CASSANDRA-11513 - a
>>>>> deterministic (or somewhat deterministic) path for reproduction would help
>>>>> narrow the issue down farther. I've played around locally with similar
>>>>> schemas (sans the stratio indices) and couldn't reproduce the issue.
>>>>>
>>>>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Jira CASSANDRA-12003
>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been
>>>>>> created for the same.
>>>>>>
>>>>>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <
>>>>>> atul.sar...@snapdeal.com> wrote:
>>>>>>
>>>>>>> Hi Tyler,
>>>>>>>
>>>>>>> This issue is mainly visible for tables having static columns, still
>>>>>>> investigating.
>>>>>>> We will try to test after removing lucene index but I don’t think
>>>>>>> this plug-in could led to change in behaviour of cassandra write to 
>>>>>>> table's
>>>>>>> memtable.
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> Atul Saroha
>>>>>>> *Lead Software Engineer*
>>>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>>>
>>>>>>>

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
I have verified this issue to be fixed in 3.6 and 3.7.
And the issue mentioned on this thread is fixed as well.

On Wed, Jun 15, 2016 at 12:43 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Joel,
>
> If we look at the schema carefully:
>
> CREATE TABLE test0 (
> pk int,
> a int,
> b text,
> s text static,
> PRIMARY KEY (*pk, a)*
> );
>
> and filtering is performed on clustering column a and its not a static
> column:
>
> select * from test0 where pk=0 and a=2;
>
>
>
> On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <
> joel.knigh...@datastax.com> wrote:
>
>> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you
>> selecting a static column, which you weren't doing in the reported issue.
>> That said, I haven't looked too closely.
>>
>> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>> wrote:
>>
>>> I can reproduce CASSANDRA-11513
>>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on 3.5,
>>> possible duplicate.
>>>
>>> On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <
>>> joel.knigh...@datastax.com> wrote:
>>>
>>>> There's some precedent for similar issues with static columns in 3.5
>>>> with https://issues.apache.org/jira/browse/CASSANDRA-11513 - a
>>>> deterministic (or somewhat deterministic) path for reproduction would help
>>>> narrow the issue down farther. I've played around locally with similar
>>>> schemas (sans the stratio indices) and couldn't reproduce the issue.
>>>>
>>>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jira CASSANDRA-12003
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been
>>>>> created for the same.
>>>>>
>>>>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <
>>>>> atul.sar...@snapdeal.com> wrote:
>>>>>
>>>>>> Hi Tyler,
>>>>>>
>>>>>> This issue is mainly visible for tables having static columns, still
>>>>>> investigating.
>>>>>> We will try to test after removing lucene index but I don’t think
>>>>>> this plug-in could led to change in behaviour of cassandra write to 
>>>>>> table's
>>>>>> memtable.
>>>>>>
>>>>>>
>>>>>> -
>>>>>> Atul Saroha
>>>>>> *Lead Software Engineer*
>>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>>
>>>>>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <ty...@datastax.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is 'id' your partition key? I'm not familiar with the stratio
>>>>>>> indexes, but it looks like the primary key columns are both indexed.
>>>>>>> Perhaps this is related?
>>>>>>>
>>>>>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <
>>>>>>> atul.sar...@snapdeal.com> wrote:
>>>>>>>
>>>>>>>> After further debug, this issue is found in in-memory memtable as
>>>>>>>> doing nodetool flush + compact resolve the issue. And there is no batch
>>>>>>>> write used for this table which is showing issue.
>>>>>>>> Table properties:
>>>>>>>>
>>>>>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>>>>> AND bloom_filter_fp_chance = 0.01
>>>>>>>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>>> AND comment = ''
>>>>>>>>> AND compaction = {'class':
>>>>>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>>> AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>>> AND crc_check_chance = 1.0

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel,

If we look at the schema carefully:

CREATE TABLE test0 (
pk int,
a int,
b text,
s text static,
PRIMARY KEY (*pk, a)*
);

and filtering is performed on clustering column a, which is not a static
column:

select * from test0 where pk=0 and a=2;



On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <joel.knigh...@datastax.com>
wrote:

> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you
> selecting a static column, which you weren't doing in the reported issue.
> That said, I haven't looked too closely.
>
> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> I can reproduce CASSANDRA-11513
>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on 3.5,
>> possible duplicate.
>>
>> On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <
>> joel.knigh...@datastax.com> wrote:
>>
>>> There's some precedent for similar issues with static columns in 3.5
>>> with https://issues.apache.org/jira/browse/CASSANDRA-11513 - a
>>> deterministic (or somewhat deterministic) path for reproduction would help
>>> narrow the issue down farther. I've played around locally with similar
>>> schemas (sans the stratio indices) and couldn't reproduce the issue.
>>>
>>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Jira CASSANDRA-12003
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been
>>>> created for the same.
>>>>
>>>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <atul.sar...@snapdeal.com
>>>> > wrote:
>>>>
>>>>> Hi Tyler,
>>>>>
>>>>> This issue is mainly visible for tables having static columns, still
>>>>> investigating.
>>>>> We will try to test after removing lucene index but I don’t think this
>>>>> plug-in could led to change in behaviour of cassandra write to table's
>>>>> memtable.
>>>>>
>>>>>
>>>>> -
>>>>> Atul Saroha
>>>>> *Lead Software Engineer*
>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>
>>>>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <ty...@datastax.com>
>>>>> wrote:
>>>>>
>>>>>> Is 'id' your partition key? I'm not familiar with the stratio
>>>>>> indexes, but it looks like the primary key columns are both indexed.
>>>>>> Perhaps this is related?
>>>>>>
>>>>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <
>>>>>> atul.sar...@snapdeal.com> wrote:
>>>>>>
>>>>>>> After further debug, this issue is found in in-memory memtable as
>>>>>>> doing nodetool flush + compact resolve the issue. And there is no batch
>>>>>>> write used for this table which is showing issue.
>>>>>>> Table properties:
>>>>>>>
>>>>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>>>> AND bloom_filter_fp_chance = 0.01
>>>>>>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>> AND comment = ''
>>>>>>>> AND compaction = {'class':
>>>>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>> AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>> AND crc_check_chance = 1.0
>>>>>>>> AND dclocal_read_repair_chance = 0.1
>>>>>>>> AND default_time_to_live = 0
>>>>>>>> AND gc_grace_seconds = 864000
>>>>>>>> AND max_index_interval = 2048
>>>>>>>> AND memtable_flush_period_in_ms = 0
>>>>>>>> AND min_index_interval = 128
>>>>>>>> AND read_repair_chance = 0.0
>>>>>>>> AND speculative_retry = '99PERCENTILE';
>>>>>>>> CREATE CUSTOM INDEX nbf_

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
I can reproduce CASSANDRA-11513
<https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on 3.5,
possible duplicate.

On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <joel.knigh...@datastax.com>
wrote:

> There's some precedent for similar issues with static columns in 3.5 with
> https://issues.apache.org/jira/browse/CASSANDRA-11513 - a deterministic
> (or somewhat deterministic) path for reproduction would help narrow the
> issue down farther. I've played around locally with similar schemas (sans
> the stratio indices) and couldn't reproduce the issue.
>
> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Jira CASSANDRA-12003
>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been created
>> for the same.
>>
>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <atul.sar...@snapdeal.com>
>> wrote:
>>
>>> Hi Tyler,
>>>
>>> This issue is mainly visible for tables having static columns, still
>>> investigating.
>>> We will try to test after removing lucene index but I don’t think this
>>> plug-in could led to change in behaviour of cassandra write to table's
>>> memtable.
>>>
>>>
>>> -
>>> Atul Saroha
>>> *Lead Software Engineer*
>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>
>>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>>>
>>>> Is 'id' your partition key? I'm not familiar with the stratio indexes,
>>>> but it looks like the primary key columns are both indexed.  Perhaps this
>>>> is related?
>>>>
>>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <atul.sar...@snapdeal.com>
>>>> wrote:
>>>>
>>>>> After further debug, this issue is found in in-memory memtable as
>>>>> doing nodetool flush + compact resolve the issue. And there is no batch
>>>>> write used for this table which is showing issue.
>>>>> Table properties:
>>>>>
>>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>> AND bloom_filter_fp_chance = 0.01
>>>>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>> AND comment = ''
>>>>>> AND compaction = {'class':
>>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>> AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>> AND crc_check_chance = 1.0
>>>>>> AND dclocal_read_repair_chance = 0.1
>>>>>> AND default_time_to_live = 0
>>>>>> AND gc_grace_seconds = 864000
>>>>>> AND max_index_interval = 2048
>>>>>> AND memtable_flush_period_in_ms = 0
>>>>>> AND min_index_interval = 128
>>>>>> AND read_repair_chance = 0.0
>>>>>> AND speculative_retry = '99PERCENTILE';
>>>>>> CREATE CUSTOM INDEX nbf_index ON nbf () USING
>>>>>> 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
>>>>>> '1', 'schema': '{
>>>>>> fields : {
>>>>>> id  : {type : "bigint"},
>>>>>> f_d_name : {
>>>>>> type   : "string",
>>>>>> indexed: true,
>>>>>> sorted : false,
>>>>>> validated  : true,
>>>>>> case_sensitive : false
>>>>>> }
>>>>>> }
>>>>>> }'};
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -
>>>>> Atul Saroha
>>>>> *Lead Software Engineer*
>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>
>>>>> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma <
>>>>> verma.siddha...@snapdeal.com> wrote:
>>>>>
>>>>>> No, all rows were not the same.
>>>>>> Querying only on the partition key gives 20 rows.
>>>>>> In the erroneous result, while querying on partition key and
>>>>>> clustering key, we got 16 of those 20 rows.
>>>>>>
>>>>>> And for "*tombstone_threshold"* there isn't any entry at column
>>>>>> family level.
>>>>>>
>>>>>> Thanks,
>>>>>> Siddharth Verma
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> DataStax <http://datastax.com/>
>>>>
>>>
>>>
>>
>
>
> --
>
> <http://www.datastax.com/>
>
> Joel Knighton
> Cassandra Developer | joel.knigh...@datastax.com
>
> <https://www.linkedin.com/company/datastax>
> <https://www.facebook.com/datastax> <https://twitter.com/datastax>
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax> <https://github.com/datastax/>
>
> <http://cassandrasummit.org/Email_Signature>
>


Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Jira CASSANDRA-12003
<https://issues.apache.org/jira/browse/CASSANDRA-12003> has
been created for the same.

On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha 
wrote:

> Hi Tyler,
>
> This issue is mainly visible for tables having static columns, still
> investigating.
> We will try to test after removing lucene index but I don’t think this
> plug-in could led to change in behaviour of cassandra write to table's
> memtable.
>
>
> -
> Atul Saroha
> *Lead Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs  wrote:
>
>> Is 'id' your partition key? I'm not familiar with the stratio indexes,
>> but it looks like the primary key columns are both indexed.  Perhaps this
>> is related?
>>
>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha 
>> wrote:
>>
>>> After further debug, this issue is found in in-memory memtable as doing
>>> nodetool flush + compact resolve the issue. And there is no batch write
>>> used for this table which is showing issue.
>>> Table properties:
>>>
>>> WITH CLUSTERING ORDER BY (f_name ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = ''
 AND compaction = {'class':
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
 'max_threshold': '32', 'min_threshold': '4'}
 AND compression = {'chunk_length_in_kb': '64', 'class':
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99PERCENTILE';
 CREATE CUSTOM INDEX nbf_index ON nbf () USING
 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
 '1', 'schema': '{
 fields : {
 id  : {type : "bigint"},
 f_d_name : {
 type   : "string",
 indexed: true,
 sorted : false,
 validated  : true,
 case_sensitive : false
 }
 }
 }'};

>>>
>>>
>>>
>>> -
>>> Atul Saroha
>>> *Lead Software Engineer*
>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>
>>> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma <
>>> verma.siddha...@snapdeal.com> wrote:
>>>
 No, all rows were not the same.
 Querying only on the partition key gives 20 rows.
 In the erroneous result, while querying on partition key and clustering
 key, we got 16 of those 20 rows.

 And for "*tombstone_threshold"* there isn't any entry at column family
 level.

 Thanks,
 Siddharth Verma



>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


Re: Installing Cassandra from Tarball

2016-06-13 Thread Bhuvan Rawal
Hi Steve,

Please find the responses in line:

WARN  15:41:58 Unable to lock JVM memory (ENOMEM). This can result in part
> of the JVM being swapped out, especially with mmapped I/O enabled. Increase
> RLIMIT_MEMLOCK or run Cassandra as root.
>
 You can edit */etc/security/limits.conf* and put these lines in there:

* - memlock unlimited
* - nofile 100000
* - nproc 32768
* - as unlimited

then reload the properties with: $ sudo sysctl -p
and check the result with:
$ ulimit -l
and for the cassandra process with:
$ cat /proc/<pid>/limits
Source - Datastax Troubleshooting


WARN  15:41:58 jemalloc shared library could not be preloaded to speed up
> memory allocations
>
If you want to allocate off-heap memory using jemalloc then uncomment this
line in *cassandra-env.sh* and provide the appropriate jemalloc path:
JVM_OPTS="$JVM_OPTS -Djava.library.path=<jemalloc dir>/lib/"
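
One way to locate the library before filling in that path (the package name
and the path shown are just examples from a Debian/Ubuntu box):

$ sudo apt-get install libjemalloc1      # skip if jemalloc is already installed
$ ldconfig -p | grep jemalloc
        libjemalloc.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1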

WARN  15:41:58 JMX is not enabled to receive remote connections. Please see
> cassandra-env.sh for more info.
>
By default JMX is enabled only for local connections; if you want to connect
from a remote machine, set LOCAL_JMX=no in *cassandra-env.sh*

WARN  15:41:58 Cassandra server running in degraded mode. Is swap disabled?
> : true,  Address space adequate? : true,  nofile limit adequate? : false,
> nproc limit adequate? : false
>
You need to disable swap in order to avoid this message; using swap space
can have serious performance implications. Make sure you also disable the
fstab entry for the swap partition.
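
A typical way to do that (assuming a swap partition listed in /etc/fstab;
review the sed expression against your own fstab before running it):

$ sudo swapoff --all                            # turn swap off immediately
$ sudo sed -i.bak '/ swap / s/^/#/' /etc/fstab  # comment out the swap entry for reboots
$ free -m                                       # the Swap line should now show 0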

Thanks & Regards,
Bhuvan


sstabledump failing for system keyspace tables

2016-06-11 Thread Bhuvan Rawal
I have been trying to obtain a JSON dump of the batches table using
sstabledump, but I get this exception:
$ sstabledump
/sstable/data/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-277-big-Data.db
Exception in thread "main"
org.apache.cassandra.exceptions.ConfigurationException: Cannot use abstract
class 'org.apache.cassandra.dht.LocalPartitioner' as partitioner.
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:489)
at
org.apache.cassandra.utils.FBUtilities.instanceOrConstruct(FBUtilities.java:461)
at
org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:402)
at
org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:108)
at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:184)

I further tried Andrew Tolbert's sstable tool but it gives the same
exception.
$ java -jar sstable-tools-3.0.0-alpha4.jar describe
/sstable/data/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-277-big-Data.db
/sstable/data/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-277-big-Data.db

org.apache.cassandra.exceptions.ConfigurationException: Cannot use abstract
class 'org.apache.cassandra.dht.LocalPartitioner' as partitioner.
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:489)

Is there any way I can figure out the content of the batches table?

Thanks & Regards,
Bhuvan


Re: Node Stuck while restarting

2016-05-30 Thread Bhuvan Rawal
We took a backup of the commitlogs and restarted the node, and it started
fine. As the node was down for more than one day, we can say for sure that
it was stuck and not processing.

Wondering how we can tune our settings so as to avoid a similar scenario in
future, preferably without resorting to such a hacky measure.

On Sun, May 29, 2016 at 7:12 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi Mike,
>
> PFA the details you asked for: and some others if that helps:
> we are using jvm params
> -Xms8G
> -Xmx8G
>
> MAX_HEAP_SIZE: & HEAP_NEWSIZE: is not being set and possibly calculated
> by calculate_heap_sizes function. (And we are using default calculations):
> here are the results, pls correct me if im wrong :
> system_memory_in_mb : 64544
> system_cpu_cores : 16
>
> for MAX_HEAP_SIZE:
>
> # set max heap size based on the following
> # max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
> # calculate 1/2 ram and cap to 1024MB
> # calculate 1/4 ram and cap to 8192MB
> # pick the max
>
> By this I can figure out that MAX_HEAP_SIZE is 8GB - (From the first case
> & third case)
>
> max_sensible_yg_per_core_in_mb="100"
> max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*"
> $system_cpu_cores` -  100* 16 = 1600 MB
> desired_yg_in_mb=`expr $max_heap_size_in_mb / 4 ---That comes out to
> be- 8GB/4 = 2GB
>
> if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
> then
> HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
> else
> HEAP_NEWSIZE="${desired_yg_in_mb}M"
> fi
>
> That should set HEAP_NEWSIZE to 1600MB by first case.
>
>
> memtable_allocation_type: heap_buffers
>
> memtable_cleanup_threshold- we are using default:
> # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
> # memtable_cleanup_threshold: 0.11
>
> memtable_flush_writers - default (2)
> we can increase this as we are using SSD with IOPS of around 300/s
>
> memtable_heap_space_in_mb - default values
> # memtable_heap_space_in_mb: 2048
> # memtable_offheap_space_in_mb: 2048
>
> We are using G1 garbage collector and jdk1.8.0_45
>
> Best Regards,
>
>
> On Sun, May 29, 2016 at 5:07 PM, Mike Yeap <wkk1...@gmail.com> wrote:
>
>> Hi Bhuvan, how big are your current commit logs in the failed node, and
>> what are the sizes MAX_HEAP_SIZE and HEAP_NEWSIZE?
>>
>> Also the values of following properties in cassandra.yaml??
>>
>> memtable_allocation_type
>> memtable_cleanup_threshold
>> memtable_flush_writers
>> memtable_heap_space_in_mb
>> memtable_offheap_space_in_mb
>>
>>
>> Regards,
>> Mike Yeap
>>
>>
>>
>> On Sun, May 29, 2016 at 6:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We are running a 6 Node cluster in 2 DC on DSC 3.0.3, with 3 Node each.
>>> One of the node was showing UNREACHABLE on other nodes in nodetool
>>> describecluster  and on that node it was showing all others UNREACHABLE and
>>> as a measure we restarted the node.
>>>
>>> But on doing that it is stuck possibly at with these messages in
>>> system.log:
>>>
>>> DEBUG [SlabPoolCleaner] 2016-05-29 14:07:28,156
>>> ColumnFamilyStore.java:829 - Enqueuing flush of batches: 226784704 (11%)
>>> on-heap, 0 (0%) off-heap
>>> DEBUG [main] 2016-05-29 14:07:28,576 CommitLogReplayer.java:415 -
>>> Replaying /commitlog/data/CommitLog-6-1464508993391.log (CL version 6,
>>> messaging version 10, compression null)
>>> DEBUG [main] 2016-05-29 14:07:28,781 ColumnFamilyStore.java:829 -
>>> Enqueuing flush of batches: 207333510 (10%) on-heap, 0 (0%) off-heap
>>>
>>> MemtablePostFlush / MemtableFlushWriter stages where it is stuck with
>>> pending messages.
>>> This has been the status of them as per *nodetool tpstats *for long.
>>> MemtablePostFlush Active - 1pending - 52
>>>   completed - 16
>>> MemtableFlushWriter   Active - 2pending - 13
>>>   completed - 15
>>>
>>>
>>> We restarted the node by setting log level to TRACE but in vain. What
>>> could be a possible contingency plan in such a scenario?
>>>
>>> Best Regards,
>>> Bhuvan
>>>
>>>
>>
>


Re: Node Stuck while restarting

2016-05-29 Thread Bhuvan Rawal
Hi Mike,

PFA the details you asked for, and some others in case they help:
we are using jvm params
-Xms8G
-Xmx8G

MAX_HEAP_SIZE & HEAP_NEWSIZE are not being set and are presumably calculated
by the calculate_heap_sizes function (i.e. we are using the default
calculations). Here are the results, please correct me if I'm wrong:
system_memory_in_mb : 64544
system_cpu_cores : 16

for MAX_HEAP_SIZE:

# set max heap size based on the following
# max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# calculate 1/2 ram and cap to 1024MB
# calculate 1/4 ram and cap to 8192MB
# pick the max

By this I can figure out that MAX_HEAP_SIZE is 8GB (from the first and third
cases).

max_sensible_yg_per_core_in_mb="100"
max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*"
$system_cpu_cores`  --- that is 100 * 16 = 1600 MB
desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`  --- that comes out to
8GB/4 = 2GB

if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
then
HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
else
HEAP_NEWSIZE="${desired_yg_in_mb}M"
fi

That should set HEAP_NEWSIZE to 1600MB by first case.
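
To double-check the arithmetic, here is a small standalone sketch of the same
calculation (the RAM and core figures are the assumed values from this node):

$ system_memory_in_mb=64544; system_cpu_cores=16
$ half_ram=$((system_memory_in_mb / 2));    [ $half_ram -gt 1024 ] && half_ram=1024
$ quarter_ram=$((system_memory_in_mb / 4)); [ $quarter_ram -gt 8192 ] && quarter_ram=8192
$ max_heap=$(( half_ram > quarter_ram ? half_ram : quarter_ram ))
$ desired_yg=$(( max_heap / 4 )); max_sensible_yg=$(( 100 * system_cpu_cores ))
$ yg=$(( desired_yg > max_sensible_yg ? max_sensible_yg : desired_yg ))
$ echo "MAX_HEAP_SIZE=${max_heap}M HEAP_NEWSIZE=${yg}M"
MAX_HEAP_SIZE=8192M HEAP_NEWSIZE=1600M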


memtable_allocation_type: heap_buffers

memtable_cleanup_threshold- we are using default:
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11

memtable_flush_writers - default (2)
we can increase this as we are using SSD with IOPS of around 300/s

memtable_heap_space_in_mb - default values
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048

We are using G1 garbage collector and jdk1.8.0_45

Best Regards,


On Sun, May 29, 2016 at 5:07 PM, Mike Yeap <wkk1...@gmail.com> wrote:

> Hi Bhuvan, how big are your current commit logs in the failed node, and
> what are the sizes MAX_HEAP_SIZE and HEAP_NEWSIZE?
>
> Also the values of following properties in cassandra.yaml??
>
> memtable_allocation_type
> memtable_cleanup_threshold
> memtable_flush_writers
> memtable_heap_space_in_mb
> memtable_offheap_space_in_mb
>
>
> Regards,
> Mike Yeap
>
>
>
> On Sun, May 29, 2016 at 6:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi,
>>
>> We are running a 6 Node cluster in 2 DC on DSC 3.0.3, with 3 Node each.
>> One of the node was showing UNREACHABLE on other nodes in nodetool
>> describecluster  and on that node it was showing all others UNREACHABLE and
>> as a measure we restarted the node.
>>
>> But on doing that it is stuck possibly at with these messages in
>> system.log:
>>
>> DEBUG [SlabPoolCleaner] 2016-05-29 14:07:28,156
>> ColumnFamilyStore.java:829 - Enqueuing flush of batches: 226784704 (11%)
>> on-heap, 0 (0%) off-heap
>> DEBUG [main] 2016-05-29 14:07:28,576 CommitLogReplayer.java:415 -
>> Replaying /commitlog/data/CommitLog-6-1464508993391.log (CL version 6,
>> messaging version 10, compression null)
>> DEBUG [main] 2016-05-29 14:07:28,781 ColumnFamilyStore.java:829 -
>> Enqueuing flush of batches: 207333510 (10%) on-heap, 0 (0%) off-heap
>>
>> MemtablePostFlush / MemtableFlushWriter stages where it is stuck with
>> pending messages.
>> This has been the status of them as per *nodetool tpstats *for long.
>> MemtablePostFlush Active - 1pending - 52
>>   completed - 16
>> MemtableFlushWriter   Active - 2pending - 13
>>   completed - 15
>>
>>
>> We restarted the node by setting log level to TRACE but in vain. What
>> could be a possible contingency plan in such a scenario?
>>
>> Best Regards,
>> Bhuvan
>>
>>
>


Node Stuck while restarting

2016-05-29 Thread Bhuvan Rawal
Hi,

We are running a 6-node cluster across 2 DCs on DSC 3.0.3, with 3 nodes each.
One of the nodes was showing as UNREACHABLE in nodetool describecluster on
the other nodes, and on that node all the others were showing as UNREACHABLE,
so as a measure we restarted the node.

But on doing that it got stuck, possibly at this point, with these messages in system.log:

DEBUG [SlabPoolCleaner] 2016-05-29 14:07:28,156 ColumnFamilyStore.java:829
- Enqueuing flush of batches: 226784704 (11%) on-heap, 0 (0%) off-heap
DEBUG [main] 2016-05-29 14:07:28,576 CommitLogReplayer.java:415 - Replaying
/commitlog/data/CommitLog-6-1464508993391.log (CL version 6, messaging
version 10, compression null)
DEBUG [main] 2016-05-29 14:07:28,781 ColumnFamilyStore.java:829 - Enqueuing
flush of batches: 207333510 (10%) on-heap, 0 (0%) off-heap

It is stuck at the MemtablePostFlush / MemtableFlushWriter stages with
pending tasks.
This has been their status as per *nodetool tpstats* for a long time:
MemtablePostFlush     Active - 1    pending - 52    completed - 16
MemtableFlushWriter   Active - 2    pending - 13    completed - 15


We restarted the node after setting the log level to TRACE, but in vain. What
could be a possible contingency plan in such a scenario?

Best Regards,
Bhuvan


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
For the other DC it can be acceptable because each partition resides on one
node, so if you have a large partition it may skew things a bit.
On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:

> So I guess the problem may have been with the initial addition of the
> 10.128.0.20 node because when I added it in it never synced data I
> guess?  It was at around 50 MB when it first came up and transitioned to
> "UN". After it was in I did the 1->2 replication change and tried repair
> but it didn't fix it.  From what I can tell all the data on it is stuff
> that has been written since it came up.  We never delete data ever so we
> should have zero tombstones.
>
> If I am not mistaken, only two of my nodes actually have all the data,
> 10.128.0.3 and 10.142.0.14 since they agree on the data amount. 10.142.0.13
> is almost a GB lower and then of course 10.128.0.20 which is missing over
> 5 GB of data.  I tried running nodetool -local on both DCs and it didn't
> fix either one.
>
> Am I running into a bug of some kind?
>
> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi Luke,
>>
>> You mentioned that replication factor was increased from 1 to 2. In that
>> case was the node bearing ip 10.128.0.20 carried around 3GB data earlier?
>>
>> You can run nodetool repair with option -local to initiate repair local
>> datacenter for gce-us-central1.
>>
>> Also you may suspect that if a lot of data was deleted while the node was
>> down it may be having a lot of tombstones which is not needed to be
>> replicated to the other node. In order to verify the same, you can issue a
>> select count(*) query on column families (With the amount of data you have
>> it should not be an issue) with tracing on and with consistency local_all
>> by connecting to either 10.128.0.3  or 10.128.0.20 and store it in a
>> file. It will give you a fair amount of idea about how many deleted cells
>> the nodes have. I tried searching for reference if tombstones are moved
>> around during repair, but I didnt find evidence of it. However I see no
>> reason to because if the node didnt have data then streaming tombstones
>> does not make a lot of sense.
>>
>> Regards,
>> Bhuvan
>>
>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com> wrote:
>>
>>> Here's my setup:
>>>
>>> Datacenter: gce-us-central1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>> Rack
>>> UN  10.128.0.3   6.4 GB 256  100.0%
>>>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>> UN  10.128.0.20  943.08 MB  256  100.0%
>>>  958348cb-8205-4630-8b96-0951bf33f3d3  default
>>> Datacenter: gce-us-east1
>>> 
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>> Rack
>>> UN  10.142.0.14  6.4 GB 256  100.0%
>>>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>> UN  10.142.0.13  5.55 GB256  100.0%
>>>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>
>>> And my replication settings are:
>>>
>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>
>>> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load of
>>> 943 MB even though it's supposed to own 100% and should have 6.4 GB.  Also 
>>> 10.142.0.13
>>> seems also not to have everything as it only has a load of 5.55 GB.
>>>
>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com>
>>> wrote:
>>>
>>>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
>>>> each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
>>>> up is?
>>>>
>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>
>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>>>> gce-us-east1.  I increased the replication factor of gce-us-central1 from 
>>>>> 1
>>>>> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
>>>>> for the node switched to 100% as it should but the Load showed that it
>>>>> didn't actually sync the data.  I then ran a full 'nodetool repair' and it
>>>>> didn't fix it still.  This scares me as I thought 'nodetool repair' was a
>>>>> way to assure consistency and that all the nodes were synced but it 
>>>>> doesn't
>>>>> seem to be.  Outside of that command, I have no idea how I would assure 
>>>>> all
>>>>> the data was synced or how to get the data correctly synced without
>>>>> decommissioning the node and re-adding it.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kurt Greaves
>>>> k...@instaclustr.com
>>>> www.instaclustr.com
>>>>
>>>
>>>
>>


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
Hi Luke,

You mentioned that the replication factor was increased from 1 to 2. In that
case, was the node bearing IP 10.128.0.20 carrying around 3GB of data earlier?

You can run nodetool repair with the -local option to initiate a repair of
only the local datacenter, gce-us-central1.

Also, you may suspect that if a lot of data was deleted while the node was
down, the other node may be holding a lot of tombstones which do not need to
be replicated to it. To verify this, you can issue a select count(*) query on
the column families (with the amount of data you have it should not be an
issue) with tracing on and with consistency ALL, connecting to either
10.128.0.3 or 10.128.0.20, and store the output in a file. It will give you a
fair idea of how many deleted cells the nodes have. I tried searching for a
reference on whether tombstones are moved around during repair, but I didn't
find evidence of it. However, I see no reason why they would be, because if
the node didn't have the data then streaming tombstones does not make a lot
of sense.
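
A minimal sketch of both steps (keyspace and table names are placeholders;
cqlsh has no LOCAL_ALL level, so ALL is used here, and CAPTURE simply
redirects the output to a file):

$ nodetool repair -local my_keyspace

$ cqlsh 10.128.0.3
cqlsh> CONSISTENCY ALL;
cqlsh> TRACING ON;
cqlsh> CAPTURE '/tmp/central1_trace.txt';
cqlsh> SELECT count(*) FROM my_keyspace.my_table;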

Regards,
Bhuvan

On Tue, May 24, 2016 at 11:06 PM, Luke Jolly  wrote:

> Here's my setup:
>
> Datacenter: gce-us-central1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>   Rack
> UN  10.128.0.3   6.4 GB 256  100.0%
>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
> UN  10.128.0.20  943.08 MB  256  100.0%
>  958348cb-8205-4630-8b96-0951bf33f3d3  default
> Datacenter: gce-us-east1
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>   Rack
> UN  10.142.0.14  6.4 GB 256  100.0%
>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
> UN  10.142.0.13  5.55 GB256  100.0%
>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>
> And my replication settings are:
>
> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>
> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load of
> 943 MB even though it's supposed to own 100% and should have 6.4 GB.  Also 
> 10.142.0.13
> seems also not to have everything as it only has a load of 5.55 GB.
>
> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves 
> wrote:
>
>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
>> each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
>> up is?
>>
>> On 23 May 2016 at 19:31, Luke Jolly  wrote:
>>
>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>> gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
>>> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
>>> for the node switched to 100% as it should but the Load showed that it
>>> didn't actually sync the data.  I then ran a full 'nodetool repair' and it
>>> didn't fix it still.  This scares me as I thought 'nodetool repair' was a
>>> way to assure consistency and that all the nodes were synced but it doesn't
>>> seem to be.  Outside of that command, I have no idea how I would assure all
>>> the data was synced or how to get the data correctly synced without
>>> decommissioning the node and re-adding it.
>>>
>>
>>
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>
>
>


Blocking read repair giving consistent data but not repairing existing data

2016-05-12 Thread Bhuvan Rawal
Hi,

We are using DSC 3.0.3 on a total of *6 nodes*, *2 DCs, 3 nodes each, RF=3*,
so every node has the complete data. We are now facing a situation on a table
with 1 partition key, 2 clustering columns and 4 normal columns.

On 5 of the 6 nodes, a row has its partition key, 2 clustering keys and a
single value, but the 3 other normal columns are null.

When doing a consistency level ALL query we get the complete view of the row,
and the tracing output says that an inconsistency was found in the digest and
a read repair was sent out to the nodes.
<*Exact error in tracing : Digest mismatch*:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(-9222324572206777856, 53444c393233363439393233)
(c421efa89ea3435c153617a34c08f396 vs 51a7e02f9e5e93520f56541ed6730558>

But on doing another read with a reduced consistency level, the output
received is still not repaired.

We are speculating that the node which has the complete view of the row was
down for more than 3 hrs (the default hint window) when the delete happened
on that row. We had not run repair within the gc grace period, so possibly
the 3 deleted cells have come back alive on that node, but in that case:

1. If the consistency level ALL query is giving the complete row view, then
why isn't it reflected on the other nodes?
2. *read_repair_chance* is 0.0 but *dclocal_read_repair_chance* = 0.1 on the
table (the default configuration), and I tried the query with LOCAL_ONE on
all the servers to fulfil that probability (35-40 times on every server).

So how could both blocking and non-blocking read repair not be working?
Could a full repair fix it? Is it possibly a bug in DSC 3.0.3 that was fixed
in a later version?

Any assistance on this will be welcome, as this appears to be a one-off
scenario. I can provide the complete cqlsh tracing logs for the LOCAL_ONE and
ALL consistency read queries if required.

C*eers,
Bhuvan


Re: A question to 'paging' support in DataStax java driver

2016-05-09 Thread Bhuvan Rawal
Hi Doan,

What does it have to do with eventual consistency? Let's assume a scenario
with complete consistency: we are at page X, and at the same time some
inserts/updates happened at page X-2 and we jump back to it.
The user will see an inconsistent page in that case as well, right? Also, in
such cases how would you design a user-facing application (cache previous
pages at the app level?)

Regards,
Bhuvan

On Mon, May 9, 2016 at 4:18 PM, DuyHai Doan  wrote:

> "Is it possible to just return PagingState object without returning
> data?" --> No
>
> Simply because before reading the actual data for each page of N rows, you
> cannot know at which token value a page of data starts...
>
> And it is worst than that, with paging you don't have any isolation. Let's
> suppose you keep in your application/web front-end the paging states for
> page 1, 2 and 3. Since there are concurrent inserts on the cluster at the
> same time, when you re-use the paging state 2 for example, you may not get
> the same results as the previous read.
>
> And it is inevitable in an eventual consistent distributed DB world
>
> On Mon, May 9, 2016 at 12:25 PM, Lu, Boying  wrote:
>
>> dHi, All,
>>
>>
>>
>> We are considering to use DataStax java driver in our codes. One
>> important feature provided by the driver we want to use is ‘paging’.
>>
>> But according to the
>> https://datastax.github.io/java-driver/3.0.0/manual/paging/, it seems
>> that we can’t jump between pages.
>>
>>
>>
>> Is it possible to just return PagingState object without returning data?
>> e.g.  If I want to jump to the page 5 from the page 1,
>>
>> I need to go through each page from page 1 to page 5,  Is it possible to
>> just return the PagingState object of page 1, 2, 3 and 4 without
>>
>> actual data of each page? This can save some bandwidth at least.
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Boying
>>
>>
>>
>>
>>
>
>


Re: Hi Memory consumption with Copy command

2016-04-23 Thread Bhuvan Rawal
I built Cython and disabled the bundled driver, and the performance has been
impressive. The memory issue is resolved and I'm currently getting around
100,000 rows per second; it's stressing both the client CPU and the Cassandra
nodes. That's the fastest I have ever seen it perform, with 60 million rows
already transferred in ~5 minutes.

Just a final question before we close this thread: at this performance level,
would you recommend sstableloader or the COPY command?
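
For anyone else following the thread, the setup looks roughly like this (a
sketch - the package names assume a RHEL/CentOS-style box, and the host, CSV
path and table names are placeholders):

$ sudo yum install gcc python-devel libev libev-devel   # build deps for the driver's C extensions
$ sudo pip install cython
$ sudo pip install cassandra-driver                     # builds the C extensions when the deps are present
$ export CQLSH_NO_BUNDLED=true                          # make cqlsh use this driver instead of the bundled one
$ cqlsh my_host -e "COPY my_keyspace.my_table FROM '/data/big.csv' WITH HEADER = true"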

On Sat, Apr 23, 2016 at 2:00 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Thanks Stefania for the informative answer.  The next blog was pretty
> useful as well:
> http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from
> . Ill upgrade to 3.0.5 and test with C extensions enabled and report on
> this thread.
>
> On Sat, Apr 23, 2016 at 8:54 AM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> Hi Bhuvan
>>
>> Support for large datasets in COPY FROM was added by CASSANDRA-11053
>> <https://issues.apache.org/jira/browse/CASSANDRA-11053>, which is
>> available in 2.1.14, 2.2.6, 3.0.5 and 3.5. Your scenario is valid with this
>> patch applied.
>>
>> The 3.0.x and 3.x releases are already available, whilst the other two
>> releases are due in the next few days. You only need to install an
>> up-to-date release on the machine where COPY FROM is running.
>>
>> You may find the setup instructions in this blog
>> <http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>
>> interesting. Specifically, for large datasets, I would highly recommend
>> installing the Python driver with C extensions, as it will speed things up
>> considerably. Again, this is only possible with the 11053 patch. Please
>> ignore the suggestion to also compile the cqlsh copy module itself with C
>> extensions (Cython), as you may hit CASSANDRA-11574
>> <https://issues.apache.org/jira/browse/CASSANDRA-11574> in the 3.0.5 and
>> 3.5 releases.
>>
>> Before CASSANDRA-11053, the parent process was a bottleneck. This is
>> explained further in  this blog
>> <http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from>,
>> second paragraph in the "worker processes" section. As a workaround, if you
>> are unable to upgrade, you may try reducing the INGESTRATE and introducing
>> a few extra worker processes via NUMPROCESSES. Also, the parent process is
>> overloaded and is therefore not able to report progress correctly.
>> Therefore, if the progress report is frozen, it doesn't mean the COPY
>> OPERATION is not making progress.
>>
>> Do let us know if you still have problems, as this is new functionality.
>>
>> With best regards,
>> Stefania
>>
>>
>> On Sat, Apr 23, 2016 at 6:34 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Im trying to copy a 20 GB CSV file into a 3 node fresh cassandra cluster
>>> with 32 GB memory each, sufficient disk, RF-1 and durable write false. The
>>> machine im feeding into is external to the cluster and shares 1GBps line
>>> and has 16 GB RAM. (We have chosen this setup to possibly reduce CPU and IO
>>> usage).
>>>
>>> Im trying to use COPY command to feed in data. It kicks off well,
>>> launches a set of processes, does about 50,000 rows per second. But I can
>>> see that the parent process starts aggregating memory almost of the size of
>>> data processed and after a point the processes just hang. The parent
>>> process was consuming 95% system memory when it had processed around 60%
>>> data.
>>>
>>> I had earlier tried to feed in data from multiple files (Less than 4GB
>>> each) and it was working as expected.
>>>
>>> Is it a valid scenario?
>>>
>>> Regards,
>>> Bhuvan
>>>
>>
>>
>>
>> --
>>
>>
>> [image: datastax_logo.png] <http://www.datastax.com/>
>>
>> Stefania Alborghetti
>>
>> Apache Cassandra Software Engineer
>>
>> |+852 6114 9265| stefania.alborghe...@datastax.com
>>
>>
>> [image: cassandrasummit.org/Email_Signature]
>> <http://cassandrasummit.org/Email_Signature>
>>
>
>


Re: Hi Memory consumption with Copy command

2016-04-23 Thread Bhuvan Rawal
Thanks Stefania for the informative answer. The next blog was pretty
useful as well:
http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from
I'll upgrade to 3.0.5 and test with C extensions enabled and report on
this thread.
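
For reference, a minimal sketch of the workaround Stefania describes below
(lowering INGESTRATE and raising NUMPROCESSES); the file name and values are
illustrative, not tuned numbers:

cqlsh> COPY mykeyspace.mytable (id, name, address, phone)
   ... FROM '/data/big_file.csv' WITH HEADER = true
   ... AND INGESTRATE = 50000 AND NUMPROCESSES = 12;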

On Sat, Apr 23, 2016 at 8:54 AM, Stefania Alborghetti <
stefania.alborghe...@datastax.com> wrote:

> Hi Bhuvan
>
> Support for large datasets in COPY FROM was added by CASSANDRA-11053
> <https://issues.apache.org/jira/browse/CASSANDRA-11053>, which is
> available in 2.1.14, 2.2.6, 3.0.5 and 3.5. Your scenario is valid with this
> patch applied.
>
> The 3.0.x and 3.x releases are already available, whilst the other two
> releases are due in the next few days. You only need to install an
> up-to-date release on the machine where COPY FROM is running.
>
> You may find the setup instructions in this blog
> <http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>
> interesting. Specifically, for large datasets, I would highly recommend
> installing the Python driver with C extensions, as it will speed things up
> considerably. Again, this is only possible with the 11053 patch. Please
> ignore the suggestion to also compile the cqlsh copy module itself with C
> extensions (Cython), as you may hit CASSANDRA-11574
> <https://issues.apache.org/jira/browse/CASSANDRA-11574> in the 3.0.5 and
> 3.5 releases.
>
> Before CASSANDRA-11053, the parent process was a bottleneck. This is
> explained further in  this blog
> <http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from>,
> second paragraph in the "worker processes" section. As a workaround, if you
> are unable to upgrade, you may try reducing the INGESTRATE and introducing
> a few extra worker processes via NUMPROCESSES. Also, the parent process is
> overloaded and is therefore not able to report progress correctly.
> Therefore, if the progress report is frozen, it doesn't mean the COPY
> OPERATION is not making progress.
>
> Do let us know if you still have problems, as this is new functionality.
>
> With best regards,
> Stefania
>
>
> On Sat, Apr 23, 2016 at 6:34 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi,
>>
>> Im trying to copy a 20 GB CSV file into a 3 node fresh cassandra cluster
>> with 32 GB memory each, sufficient disk, RF-1 and durable write false. The
>> machine im feeding into is external to the cluster and shares 1GBps line
>> and has 16 GB RAM. (We have chosen this setup to possibly reduce CPU and IO
>> usage).
>>
>> Im trying to use COPY command to feed in data. It kicks off well,
>> launches a set of processes, does about 50,000 rows per second. But I can
>> see that the parent process starts aggregating memory almost of the size of
>> data processed and after a point the processes just hang. The parent
>> process was consuming 95% system memory when it had processed around 60%
>> data.
>>
>> I had earlier tried to feed in data from multiple files (Less than 4GB
>> each) and it was working as expected.
>>
>> Is it a valid scenario?
>>
>> Regards,
>> Bhuvan
>>
>
>
>
> --
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Stefania Alborghetti
>
> Apache Cassandra Software Engineer
>
> |+852 6114 9265| stefania.alborghe...@datastax.com
>
>
> [image: cassandrasummit.org/Email_Signature]
> <http://cassandrasummit.org/Email_Signature>
>


Re: Balancing tokens over 2 datacenter

2016-04-13 Thread Bhuvan Rawal
This could be because of the way you have configured the load balancing
policy. Have a look at the links below for configuring it:

https://datastax.github.io/python-driver/api/cassandra/policies.html

http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node

Regards,
Bhuvan

On Wed, Apr 13, 2016 at 6:54 PM, Walsh, Stephen 
wrote:

> Hi there,
>
> So we have 2 datacenter with 3 nodes each.
> Replication factor is 3 per DC (so each node has all data)
>
> We have an application in each DC that writes that Cassandra DC.
>
> Now, due to a miss configuration in our application, we saw that our
> application in both DC’s where pointing to DC1.
>
> As such, all keyspaces and tables where created on DC1.
> The effect of this is that all reads are now going to DC1 and ignoring DC2
>
> WE’ve tried doing , nodetool repair / cleanup – but the reads always go to
> DC1?
>
> Anyone know how to rebalance the tokens over DC’s?
>
>
> Regards
> Steve
>
>
> P.S I know about this article
> http://www.datastax.com/dev/blog/balancing-your-cassandra-cluster
> But its doesn’t answer my question regarding 2 DC’s token balancing
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>
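
For anyone hitting the same symptom, a quick way to rule out a replication
problem before touching the client is to check the keyspace definition
(keyspace and DC names below are illustrative, not the actual ones here):

cqlsh> DESCRIBE KEYSPACE mykeyspace;

CREATE KEYSPACE mykeyspace WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}
    AND durable_writes = true;
...

If both DCs already appear there with the expected replication factor, the
imbalance is most likely on the client side, and the DC-aware load balancing
policies linked above are the place to fix it.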


Creation of Async Datacenter for Monitoring purposes and Engineering Services purpose

2016-04-13 Thread Bhuvan Rawal
Hi All,

We have 2 running datacenters in physically separate locations with 3 nodes
each. There is a requirement for an audit DC for issuing queries that will
not be concerned with live application traffic. A live data delay of 1-2
hours is acceptable. It is essential that replication to this DC does not
impact the other 2 data centers.

Is there a way to copy data in an async fashion to the new DC3 so that it
doesn't impact the existing DCs, possibly without querying (using
sstables/commit logs)?

Regards,
Bhuvan


Re: Can we lengthy big column names in cassandra 3.0.3

2016-03-30 Thread Bhuvan Rawal
It has been discussed in the past in
https://issues.apache.org/jira/browse/CASSANDRA-4175.

I believe it is fixed in
https://issues.apache.org/jira/browse/CASSANDRA-8099, though we have not
evaluated the performance. I will be glad if someone can reply with
benchmarks.

On Wed, Mar 30, 2016 at 4:49 PM, Atul Saroha 
wrote:

> Hi,
>
>
> Some time back, I had seen a Jira which tells me that the limitation to
> use small column names for performance benefit is no longer valid. Now
> cassandra generate some unique identifier for each column name in the
> table. So larger table name and column names are no longer an issue for
> performance and data storage.
>
> Is this true for cassandra 3.0.3?
> Does anyone knows that Jira number as i missed it?
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>


Re: Which version of Cassandra 3.x is production ready.

2016-03-16 Thread Bhuvan Rawal
This has been discussed in the past :

https://www.mail-archive.com/user@cassandra.apache.org/msg45990.html

This link should be useful for your case:
https://www.eventbrite.com/engineering/what-version-of-cassandra
-should-i-run/

3.0.4 comes with a ton of features on top of 2.1.x, which is considered the
most stable series. DataStax Enterprise is using 2.1.13.

Versions 2.1.x and 2.2.x will only be supported till Nov 2016. In my opinion
you would be better off opting for the 3.0.4 version.

Best Regards,
Bhuvan

On Wed, Mar 16, 2016 at 11:34 AM, Prakash Chauhan <
prakash.chau...@ericsson.com> wrote:

> Hello,
>
>
>
> Is Cassandra 3.x production-ready ?
>
> Which version of Cassandra 3.x is stable and production-ready ?
>
>
>
> Regards.
>


Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Thanks for the correction, Jon. (At most 2000 queries *per cluster* for
serving 100 searches.)

On Mon, Mar 7, 2016 at 11:47 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> If you're doing 100 searches a second each machine will be serving at most
> 100 requests per second, not 2000.
>
> On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Well thats certainly true, there are these points worth discussing here :
>>
>> 1. Scatter Gather queries - Especially if the cluster size is large. Say
>> we have a 20 node cluster, and we are searching 100 times a second. then
>> effectively coordinator would be hitting each node 2000 times (20*100) That
>> factor will only increase as the number of node goes higher. Im sure having
>> a centralized index alleviates that problem.
>> 2. High Cardinality (For columns like email / phone number)
>> 3. Low Cardinality (Boolean column or any column with limited set of
>> available options).
>>
>> SASI seems to be a good solution for Like queries this doc
>> <https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks
>> really promising. But wouldn't it be better to tackle the use cases of
>> search differently than from data storage ones, from a design standpoint?
>>
>> On Sun, Mar 6, 2016 at 9:14 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> I don't have any direct personal experience with Stratio. It will all
>>> depend on your queries and your data cardinality - some queries are fine
>>> with secondary indexes while other are quite poor. Ditto for Lucene and
>>> Solr.
>>>
>>> It is also worth noting that the new SASI feature of Cassandra supports
>>> keyword and prefix/suffix search. But it doesn't support multi-column ad
>>> hoc queries, which is what people tend to use Lucene and Solr for. So,
>>> again, it all depends on your queries and your data cardinality.
>>>
>>> -- Jack Krupansky
>>>
>>> On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Yes Jack, we are rolling out with Stratio right now, we will assess the
>>>> performance benefit it yields and can go for ElasticSearch/Solr later.
>>>>
>>>> As per your experience how does Stratio perform vis-a-vis Secondary
>>>> Indexes?
>>>>
>>>> On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky <
>>>> jack.krupan...@gmail.com> wrote:
>>>>
>>>>> You haven't been clear about how you intend to add Solr. You can also
>>>>> use Stratio or Stargate for basic Lucene search if you don't want need 
>>>>> full
>>>>> Solr support and want to stick to open source rather than go with DSE
>>>>> Search for Solr.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Sean and Nirmallaya.
>>>>>>
>>>>>> @Jack, We are going with DSC right now and plan to use spark and
>>>>>> later solr over the analytics DC. The use case is to have  olap and oltp
>>>>>> workloads separated and not intertwine them, whether it is achieved by
>>>>>> creating a new DC or a new cluster altogether. From Nirmallaya's and 
>>>>>> Sean's
>>>>>> answer I could understand that its easily achievable by creating a 
>>>>>> separate
>>>>>> DC, app client will need to be made DC aware and it should not make a
>>>>>> coordinator in dc3. And same goes for spark configuration, it should read
>>>>>> from 3rd DC. Correct me if I'm wrong.
>>>>>>
>>>>>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupan...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > DataStax Enterprise (DSE) should be fine for three or even four
>>>>>> data centers in the same cluster. Or are you talking about some custom 
>>>>>> Solr
>>>>>> implementation?
>>>>>> >
>>>>>> > -- Jack Krupansky
>>>>>> >
>>>>>> > On Fri, Mar 4, 2016 at 9:21 AM, <sean_r_dur...@homedepot.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Sure. Just add a new DC. Alter your keyspac

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Well, that's certainly true. There are these points worth discussing here:

1. Scatter-gather queries - especially if the cluster size is large. Say we
have a 20 node cluster and we are searching 100 times a second. Then
effectively the coordinator would be hitting each node 2000 times (20*100). That
factor will only increase as the number of nodes goes higher. I'm sure having
a centralized index alleviates that problem.
2. High cardinality (for columns like email / phone number).
3. Low cardinality (a boolean column or any column with a limited set of
available options).

SASI seems to be a good solution for LIKE queries; this doc
<https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks really
promising. But wouldn't it be better to tackle the use cases of search
differently from the data storage ones, from a design standpoint?
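
(For concreteness, a minimal sketch of the SASI LIKE search mentioned above,
available from Cassandra 3.4 onwards; the table and column names are made up:)

cqlsh> CREATE CUSTOM INDEX users_name_idx ON users (name)
   ... USING 'org.apache.cassandra.index.sasi.SASIIndex'
   ... WITH OPTIONS = {'mode': 'CONTAINS'};
cqlsh> SELECT * FROM users WHERE name LIKE '%raw%';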

On Sun, Mar 6, 2016 at 9:14 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> I don't have any direct personal experience with Stratio. It will all
> depend on your queries and your data cardinality - some queries are fine
> with secondary indexes while other are quite poor. Ditto for Lucene and
> Solr.
>
> It is also worth noting that the new SASI feature of Cassandra supports
> keyword and prefix/suffix search. But it doesn't support multi-column ad
> hoc queries, which is what people tend to use Lucene and Solr for. So,
> again, it all depends on your queries and your data cardinality.
>
> -- Jack Krupansky
>
> On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Yes Jack, we are rolling out with Stratio right now, we will assess the
>> performance benefit it yields and can go for ElasticSearch/Solr later.
>>
>> As per your experience how does Stratio perform vis-a-vis Secondary
>> Indexes?
>>
>> On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky <jack.krupan...@gmail.com
>> > wrote:
>>
>>> You haven't been clear about how you intend to add Solr. You can also
>>> use Stratio or Stargate for basic Lucene search if you don't want need full
>>> Solr support and want to stick to open source rather than go with DSE
>>> Search for Solr.
>>>
>>> -- Jack Krupansky
>>>
>>> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Sean and Nirmallaya.
>>>>
>>>> @Jack, We are going with DSC right now and plan to use spark and later
>>>> solr over the analytics DC. The use case is to have  olap and oltp
>>>> workloads separated and not intertwine them, whether it is achieved by
>>>> creating a new DC or a new cluster altogether. From Nirmallaya's and Sean's
>>>> answer I could understand that its easily achievable by creating a separate
>>>> DC, app client will need to be made DC aware and it should not make a
>>>> coordinator in dc3. And same goes for spark configuration, it should read
>>>> from 3rd DC. Correct me if I'm wrong.
>>>>
>>>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupan...@gmail.com>
>>>> wrote:
>>>> >
>>>> > DataStax Enterprise (DSE) should be fine for three or even four data
>>>> centers in the same cluster. Or are you talking about some custom Solr
>>>> implementation?
>>>> >
>>>> > -- Jack Krupansky
>>>> >
>>>> > On Fri, Mar 4, 2016 at 9:21 AM, <sean_r_dur...@homedepot.com> wrote:
>>>> >>
>>>> >> Sure. Just add a new DC. Alter your keyspaces with a new replication
>>>> factor for that DC. Run repairs on the new DC to get the data streamed.
>>>> Then make sure your clients only connect to the DC(s) that they need.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Separation of workloads is one of the key powers of a Cassandra
>>>> cluster.
>>>> >>
>>>> >>
>>>> >>
>>>> >> You may want to look at different configurations for the analytics
>>>> cluster – smaller replication factor, more memory per node, more disk per
>>>> node, perhaps less vnodes. Others may chime in with their experience.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> Sean Durity
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Bhuvan Rawal [mailto:bhu1ra...@gmail.com]
>>>> >> Sent: Friday, March 04, 2016 3:27 AM
>>>> &

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
Yes Jack, we are rolling out with Stratio right now; we will assess the
performance benefit it yields and can go for Elasticsearch/Solr later.

In your experience, how does Stratio perform vis-a-vis secondary indexes?

On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> You haven't been clear about how you intend to add Solr. You can also use
> Stratio or Stargate for basic Lucene search if you don't want need full
> Solr support and want to stick to open source rather than go with DSE
> Search for Solr.
>
> -- Jack Krupansky
>
> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Thanks Sean and Nirmallaya.
>>
>> @Jack, We are going with DSC right now and plan to use spark and later
>> solr over the analytics DC. The use case is to have  olap and oltp
>> workloads separated and not intertwine them, whether it is achieved by
>> creating a new DC or a new cluster altogether. From Nirmallaya's and Sean's
>> answer I could understand that its easily achievable by creating a separate
>> DC, app client will need to be made DC aware and it should not make a
>> coordinator in dc3. And same goes for spark configuration, it should read
>> from 3rd DC. Correct me if I'm wrong.
>>
>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupan...@gmail.com>
>> wrote:
>> >
>> > DataStax Enterprise (DSE) should be fine for three or even four data
>> centers in the same cluster. Or are you talking about some custom Solr
>> implementation?
>> >
>> > -- Jack Krupansky
>> >
>> > On Fri, Mar 4, 2016 at 9:21 AM, <sean_r_dur...@homedepot.com> wrote:
>> >>
>> >> Sure. Just add a new DC. Alter your keyspaces with a new replication
>> factor for that DC. Run repairs on the new DC to get the data streamed.
>> Then make sure your clients only connect to the DC(s) that they need.
>> >>
>> >>
>> >>
>> >> Separation of workloads is one of the key powers of a Cassandra
>> cluster.
>> >>
>> >>
>> >>
>> >> You may want to look at different configurations for the analytics
>> cluster – smaller replication factor, more memory per node, more disk per
>> node, perhaps less vnodes. Others may chime in with their experience.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Sean Durity
>> >>
>> >>
>> >>
>> >> From: Bhuvan Rawal [mailto:bhu1ra...@gmail.com]
>> >> Sent: Friday, March 04, 2016 3:27 AM
>> >> To: user@cassandra.apache.org
>> >> Subject: How to create an additional cluster in Cassandra exclusively
>> for Analytics Purpose
>> >>
>> >>
>> >>
>> >> Hi,
>> >>
>> >>
>> >>
>> >> We would like to create an additional C* data center for batch
>> processing using spark on CFS. We would like to limit this DC exclusively
>> for Spark operations and would like to continue the Application Servers to
>> continue fetching data from OLTP.
>> >>
>> >>
>> >>
>> >> Is there any way to configure the same?
>> >>
>> >>
>> >>
>> >>
>> >> ​
>> >>
>> >> Regards,
>> >>
>> >> Bhuvan
>> >>
>> >>
>> >> 
>> >>
>> >> The information in this Internet Email is confidential and may be
>> legally privileged. It is intended solely for the addressee. Access to this
>> Email by anyone else is unauthorized. If you are not the intended
>> recipient, any disclosure, copying, distribution or any action taken or
>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>> When addressed to our clients any opinions or advice contained in this
>> Email are subject to the terms and conditions expressed in any applicable
>> governing The Home Depot terms of business or client engagement letter. The
>> Home Depot disclaims all responsibility and liability for the accuracy and
>> content of this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>> >
>> >
>>
>
>


Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
Thanks Sean and Nirmallaya.

@Jack, we are going with DSC right now and plan to use Spark, and later Solr,
over the analytics DC. The use case is to have OLAP and OLTP workloads
separated and not intertwined, whether that is achieved by creating a new
DC or a new cluster altogether. From Nirmallaya's and Sean's answers I could
understand that it is easily achievable by creating a separate DC: the app client
will need to be made DC-aware so that it does not use a coordinator in DC3.
The same goes for the Spark configuration; it should read from the 3rd DC. Correct
me if I'm wrong.
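
As a rough sketch of that procedure (keyspace name, replication counts and the
new DC name 'Analytics' are illustrative; Sean suggests repairs, while nodetool
rebuild is another common way to stream the initial data into a new DC):

cqlsh> ALTER KEYSPACE mykeyspace WITH replication =
   ... {'class': 'NetworkTopologyStrategy', 'Cassandra': 3, 'Analytics': 2};
$ nodetool rebuild -- Cassandra   # run on each node of the new Analytics DC
$ nodetool repair mykeyspace      # optionally, once the rebuild completes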

On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupan...@gmail.com> wrote:
>
> DataStax Enterprise (DSE) should be fine for three or even four data
centers in the same cluster. Or are you talking about some custom Solr
implementation?
>
> -- Jack Krupansky
>
> On Fri, Mar 4, 2016 at 9:21 AM, <sean_r_dur...@homedepot.com> wrote:
>>
>> Sure. Just add a new DC. Alter your keyspaces with a new replication
factor for that DC. Run repairs on the new DC to get the data streamed.
Then make sure your clients only connect to the DC(s) that they need.
>>
>>
>>
>> Separation of workloads is one of the key powers of a Cassandra cluster.
>>
>>
>>
>> You may want to look at different configurations for the analytics
cluster – smaller replication factor, more memory per node, more disk per
node, perhaps less vnodes. Others may chime in with their experience.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> From: Bhuvan Rawal [mailto:bhu1ra...@gmail.com]
>> Sent: Friday, March 04, 2016 3:27 AM
>> To: user@cassandra.apache.org
>> Subject: How to create an additional cluster in Cassandra exclusively
for Analytics Purpose
>>
>>
>>
>> Hi,
>>
>>
>>
>> We would like to create an additional C* data center for batch
processing using spark on CFS. We would like to limit this DC exclusively
for Spark operations and would like to continue the Application Servers to
continue fetching data from OLTP.
>>
>>
>>
>> Is there any way to configure the same?
>>
>>
>>
>>
>> ​
>>
>> Regards,
>>
>> Bhuvan
>>
>>
>> 
>>
>> The information in this Internet Email is confidential and may be
legally privileged. It is intended solely for the addressee. Access to this
Email by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying, distribution or any action taken or
omitted to be taken in reliance on it, is prohibited and may be unlawful.
When addressed to our clients any opinions or advice contained in this
Email are subject to the terms and conditions expressed in any applicable
governing The Home Depot terms of business or client engagement letter. The
Home Depot disclaims all responsibility and liability for the accuracy and
content of this attachment and for any damages or losses arising from any
inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
items of a destructive nature, which may be contained in this attachment
and shall not be liable for direct, indirect, consequential or special
damages in connection with this e-mail message or its attachment.
>
>


Re: installing DSE

2016-02-12 Thread Bhuvan Rawal
I believe you missed this note :

   1. Attention: Depending on your environment, you might need to replace @ in
   your email address with %40 and escape any character in your password
   that is used in your operating system's command line. Examples: \! and \|.


On Sat, Feb 13, 2016 at 3:15 AM, Ted Yu  wrote:

> Hi,
> I followed this guide:
>
> https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/install/installRHELdse.html
>
> and populated /etc/yum.repos.d/datastax.repo with DataStax Academy account
> info.
>
> [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'gmail.com:p
> as...@rpm.datastax.com'"
> Trying other mirror.
>
> Can someone give me hint ?
>
> Thanks
>
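
To make that note concrete, the %40 substitution goes into the baseurl of the
repo file; a sketch of /etc/yum.repos.d/datastax.repo with made-up credentials,
following the format of the DSE 4.x install docs linked above:

[datastax]
name = DataStax Repo for DataStax Enterprise
baseurl=https://john.doe%40gmail.com:myPassword@rpm.datastax.com/enterprise
enabled=1
gpgcheck=0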


Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
Hi Ted,

Have you specified the listen_address and rpc_address? What addresses are
there in the seed list?

Have you started the seed node first and, after waiting for 30 seconds,
started the other nodes?


On Tue, Feb 9, 2016 at 12:14 AM, Ted Yu  wrote:

> Hi,
> I am trying to setup a cluster with DSE 4.8.4
>
> I added the following in resources/cassandra/conf/cassandra.yaml :
>
> cluster_name: 'cass'
>
> which resulted in:
>
> http://pastebin.com/27adxKTM
>
> This seems to be resolved by CASSANDRA-8072
>
> My question is whether there is workaround ?
> If not, when can I expect 2.1.13 release ?
>
> Thanks
>


Re: specifying listen_address

2016-02-08 Thread Bhuvan Rawal
Hi Ted,

Are you sure the path to the yaml is correct?
For me (DSE 4.8.4) it is /etc/dse/cassandra/cassandra.yaml

On Mon, Feb 8, 2016 at 11:22 PM, Ted Yu  wrote:

> Hi,
> I downloaded and expanded DSE 4.8.4
>
> When I specify the following in resources/dse/conf/dse.yaml :
>
> listen_address: XX.YY
>
> I got:
> INFO  17:43:10  Loading settings from
> file:/home/cassandra/dse-4.8.4/resources/dse/conf/dse.yaml
> Exception in thread "main" java.lang.ExceptionInInitializerError
> at com.datastax.bdp.DseCoreModule.(DseCoreModule.java:43)
> at com.datastax.bdp.DseModule.getRequiredModules(DseModule.java:97)
> at
> com.datastax.bdp.server.AbstractDseModule.configure(AbstractDseModule.java:26)
> ...
> Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find property
> 'listen_address' on class: com.datastax.bdp.config.Config
> at
> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:132)
> at
> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:121)
>
> Some hint is appreciated.
>
> If this is not the proper mailing list, please direct me to proper one.
>
> Thanks
>
>


Re: specifying listen_address

2016-02-08 Thread Bhuvan Rawal
In either case, these properties should be placed in the cassandra.yaml file
rather than in dse.yaml.

You can find it in the /resources/cassandra/conf directory.
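
For illustration, a minimal cassandra.yaml fragment covering the settings
discussed in these threads (all values are placeholders):

# resources/cassandra/conf/cassandra.yaml
cluster_name: 'cass'
listen_address: 192.168.1.10
rpc_address: 192.168.1.10
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.10"

dse.yaml, by contrast, only holds DSE-specific settings, which is why snakeyaml
rejects cluster_name / listen_address when they are put there.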

On Mon, Feb 8, 2016 at 11:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I didn't start cassandra as service.
>
> I am starting as stand-alone process. Is multiple node setup not supported
> in stand-alone mode ?
>
> Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find property
> 'cluster_name' on class: com.datastax.bdp.config.Config
> at
> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:132)
> at
> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:121)
>
> Thanks
>
> On Mon, Feb 8, 2016 at 10:04 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> Are you sure the path to yaml is correct?
>> For me(DSE 4.8.4) it is /etc/dse/cassandra/cassandra.yaml
>>
>> On Mon, Feb 8, 2016 at 11:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Hi,
>>> I downloaded and expanded DSE 4.8.4
>>>
>>> When I specify the following in resources/dse/conf/dse.yaml :
>>>
>>> listen_address: XX.YY
>>>
>>> I got:
>>> INFO  17:43:10  Loading settings from
>>> file:/home/cassandra/dse-4.8.4/resources/dse/conf/dse.yaml
>>> Exception in thread "main" java.lang.ExceptionInInitializerError
>>> at com.datastax.bdp.DseCoreModule.(DseCoreModule.java:43)
>>> at com.datastax.bdp.DseModule.getRequiredModules(DseModule.java:97)
>>> at
>>> com.datastax.bdp.server.AbstractDseModule.configure(AbstractDseModule.java:26)
>>> ...
>>> Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find
>>> property 'listen_address' on class: com.datastax.bdp.config.Config
>>> at
>>> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:132)
>>> at
>>> org.yaml.snakeyaml.introspector.PropertyUtils.getProperty(PropertyUtils.java:121)
>>>
>>> Some hint is appreciated.
>>>
>>> If this is not the proper mailing list, please direct me to proper one.
>>>
>>> Thanks
>>>
>>>
>>
>


Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
Your config looks fine to me. I tried reproducing the scenario by setting
localhost in listen_address, rpc_address and the seed list, and it worked fine.
Earlier I had the node's local IP in those 3 fields and it was working fine as well.

Looks like there is some other issue here.

On Tue, Feb 9, 2016 at 12:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Here it is:
> http://pastebin.com/QEdjtAj6
>
> XX.YY is localhost in this case.
>
> On Mon, Feb 8, 2016 at 11:03 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> could you paste your cassandra.yaml here, except for commented out lines?
>>
>> On Tue, Feb 9, 2016 at 12:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> The issue I described was observed on the seed node.
>>>
>>> Both rpc_address and listen_address point to localhost.
>>>
>>> bq. What addresses are there in the seed list?
>>>
>>> The IP of the seed node.
>>>
>>> I haven't come to starting non-seed node(s) yet.
>>>
>>> Thanks for the quick response.
>>>
>>> On Mon, Feb 8, 2016 at 10:50 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> Have you specified the listen_address and rpc_address? What addresses
>>>> are there in the seed list?
>>>>
>>>> Have you started seed first and after waiting for 30 seconds started
>>>> other nodes?
>>>>
>>>>
>>>> On Tue, Feb 9, 2016 at 12:14 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to setup a cluster with DSE 4.8.4
>>>>>
>>>>> I added the following in resources/cassandra/conf/cassandra.yaml :
>>>>>
>>>>> cluster_name: 'cass'
>>>>>
>>>>> which resulted in:
>>>>>
>>>>> http://pastebin.com/27adxKTM
>>>>>
>>>>> This seems to be resolved by CASSANDRA-8072
>>>>>
>>>>> My question is whether there is workaround ?
>>>>> If not, when can I expect 2.1.13 release ?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>


Re: CASSANDRA-8072

2016-02-08 Thread Bhuvan Rawal
Could you paste your cassandra.yaml here, except for the commented-out lines?

On Tue, Feb 9, 2016 at 12:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> The issue I described was observed on the seed node.
>
> Both rpc_address and listen_address point to localhost.
>
> bq. What addresses are there in the seed list?
>
> The IP of the seed node.
>
> I haven't come to starting non-seed node(s) yet.
>
> Thanks for the quick response.
>
> On Mon, Feb 8, 2016 at 10:50 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> Have you specified the listen_address and rpc_address? What addresses are
>> there in the seed list?
>>
>> Have you started seed first and after waiting for 30 seconds started
>> other nodes?
>>
>>
>> On Tue, Feb 9, 2016 at 12:14 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Hi,
>>> I am trying to setup a cluster with DSE 4.8.4
>>>
>>> I added the following in resources/cassandra/conf/cassandra.yaml :
>>>
>>> cluster_name: 'cass'
>>>
>>> which resulted in:
>>>
>>> http://pastebin.com/27adxKTM
>>>
>>> This seems to be resolved by CASSANDRA-8072
>>>
>>> My question is whether there is workaround ?
>>> If not, when can I expect 2.1.13 release ?
>>>
>>> Thanks
>>>
>>
>>
>


Want inputs about super column family vs map/list

2016-02-04 Thread Bhuvan Rawal
Hi All,

There are two ways to achieve this :
1. Using super column family:

raman | atul | bhuvan
---
1234  | 5678 | 2345

OR
2. Using a single collection column:
Phone Number
---
Map 



I would like to know which approach would be better in the below use cases:

   1. Frequent updates of the complete map
   2. Frequent reads of the complete map
   3. Frequent updates of only specific fields
   4. Frequent reads of only specific fields

Also, is there any way to configure the cassandra-stress tool to test this
scenario?

Thanks & Regards,
Bhuvan
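
In CQL terms, a sketch of the second option (types and names assumed from the
example above); note that single map entries can be written individually, but a
SELECT always returns the whole map:

cqlsh> CREATE TABLE contacts (id int PRIMARY KEY, phone map<text, text>);
cqlsh> UPDATE contacts SET phone['bhuvan'] = '2345' WHERE id = 1;    -- touch one entry
cqlsh> UPDATE contacts SET phone = {'raman': '1234', 'atul': '5678'}
   ... WHERE id = 1;                                                 -- overwrite the whole map
cqlsh> SELECT phone FROM contacts WHERE id = 1;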


Re: Getting error while issuing Cassandra stress

2016-01-23 Thread Bhuvan Rawal
Alright, I uninstalled DSE from all the nodes in the cluster and
reinstalled it again from scratch. I ran nodetool status and the output is U/N
for all nodes again.

I tried running cassandra-stress and it doesn't work.

I can see in the system.peers table that the token ranges have been distributed;
the other columns seem normal too.
cqlsh> select peer, data_center, host_id, rack, release_version, rpc_address, workload from system.peers;

 peer        | data_center | host_id                              | rack  | release_version | rpc_address | workload
-------------+-------------+--------------------------------------+-------+-----------------+-------------+-----------
 10.41.55.23 | Cassandra   | b48f7683-3f42-4c49-9ce8-694112a21d5d | rack1 | 2.1.12.1046     | 10.41.55.23 | Cassandra
 10.41.55.19 | Analytics   | f271657f-52e2-4f43-8bbe-03a979852679 | rack1 | 2.1.12.1046     | 10.41.55.19 | Analytics
 10.41.55.20 | Analytics   | e4f484c9-8e7f-48b7-bce8-2a91d70be4d1 | rack1 | 2.1.12.1046     | 10.41.55.20 | Analytics
 10.41.55.18 | Analytics   | 813c2610-0656-4714-9ec6-99d51c72da92 | rack1 | 2.1.12.1046     | 10.41.55.18 | Analytics
 10.41.55.15 | Cassandra   | 2b09a481-3e08-45e5-a2d0-4c3d5d611ef6 | rack1 | 2.1.12.1046     | 10.41.55.15 | Cassandra
 10.41.55.22 | Cassandra   | d16cc704-4453-4dc0-880e-4af6fbd0dc48 | rack1 | 2.1.12.1046     | 10.41.55.22 | Cassandra
 10.41.55.17 | Analytics   | dfbb4f0e-2ffa-4694-811c-1167662ad537 | rack1 | 2.1.12.1046     | 10.41.55.17 | Analytics

This seems to be fine too:
$ nodetool describecluster
Cluster Information:
        Name: POC Cluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                07d395d2-e5b8-3e67-addd-3f8d1e8d7803: [10.41.55.15, 10.41.55.17, 10.41.55.19, 10.41.55.18, 10.41.55.21, 10.41.55.20, 10.41.55.23, 10.41.55.22]

I tried to telnet to a node of the other cluster which is not a seed; here are
the observations.
Telnet to ports 9042, 7000 and 9160 works fine:
telnet 10.41.55.18 9042 # connects
telnet 10.41.55.18 7000 # connects
telnet 10.41.55.18 9160 # connects
But telnet to the JMX port is not working:
telnet 10.41.55.18 7199 # doesn't connect
Trying 10.41.55.18...
telnet: connect to address 10.41.55.18: Connection refused

Could this be the reason?
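
(In 2.1-era packages cassandra-env.sh ships with LOCAL_JMX=yes, so port 7199
normally listens on loopback only; a refused remote telnet to 7199 is therefore
expected and not necessarily a fault. It can be checked from the node itself,
e.g.:)

$ nodetool -h 127.0.0.1 -p 7199 status
$ ss -lntp | grep 7199   # shows which address the JMX port is bound to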

On Sat, Jan 23, 2016 at 7:33 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Should I use nodetool repair utility
>>
>
> That wouldn't help, this an anti-entropy mechanism (see
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html#toolsRepair__description_unique_27
> ).
>
> It is something really important too often left aside.
>
> Yet, your issue here is not about consistency. The client can't find any
> node in charge of the read / written tokens in the ring. This depends on
> the topology, the replication factor and your network mainly. I think there
> is something wrong in your setup. I would try this:
>
> - Make sure connection / port are ok
> - Try increasing the RF / Strategy in the stress tool
> - Try with an other consistency level (not LOCAL_*, as mentioned here :
> http://stackoverflow.com/questions/32055251/not-enough-replica-available-for-query-at-consistency-local-one-1-required-but
> )
>
> Good luck,
>
> -
> Alain
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-01-22 23:02 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Getting same exception again. Should I use nodetool repair utility?
>>
>> On Sat, Jan 23, 2016 at 3:10 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> https://github.com/brianmhess/cassandra-loader
>>>
>>> All the best,
>>>
>>>
>>> [image: datastax_logo.png] <http://www.datastax.com/>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
>>> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
>>> <https://twitter.com/datastax> [image: g+.png]
>>> <https://plus.google.com/+Datastax/about>
>>> <http://feeds.feedburner.com/datastax>
>>> <http://goog_410786983>
>>>
>>>
>>> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the worlds
>>> most innovative companies such as Ne

Re: Getting error while issuing Cassandra stress

2016-01-23 Thread Bhuvan Rawal
Hi Alain,

You have hit the bull's eye! I used a custom yaml file for configuring the keyspace,
used NTS with replicas in each data center, defined the table schema
and query, and it seemed to work great!!

I have some apprehensions here though. The tests that I have done did not
yield the results that I expected; I'm not sure if I configured everything
correctly. I am sharing the results in a separate mail.

Thanks & Regards,
Bhuvan
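
For anyone looking for the shape of such a profile, a minimal sketch (keyspace,
table, replication counts and the ops mix below are illustrative, not the exact
file used here):

# stress-profile.yaml
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'Cassandra': 3, 'Analytics': 2};
table: mytable
table_definition: |
  CREATE TABLE mytable (id int PRIMARY KEY, name text, address text, phone text);
queries:
  read1:
    cql: SELECT * FROM mytable WHERE id = ?
    fields: samerow

$ cassandra-stress user profile=stress-profile.yaml "ops(insert=1,read1=1)" n=50000 -node ip1,ip2,ip3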

On Sat, Jan 23, 2016 at 5:59 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> telnet 10.41.55.18 7199 # doesnt connect
>> Trying 10.41.55.18...
>> telnet: connect to address 10.41.55.18: Connection refused
>>
>> Could this be the reason?
>>
>
> I don't think so. But I have not a clue on what's going on...
>
> Did you tried this ?
>
> - Try increasing the RF / Strategy in the stress tool
>>
>
> -schema option, pass a file if it is easier, with a custom keyspace,
> defining NTS and a RF of 2 or 3. You don't want to use RF = 1 nor
> SimpleStrategy in Multi DC under real conditions anyway, so your test will
> be more relevant.
> You can paste us the command + schema + output / error (if any)
>
>
>> - Try with an other consistency level (not LOCAL_*, as mentioned here :
>> http://stackoverflow.com/questions/32055251/not-enough-replica-available-for-query-at-consistency-local-one-1-required-but
>> )
>>
>
> Shouldn't affect you, yet, it is easy enough to test, so it might be worth
> it. Maybe the first thing I would try, just out of curiosity.
>
> C*heers,
>
> -
> Alain
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-01-23 11:13 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Alright, I uninstalled the DSE from all the nodes in the cluster and
>> reinstalled them again from scratch. Ran nodetool status and output is U/N
>> for all nodes again.
>>
>> Tried running cassandra-stress and it doesnt work.
>>
>> I can see in the system.peer table that token ranges has been
>> distributed, further, other columns seem normal too.
>> cqlsh> select peer, data_center, host_id,  rack, release_version,
>> rpc_address, workload from system.peers;
>>
>> peer | data_center | host_id | rack | release_version | rpc_address |
>> workload
>> -+-+--+---+-+-+---
>>  10.41.55.23
>> | Cassandra | b48f7683-3f42-4c49-9ce8-694112a21d5d | rack1 | 2.1.12.1046 |
>> 10.41.55.23 | Cassandra 10.41.55.19 | Analytics |
>> f271657f-52e2-4f43-8bbe-03a979852679 | rack1 | 2.1.12.1046 | 10.41.55.19 |
>> Analytics 10.41.55.20 | Analytics | e4f484c9-8e7f-48b7-bce8-2a91d70be4d1
>> | rack1 | 2.1.12.1046 | 10.41.55.20 | Analytics 10.41.55.18 | Analytics
>> | 813c2610-0656-4714-9ec6-99d51c72da92 | rack1 | 2.1.12.1046 | 10.41.55.18
>> | Analytics 10.41.55.15 | Cassandra |
>> 2b09a481-3e08-45e5-a2d0-4c3d5d611ef6 | rack1 | 2.1.12.1046 | 10.41.55.15 |
>> Cassandra 10.41.55.22 | Cassandra | d16cc704-4453-4dc0-880e-4af6fbd0dc48
>> | rack1 | 2.1.12.1046 | 10.41.55.22 | Cassandra 10.41.55.17 | Analytics
>> | dfbb4f0e-2ffa-4694-811c-1167662ad537 | rack1 | 2.1.12.1046 | 10.41.55.17
>> | Analytics
>>
>> This seems to be fine too:
>> $ nodetool describecluster Cluster Information: Name: POC Cluster Snitch:
>> org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner:
>> org.apache.cassandra.dht.Murmur3Partitioner Schema versions:
>> 07d395d2-e5b8-3e67-addd-3f8d1e8d7803: [10.41.55.15, 10.41.55.17,
>> 10.41.55.19, 10.41.55.18, 10.41.55.21, 10.41.55.20, 10.41.55.23,
>> 10.41.55.22]
>>
>> I tried to telnet to a node of other cluster which is not a seed, here
>> are the observations :
>> Telnet to ports 9042, 7000,9160 seem to work fine
>> telnet 10.41.55.18 9042 # connects
>> telnet 10.41.55.18 7000 # connects
>> telnet 10.41.55.18 9160 # connects
>> But telnet to JMX port is not working
>> telnet 10.41.55.18 7199 # doesnt connect
>> Trying 10.41.55.18...
>> telnet: connect to address 10.41.55.18: Connection refused
>>
>> Could this be the reason?
>>
>> On Sat, Jan 23, 2016 at 7:33 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> Should I use nodetool repair utility
>>>>
>>>
>>> That wouldn't help, this an anti-entropy mechanism (see
>>> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html#toolsRepair__description_unique_27
>>> ).
>>>
>>> It is something really important too often left aside.
>>>
>>> Yet, your 

Need Feedback about cassandra-stress tests

2016-01-23 Thread Bhuvan Rawal
 4738
threadCount 54 total 46259 610 610 4738
threadCount 81 timeline 53709 645 645 4996
threadCount 81 total 53709 645 645 4996
threadCount 121 timeline 46524 650 650 5037
threadCount 121 total 46524 650 650 5037
threadCount 181 timeline 36158 673 673 5224
threadCount 181 total 36158 673 673 5224
threadCount 271 timeline 43682 747 747 5800
threadCount 271 total 43682 747 747 5800
threadCount 406 timeline 55336 785 785 6094
threadCount 406 total 55336 785 785 6094
threadCount 609 timeline 69326 831 831 6449
threadCount 609 total 69326 831 831 6449
threadCount 913 timeline 94283 837 837 6482
threadCount 913 total 94283 837 837 6482

Am I missing something here?

Thanks & Regards,
Bhuvan Rawal


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
I had a look at the JIRA below:
https://issues.apache.org/jira/browse/CASSANDRA-7905

When I opened my cassandra-rackdc.properties I saw that the DC names were DC1 &
DC2 and the rack name was RAC1. Please note that this is the default
configuration; I have not modified any file.

There is another point of concern here which might be relevant to the previous
one as well: I am not able to log in to cqlsh directly, i.e. I have to specify
the ip as well, even when I am logged in to that machine.

$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
Connection refused")})

whereas
$ cqlsh 
works fine.

Is that the reason why cassandra-stress is not able to communicate with
the other replicas?
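
(That cqlsh behaviour is expected when rpc_address is set to the node's own IP:
cqlsh connects to 127.0.0.1:9042 by default, so the address has to be passed
explicitly, e.g.:)

$ cqlsh 10.41.55.18 9042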

On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Sorry I missed that.
>
> Both your nodetool status and keyspace replication settings say Cassandra
> and Analytics for the DC names. I'm not sure where you're seeing DC1, DC2,
> etc. and why you suspect that is the problem.
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
> <http://goog_410786983>
>
>
> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>> I had attached nodetool status output in previous mail, pasting it again :
>>
>> $ nodetool status Datacenter: Analytics =
>> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
>> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
>> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
>> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
>> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
>> = Status=Up/Down |/
>> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
>> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
>> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
>> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
>> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
>> rack1
>>
>> Regards,
>> Bhuvan Rawal
>>
>> On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> The output of `nodetool status` would help us diagnose.
>>>
>>> All the best,
>>>
>>>
>>> [image: datastax_logo.png] <http://www.datastax.com/>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
>>> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
>>> <https://twitter.com/datastax> [image: g+.png]
>>> <https://plus.google.com/+Datastax/about>
>>> <http://feeds.feedburner.com/datastax>
>>> <http://goog_410786983>
>>>
>>>
>>> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the worlds
>>> most innovative companies such as

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Yes, I am specifying the -node parameter to stress; otherwise it throws a
network connection failure.

Can you point me to a sample Java application to test pushing data from an
external server? Let's see if that works.

On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
>> & DC2, rack name was RAC1 . Please note that this is the default
>> configuration, I have not modified any file.
>
>
> cassandra-rackdc.properties is only respected based on your snitch
> <https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchesAbout_c.html>
> .
>
> $ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>> Connection refused")})
>> whereas
>> $ cqlsh 
>> works fine
>> is that the reason why the cassandra-stress is not able to communicate
>> with other replicas?
>
>
> Are you providing the -node parameter to stress
> <http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html>
> ?
>
>
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
> <http://goog_410786983>
>
>
> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> I had a look at the jira below:
>> https://issues.apache.org/jira/browse/CASSANDRA-7905
>>
>> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
>> & DC2, rack name was RAC1 . Please note that this is the default
>> configuration, I have not modified any file.
>>
>> There is another point of concern here which might be relevant to
>> previous one as well, im not able to login to cqlsh directly, i.e. I have
>> to specify ip as well even when im logged in to that machine.
>>
>> $ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>> Connection refused")})
>>
>> whereas
>> $ cqlsh 
>> works fine
>>
>> is that the reason why the cassandra-stress is not able to communicate
>> with other replicas?
>>
>> On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> Sorry I missed that.
>>>
>>> Both your nodetool status and keyspace replication settings say
>>> Cassandra and Analytics for the DC names. I'm not sure where you're seeing
>>> DC1, DC2, etc. and why you suspect that is the problem.
>>>
>>> All the best,
>>>
>>>
>>> [image: datastax_logo.png] <http://www.datastax.com/>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
>>> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
>>> <https://twitter.com/datastax> [image: g+.png]
>>> <https://plus.google.com/+Datastax/about>
>>> <http://feeds.feedburner.com/datastax>
>>> <http://goog_410786983>
>>>
>>>
>>> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 count

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Thanks for the response Alain,

cqlsh> create keyspace mykeyspace WITH replication =
{'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3};
cqlsh> use mykeyspace;
cqlsh:mykeyspace>create table mytable (id int primary key, name text,
address text, phone text);
cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values (1,
'Kiyu','Texas', '555-1212'); # and other similar statement
I then issued the below command from every node and found consistent
results.
cqlsh:mykeyspace> select * from mytable;

// Then I repeated the above steps for NetworkTopologyStrategy and found the
same results.

I ran basic cassandra-stress commands
(seed1 = seed of datacenter 1):
 $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
 $ cassandra-stress write n=5 -rate threads=4 -node seed1
 $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
 $ cassandra-stress write n=5 -rate threads=4 -node
all_8_ip_comma_seperated
 $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4

All of them threw the exception
*com.datastax.driver.core.exceptions.UnavailableException: Not enough
replica available for query at consistency LOCAL_ONE (1 required but only 0
alive)*


I have a feeling that the issue is with the datacenter names for some reason,
because in some config files I found the DC names to be like DC1/DC2/DC3 and in
some they are like Cassandra/Analytics (the ones I had specified during
installation). I'm unsure which yaml/property file to look at to correct the
inconsistency.

(C*heers :) - I'm so tempted to copy that)

Regards,
Bhuvan

On Fri, Jan 22, 2016 at 8:47 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
>
> The the exact command you ran (stress-tool with options) could be useful
> to help you on that.
>
> However, Im able to create keyspace, tables and insert data using cqlsh
>> and it is replicating fine to all the nodes.
>
>
> Having the schema might be useful too.
>
> Did you ran the cqlsh and the stress-tool from the same server ? If not,
> you might want to check the port you use (9042/9160/...) are open.
> Also, cqlsh uses local_one by default too. If both commands were run
> against the same DC, from the same machine they should behave the same way.
> Are they ?
>
> C*heers,
>
> -----
> Alain
>
> The Last Pickle
> http://www.thelastpickle.com
>
>
> 2016-01-22 9:57 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>
>> Hi,
>>
>> i have created a POC cluster with 2 DC , each having 4 nodes with DSE
>> 4.8.1 installed.
>>
>> On issuing cassandra stress im getting an error  and data is not being
>> inserted:
>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>> alive)*
>>
>> However, Im able to create keyspace, tables and insert data using cqlsh
>> and it is replicating fine to all the nodes.
>>
>> Details of the cluster can be found below (all the nodes seem to be alive
>> and kicking):
>>
>> $ nodetool status Datacenter: Analytics =
>> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
>> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
>> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
>> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
>> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
>> = Status=Up/Down |/
>> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
>> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
>> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
>> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
>> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
>> rack1
>>
>> Regards,
>> Bhuvan Rawal
>>
>>
>>
>


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Hi Sebastian,

I had attached the nodetool status output in a previous mail; pasting it again:

$ nodetool status
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.17  428.5 KB   256     ?     39d6d585-e641-4046-9d0b-797356597b5e  rack1
UN  10.41.55.19  404.44 KB  256     ?     69edf930-efd9-4d74-a798-f3d4ac02e516  rack1
UN  10.41.55.18  423.21 KB  256     ?     b74bab13-09b2-4760-bce9-c8ef05e50f6d  rack1
UN  10.41.55.20  683.23 KB  256     ?     fb5c4fed-6e1e-4ea8-838d-358106906830  rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.15  209.4 KB   256     ?     ffc3b9a0-5d5c-4a3d-a99e-49d255731278  rack1
UN  10.41.55.21  227.44 KB  256     ?     c68deba4-b9a2-43fc-bb13-6af74c88c210  rack1
UN  10.41.55.23  222.71 KB  256     ?     8229aa87-af00-48fa-ad6b-3066d3dc0e58  rack1
UN  10.41.55.22  218.72 KB  256     ?     c7ba84fd-7992-41de-8c88-11574a72db99  rack1

Regards,
Bhuvan Rawal

On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> The output of `nodetool status` would help us diagnose.
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
> <http://goog_410786983>
>
>
> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Jan 22, 2016 at 1:39 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Thanks for the response Alain,
>>
>> cqlsh> create keyspace mykeyspace WITH replication =
>> {'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
>> cqlsh> use mykeyspace;
>> cqlsh:mykeyspace>create table mytable (id int primary key, name text,
>> address text, phone text);
>> cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values
>> (1, 'Kiyu','Texas', '555-1212'); # and other similar statement
>> I then issued the below command from every node and found consistent
>> results.
>> cqlsh:mykeyspace> select * from mytable;
>>
>> // Then i repeated the above steps for NetworkTopologyStrategy and found
>> same results
>>
>> I ran basic cassandra stress
>> seed1 - seed of datacenter 1
>>  $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1
>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
>>  $ cassandra-stress write n=5 -rate threads=4 -node
>> all_8_ip_comma_seperated
>>  $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
>> keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4
>>
>> All of them threw the exception
>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>> alive)*
>>
>>
>> I have a feeling that the issue is with the datacenter names, because in
>> some config files I found the DC names to be DC1/DC2/DC3 while in others
>> they are Cassandra/Analytics (the ones I had specified during installation).
>> I'm unsure which yaml/properties file to look at to correct the
>> inconsistency.
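
A sketch of one way around this, assuming the DC names really are Cassandra
and Analytics as nodetool status reports: pre-create the keyspace that
cassandra-stress writes to (keyspace1) with NetworkTopologyStrategy and those
exact DC names, then point stress at a node in the local DC. As far as I
recall, stress will reuse keyspace1 if it already exists, so drop any stale
copy first; the counts and node address below are illustrative:

cqlsh> DROP KEYSPACE IF EXISTS keyspace1;
cqlsh> CREATE KEYSPACE keyspace1 WITH replication =
   ...   {'class': 'NetworkTopologyStrategy', 'Cassandra': 3, 'Analytics': 2};

$ cassandra-stress write n=50000 cl=LOCAL_ONE -rate threads=4 -node 10.41.55.15

If the keyspace's replication map names a DC that no node actually reports,
the coordinator sees zero local replicas and LOCAL_ONE fails exactly as above.
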
>>
>> (C*heers :) - I'm so tempted to copy that)
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Jan 22, 2016 at 8:47 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The exact command you ran (stress-tool with options) could be useful
>>> to help you on that.
>>>
>>> However, I'm able to create keyspace, tables and insert data using cqlsh
>>>> and it is replicating fine to all the nodes.
>>>
>>>
>>> Having the schema might be useful too.
>>>
>>> Did you run the cqlsh and the stress-tool from the same server 

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Getting the same exception again. Should I use the nodetool repair utility?
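
Before reaching for repair: if a keyspace's replication options name a
datacenter that no node actually reports, there are zero replicas to repair,
so repair alone will not make this error go away. As a quick sanity check
(not a fix), the DC and rack names each node is really announcing can be read
straight from the system tables:

cqlsh> SELECT data_center, rack FROM system.local;
cqlsh> SELECT data_center, rack FROM system.peers;

Those names are what must appear in every keyspace's replication map and in
whatever schema cassandra-stress is told to use.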

On Sat, Jan 23, 2016 at 3:10 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> https://github.com/brianmhess/cassandra-loader
>
> All the best,
>
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Yes, I'm specifying the -node parameter to stress; otherwise it throws a
>> network connection failure.
>>
>> Can you point me to a sample Java application to test pushing data from
>> an external server? Let's see if that works.
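
A minimal sketch of such a test app, assuming the DataStax Java driver
(2.x/3.x) on the classpath and reusing the mykeyspace/mytable schema created
earlier in this thread (the contact point and values are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class ConnectivityTest {
    public static void main(String[] args) {
        // Contact point: a node's client-facing address, not 127.0.0.1
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.41.55.15")
                .build();
        Session session = cluster.connect("mykeyspace");

        // Simple round trip: write one row, read it back
        session.execute("INSERT INTO mytable (id, name, address, phone) "
                + "VALUES (2, 'Test', 'Remote', '555-0000')");
        ResultSet rs = session.execute("SELECT * FROM mytable WHERE id = 2");
        System.out.println(rs.one());

        cluster.close();
    }
}

If this works from the external server while cassandra-stress still fails,
the problem is almost certainly in the keyspace replication that stress uses
rather than in connectivity.
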
>>
>> On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>>> when I opened my cassandra-rackdc.properties I saw that the DC names were
>>>> DC1 & DC2, and the rack name was RAC1. Please note that this is the default
>>>> configuration; I have not modified any file.
>>>
>>>
>>> cassandra-rackdc.properties is only respected based on your snitch
>>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchesAbout_c.html>
>>> .
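
Roughly, and assuming GossipingPropertyFileSnitch is the snitch in use (check
endpoint_snitch in cassandra.yaml on each node), the relevant pieces look like
this; the values shown are only an example:

# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties (set per node)
dc=Cassandra
rack=rack1

With SimpleSnitch the rackdc file is ignored entirely, and with
PropertyFileSnitch the names come from cassandra-topology.properties instead,
which would explain seeing DC1/DC2 in one file while nodetool reports
Cassandra/Analytics.
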
>>>
>>> $ cqlsh
>>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>>>> Connection refused")})
>>>> whereas
>>>> $ cqlsh 
>>>> works fine
>>>> Is that the reason why cassandra-stress is not able to communicate
>>>> with the other replicas?
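
The localhost refusal is most likely unrelated to the replica problem; plain
cqlsh targets 127.0.0.1:9042, and it is refused whenever the native transport
is bound to the node's own IP instead. A sketch of the cassandra.yaml setting
involved (the address is illustrative):

# cassandra.yaml
rpc_address: 10.41.55.15
# or: rpc_address: 0.0.0.0 together with broadcast_rpc_address

With that binding, "cqlsh" alone is refused while "cqlsh 10.41.55.15" works,
which matches the behaviour described above; the LOCAL_ONE error is a
separate, replication-level issue.
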
>>>
>>>
>>> Are you providing the -node parameter to stress
>>> <http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html>
>>> ?
>>>
>>>
>>>
>>> All the best,
>>>
>>>
>>> Sebastián Estévez
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> I had a look at the jira below:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-7905
>>>>
>>>> when I opened my cassandra-rackdc.properties I saw that the DC names were
>>>> DC1 & DC2, and the rack name was RAC1. Please note that this is the default
>>>> configuration; I have not modified any file.
>>>>
>>>> There is another point of concern here which might be relevant to the
>>>> previous one as well: I'm not able to log in to cqlsh directly, i.e. I have
>

Re: Requesting some details for my use case

2016-01-07 Thread Bhuvan Rawal
Hi Jack,

We value reliability and consistency over performance right now. In the
e-commerce industry we can expect unexpected spikes at odd times.

I'll be grateful if you could tell me about reliability and failover scenarios.
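
For a concrete starting point on failover (the keyspace name, DC names and
numbers below are illustrative, not a recommendation): with a keyspace such as

cqlsh> CREATE KEYSPACE shop WITH replication =
   ...   {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

and the application reading and writing at LOCAL_QUORUM, each datacenter can
lose one replica of any given partition and keep serving consistent reads and
writes (2 of the 3 replicas still answer, and since 2 + 2 > 3 every read
overlaps the latest write). Losing an entire DC can be handled by switching
the client's local DC, at the cost of higher latency until it comes back.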

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> DataStax has documented quite a few customers/case studies:
> http://www.datastax.com/resources/casestudies
>
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always do the same synthesized views yourself in your app, which is
> current standard best practice anyways. MV is just a way to automate that
> best practice.
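
As a hedged illustration of that query-table pattern (the table and column
names are invented for the example): a base table keyed by user, a second
table keyed by city to answer "find users by city", and on 3.0+ a
materialized view that maintains the second copy automatically.

cqlsh> CREATE TABLE users (
   ...   user_id int PRIMARY KEY, name text, city text);

cqlsh> -- hand-maintained query table: the app writes to both on every update
cqlsh> CREATE TABLE users_by_city (
   ...   city text, user_id int, name text,
   ...   PRIMARY KEY (city, user_id));

cqlsh> -- or, on 3.0+, let Cassandra maintain it
cqlsh> CREATE MATERIALIZED VIEW users_by_city_mv AS
   ...   SELECT city, user_id, name FROM users
   ...   WHERE city IS NOT NULL AND user_id IS NOT NULL
   ...   PRIMARY KEY (city, user_id);
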
>
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof of concept implementation to verify your own
> requirements. Like start with a 6 or 8 node cluster for a subset of the
> data and add nodes as needed to accommodate load. The trick is to limit the
> amount of data on each node so that incoming requests can be processed as
> rapidly as possible to meet latency requirements, and then to scale up load
> capacity by adding nodes.
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> *Thanks Jack* *for the detailed advice*.
>>
>> Yes it is a Java Application.
>>
>> We already have a denormalized view of our data in place; we use it for
>> storing data in MongoDB as a cache, but we will get our hands dirty before
>> implementation. We would like to have a single DB view and replace MongoDB
>> & MySQL with a single data store. If we talk numbers, we can expect 10
>> million create/update requests a day and ~500 million read requests.
>>
>> The question here is not "should I or should I not", but "which one".
>>
>> A lot of the features you have mentioned are supported but not
>> advisable: *(the automated
>> Materialized View feature) (Triggers are supported, but not advised)
>> (Secondary indexes are supported, but not advised). *By when do you
>> believe that these will be stable enough to use for an enterprise
>> implementation?
>>
>> We have made up our minds as far as the shift to NoSQL is concerned, as
>> MySQL is not able to serve our purpose and is currently a bottleneck in the
>> design.
>>
>>  From all the benchmarks we have analyzed for our use case, Cassandra
>> seems to be doing better as far as performance is concerned.  Our only
>> concern is to know as a Primary Database how Cassandra compares with HBase.
>> By Primary database I mean the attributes: Data Consistency, Transaction
>> Management and Rollback, brisk Failure Recovery, cross datacenter
>> replication and partition aware sharding.
>>
>> The general opinion of Cassandra is that it's more of a cache, and as we
>> are going to be replacing our primary Data Store we need something fast but
>> not at the expense of reliability. Can you guide me towards a case study
>> where someone has tuned it in such a way to perform reliably for most use
>> cases.
>>
>> Also, I'll be grateful if someone directs me to a repository where I can
>> find major customers of the DBs and their case studies.
>>
>> Thanks & Regards,
>> Bhuvan
>>
>> On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Bear in mind that you won't be able to merely "tune" your schema - you
>>> will need to completely redesign your data model. Step one is to look at
>>> all of the queries you need to perform and get a handle on what flat,
>>> denormalized data model they will need to execute performantly in a NoSQL
>>> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
>>> not advised. The general model is that you have a "query table" for each
>>> form of query, with the primary key adapted to the needs of the query. That
>>> means a lot of denormalization and repetition of data. The new, automated
>>> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
>>> a new feature and not quite stable enough for production (no DataStax
>>> Enterprise (DSE) release with 3.0 yet.) Triggers are supported, but not
>>> advised - better to do that processing at the application level. DSE also
>>> supports Hadoop and Spark for batch/analytics and Solr for search and ad
>>> hoc queries (or use Stratio or Stargate for Lucene queries.)
>>>
>>> Best

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
I understand, Ravi; we have our application layers well defined. The major
changes will be in the database access layers, and entities will be changed.
The schema will be modified to tune the efficiency of the data store chosen.

We have been using Mongo as a cache for a long time now, but as it's a
document store and since we have a crisp, well-defined schema, we chose to go
with a columnar database.

Our data size has been growing very rapidly. Currently it is 200 GB with
indexes; in a couple of years it will grow to approximately 5 TB. And we may
need to run procedures to aggregate data and update tables.
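
If those aggregation procedures end up on the Cassandra side, the usual
pattern is to aggregate on write into rollup tables rather than scanning and
updating afterwards. A hedged sketch with an invented schema:

cqlsh> CREATE TABLE orders_by_day (
   ...   day date,
   ...   order_count counter,
   ...   PRIMARY KEY (day));

cqlsh> UPDATE orders_by_day SET order_count = order_count + 1
   ...   WHERE day = '2016-01-05';

Heavier rollups are normally pushed out to Spark or Hadoop jobs over the
cluster, as comes up later in this thread.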

On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrish...@gmail.com>
wrote:

> You are moving from a SQL database to C*??? I hope you are aware of the
> differences between a NoSQL database like C* and an RDBMS. To keep it short,
> the app has to change significantly.
>
> Please read documentation on differences between nosql and RDBMS.
>
> thanks.
>
> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'm planning to shift from a SQL database to a columnar NoSQL database; we
>> have narrowed our choices down to Cassandra and HBase. I would really
>> appreciate it if someone with decent experience with both could give me an
>> honest comparison on the parameters below (links to neutral benchmarks/blogs
>> also appreciated):
>>
>> 1. Data Consistency (Eventual consistency allowed but define "eventual")
>> 2. Ease of Scaling Up
>> 3. Manageability
>> 4. Failure Recovery options
>> 5. Secondary Indexing
>> 6. Data Aggregation
>> 7. Query Language (3rd party wrapper solutions also allowed)
>> 8. Security
>> 9. *Commercial Support for quick solutions to issues*.
>> 10. Run batch job on data like map reduce or some common aggregation
>> functions using row scan. Any other packages for cassandra to achieve this?
>> 11. Trigger specific updates on tables used for secondary index.
>> 12. Please consider that our DB will be the source of truth, with no
>> specific requirement of immediate data consistency amongst nodes.
>>
>> Regards,
>> Bhuvan Rawal
>> SDE
>>
>
>


Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
Hi All,

I'm planning to shift from a SQL database to a columnar NoSQL database; we
have narrowed our choices down to Cassandra and HBase. I would really
appreciate it if someone with decent experience with both could give me an
honest comparison on the parameters below (links to neutral benchmarks/blogs
also appreciated):

1. Data Consistency (Eventual consistency allowed but define "eventual")
2. Ease of Scaling Up
3. Manageability
4. Failure Recovery options
5. Secondary Indexing
6. Data Aggregation
7. Query Language (3rd party wrapper solutions also allowed)
8. Security
9. *Commercial Support for quick solutions to issues*.
10. Run batch job on data like map reduce or some common aggregation
functions using row scan. Any other packages for cassandra to achieve this?
11. Trigger specific updates on tables used for secondary index.
12. Please consider that our DB will be the source of truth, with no
specific requirement of immediate data consistency amongst nodes.

Regards,
Bhuvan Rawal
SDE


Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
*Thanks Jack* *for the detailed advice*.

Yes it is a Java Application.

We already have a denormalized view of our data in place; we use it for
storing data in MongoDB as a cache, but we will get our hands dirty before
implementation. We would like to have a single DB view and replace MongoDB
& MySQL with a single data store. If we talk numbers, we can expect 10
million create/update requests a day and ~500 million read requests.
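
For rough sizing (back-of-envelope only, assuming an even spread over the
day): 10 million writes/day is about 116 writes/s and 500 million reads/day
is about 5,800 reads/s on average; with, say, a 5x peak factor that is on the
order of 600 writes/s and 29,000 reads/s at peak, which is the kind of number
to feed into the proof-of-concept sizing Jack suggests below.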

The question here is not "should I or should I not", but "which one".

A lot of the features you have mentioned are supported but not
advisable: *(the automated
Materialized View feature) (Triggers are supported, but not advised)
(Secondary indexes are supported, but not advised). *By when do you believe
that these will be stable enough to use for an enterprise implementation?

We have made up our minds as far as the shift to NoSQL is concerned, as MySQL
is not able to serve our purpose and is currently a bottleneck in the design.

 From all the benchmarks we have analyzed for our use case, Cassandra seems
to be doing better as far as performance is concerned.  Our only concern is
to know as a Primary Database how Cassandra compares with HBase. By Primary
database I mean the attributes: Data Consistency, Transaction Management
and Rollback, brisk Failure Recovery, cross datacenter replication and
partition aware sharding.

The general opinion of Cassandra is that it's more of a cache, and as we are
going to be replacing our primary Data Store we need something fast but not
at the expense of reliability. Can you guide me towards a case study where
someone has tuned it in such a way to perform reliably for most use cases.

Also, I'll be grateful if someone directs me to a repository where I can find
major customers of the DBs and their case studies.

Thanks & Regards,
Bhuvan

On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Bear in mind that you won't be able to merely "tune" your schema - you
> will need to completely redesign your data model. Step one is to look at
> all of the queries you need to perform and get a handle on what flat,
> denormalized data model they will need to execute performantly in a NoSQL
> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
> not advised. The general model is that you have a "query table" for each
> form of query, with the primary key adapted to the needs of the query. That
> means a lot of denormalization and repetition of data. The new, automated
> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
> a new feature and not quite stable enough for production (no DataStax
> Enterprise (DSE) release with 3.0 yet.) Triggers are supported, but not
> advised - better to do that processing at the application level. DSE also
> supports Hadoop and Spark for batch/analytics and Solr for search and ad
> hoc queries (or use Stratio or Stargate for Lucene queries.)
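
To make "primary key adapted to the needs of the query" concrete, here is a
small hedged sketch (an invented order-history model, not something from this
thread): the same order is written to two tables, each keyed for one access
path, so every read hits a single partition.

cqlsh> CREATE TABLE orders_by_customer (
   ...   customer_id int, order_time timestamp, order_id uuid, total decimal,
   ...   PRIMARY KEY (customer_id, order_time, order_id))
   ...   WITH CLUSTERING ORDER BY (order_time DESC, order_id ASC);

cqlsh> CREATE TABLE orders_by_product (
   ...   product_id int, order_time timestamp, order_id uuid, customer_id int,
   ...   PRIMARY KEY (product_id, order_time, order_id));

The application inserts into both tables for every new order; that write
amplification is the price of single-partition reads.
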
>
> Best to start with a basic proof of concept implementation to get your
> feet wet and learn the ins and outs before making a full commitment.
>
> Is this a Java app? The Java Driver is where you need to get started in
> terms of ingesting and querying data. It's a bit more sophisticated than
> just a simple JDBC interface. Most of your queries will need to be
> rewritten anyway even though the CQL syntax does indeed look a lot like
> SQL, but much of that will be because your data model will need to be made
> NoSQL-compatible.
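
As a feel for what "more sophisticated than JDBC" looks like in practice, a
hedged sketch against the DataStax Java driver (2.x/3.x class names; the
table is the invented orders_by_customer one sketched above):

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class OrderDao {
    private final Session session;
    private final PreparedStatement byCustomer;

    public OrderDao(String contactPoint, String keyspace) {
        // One Cluster/Session per application, not per request
        Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
        this.session = cluster.connect(keyspace);
        // Prepared once, then bound and reused for every request
        this.byCustomer = session.prepare(
                "SELECT order_id, total FROM orders_by_customer WHERE customer_id = ?");
    }

    public void printOrders(int customerId) {
        BoundStatement bound = byCustomer.bind(customerId);
        // Consistency is chosen per statement, not per connection
        bound.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        for (Row row : session.execute(bound)) {
            System.out.println(row.getUUID("order_id") + " " + row.getDecimal("total"));
        }
    }
}
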
>
> That should get you started.
>
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> I understand, Ravi; we have our application layers well defined. The
>> major changes will be in the database access layers, and entities will be
>> changed. The schema will be modified to tune the efficiency of the data
>> store chosen.
>>
>> We have been using Mongo as a cache for a long time now, but as it's a
>> document store and since we have a crisp, well-defined schema, we chose to
>> go with a columnar database.
>>
>> Our data size has been growing very rapidly. Currently it is 200 GB with
>> indexes; in a couple of years it will grow to approximately 5 TB. And we
>> may need to run procedures to aggregate data and update tables.
>>
>> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrish...@gmail.com>
>> wrote:
>>
>>> You are moving from a SQL database to C*??? I hope you are aware of the
>>> differences between a NoSQL database like C* and an RDBMS. To keep it
>>> short, the app has to change significantly.
>>>
>>> Please read documentation on differences between nosql and RDBMS.
>>>
>>> thanks.
>>>
>>> On Tue, Jan 5, 2016 at 6:

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
Thanks for pointing that out, Jonathan. Our use case is indeed the Column
Family kind. :)

On Wed, Jan 6, 2016 at 2:38 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Sorry to nitpick, but Cassandra is not a columnar database.  If you're
> looking for columnar because you have an analytics need, Cassandra is not
> what you want.  If you've just made the same mistake that 99% of people
> make, well, now you know.  Cassandra historically has been referred to as a
> "Column Family" data store, which is easily mistaken for columnar.
>
>
> On Tue, Jan 5, 2016 at 3:21 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'm planning to shift from a SQL database to a columnar NoSQL database; we
>> have narrowed our choices down to Cassandra and HBase. I would really
>> appreciate it if someone with decent experience with both could give me an
>> honest comparison on the parameters below (links to neutral benchmarks/blogs
>> also appreciated):
>>
>> 1. Data Consistency (Eventual consistency allowed but define "eventual")
>> 2. Ease of Scaling Up
>> 3. Manageability
>> 4. Failure Recovery options
>> 5. Secondary Indexing
>> 6. Data Aggregation
>> 7. Query Language (3rd party wrapper solutions also allowed)
>> 8. Security
>> 9. *Commercial Support for quick solutions to issues*.
>> 10. Run batch job on data like map reduce or some common aggregation
>> functions using row scan. Any other packages for cassandra to achieve this?
>> 11. Trigger specific updates on tables used for secondary index.
>> 12. Please consider that our DB will be the source of truth, with no
>> specific requirement of immediate data consistency amongst nodes.
>>
>> Regards,
>> Bhuvan Rawal
>> SDE
>>
>