Re: Repairs on 2.1.12

2017-05-10 Thread kurt greaves
Never seen a repair loop; it seems very unlikely. When you say "on a ring",
what do you mean? What arguments are you passing to repair?

On 10 May 2017 03:22, "Mark Furlong"  wrote:

I have a large cluster where a -dc repair on one ring has been running
for nearly two weeks. When I review the logs, I can see my tables
reporting as ‘fully synced’ multiple times. I’m looking for some
information to help me confirm that my repair is not looping and is running
properly.
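One way to check this, sketched in Python under assumptions: the log lines are taken to look like `[repair #<session-uuid>] <table> is fully synced` (verify the exact format against your own system.log). In 2.1 a repair typically runs one session per token range, so with vnodes a table reporting "fully synced" many times across different session IDs is expected; the same (session, table) pair showing up more than once would be the more suspicious signal. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed log format (check against your own system.log), e.g.:
#   ... [repair #<session-uuid>] <table> is fully synced
FULLY_SYNCED = re.compile(r"\[repair #([0-9a-f-]+)\] (\S+) is fully synced")


def summarize_fully_synced(lines):
    """Count 'fully synced' messages per (session id, table) pair."""
    counts = Counter()
    for line in lines:
        match = FULLY_SYNCED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def sessions_per_table(counts):
    """Number of distinct repair sessions that reported each table as fully synced."""
    per_table = Counter(table for (_session, table) in counts)
    return dict(per_table)


def repeated_pairs(counts):
    """(session id, table) pairs reported more than once; worth a closer look."""
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Any iterable of lines works, so an open log file can be passed straight to `summarize_fully_synced`. Many sessions per table with an empty `repeated_pairs` result is consistent with a repair that is simply working through a large number of ranges.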



*Mark Furlong*

Sr. Database Administrator



Difference between yum and git

2017-05-10 Thread Yuji Ito
Hi all,

I'm running a simple performance test.
The test issues select operations (CL.SERIAL or CL.QUORUM) while increasing
the number of threads.
There is a performance difference between C* installed by yum and
C* that I built myself.
What causes the difference?

I use C* 2.2.8.
One was installed by yum (# yum install cassandra22).
The other was checked out with git from
https://github.com/apache/cassandra/tree/cassandra-2.2.8 and built myself.
I changed cassandra.yaml to set `commitlog_sync: batch` and
`commitlog_sync_batch_window_in_ms: 2`.

My environment:
- the cluster has 3 nodes
- node: AWS EC2 m4.large with 200 IOPS EBS volume
- Replication Factor: 3
- 1 rows

Result (operations/sec):

select (CL.SERIAL)
threads  yum    git
1        188    192
2        156    162
4        434    264
8        396    446
16       837    733
32       1176   1114
64       2206   1715
128      4115   2776
256      7272   3920

select (CL.QUORUM)
threads  yum    git
1        434    666
2        909    1538
4        1481   2500
8        1904
16       2666   4210
32       3106   5333
64       3555   6597
128      5000   7356
256      9014   8075
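To make the comparison easier to read, this short Python sketch divides the git build's throughput by the yum build's at each thread count, using the numbers reported in the results. The git CL.QUORUM value at 8 threads is missing from the results, so it is recorded as `None` and skipped. The names are illustrative only.

```python
# Throughput (operations/sec) copied from the reported results.
THREADS = [1, 2, 4, 8, 16, 32, 64, 128, 256]
SERIAL = {
    "yum": [188, 156, 434, 396, 837, 1176, 2206, 4115, 7272],
    "git": [192, 162, 264, 446, 733, 1114, 1715, 2776, 3920],
}
QUORUM = {
    "yum": [434, 909, 1481, 1904, 2666, 3106, 3555, 5000, 9014],
    # None marks the git value at 8 threads, which was not reported.
    "git": [666, 1538, 2500, None, 4210, 5333, 6597, 7356, 8075],
}


def git_to_yum_ratio(results):
    """Return {threads: git/yum} rounded to 2 decimals, skipping missing values."""
    ratios = {}
    for threads, yum, git in zip(THREADS, results["yum"], results["git"]):
        if yum is None or git is None:
            continue
        ratios[threads] = round(git / yum, 2)
    return ratios
```

A ratio below 1.0 means the git build was slower. At 256 threads the git build reaches about 0.54 of the yum build's CL.SERIAL throughput and about 0.90 of its CL.QUORUM throughput, while at 1 to 4 threads under CL.QUORUM it is roughly 1.5 to 1.7 times faster.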

Thanks,
Yuji


Re: Difference between yum and git

2017-05-10 Thread Yuji Ito
Hi Joaquin,

> Were both tests run from the same machine at close to the same time?
Yes. I ran both tests within 30 minutes of each other.
I retried them today. The results were the same as yesterday's.

The tests ran on the same instances with the same Java version.

Thanks,
Yuji




cassandra 3.10

2017-05-10 Thread Gopal, Dhruva
Hi –
  We’re currently on 3.9 and have been told that Cassandra 3.10 is a more
stable version to be on. We’ve been using the datastax-ddc rpms in our
production and dev environments (on 3.9), and it appears there is no 3.10 rpm
out yet. We tried to build our own rpm (our devops processes use rpms,
so switching to tarballs is not easily done) and found that the build
process fails (due to the byteman-3.0.3 jar); we managed to patch it and
get it working with rpmbuild. My concerns/questions are these:

-  Is 3.10 actually stable enough, given that the build failed? (We
obtained the source from
http://apache.mirrors.tds.net/cassandra/3.10/apache-cassandra-3.10-src.tar.gz
and used the attached patch file for byteman during the build.)

-  Are there any other issues with the binaries that we need to be
aware of (other patches)?

I’m concerned that there may be other issues that we won’t know about,
since we’re not Cassandra experts, so we’re looking for feedback from this
group on whether we should just stay on 3.9 or whether it’s safe to proceed
with this approach. I can share the spec file and patch files that we’ve set
up for the build process, if desired.


Regards,
DHRUVA GOPAL
sr. MANAGER, ENGINEERING
REPORTING, ANALYTICS AND BIG DATA


Attachment: cassandra-3.10-build.patch


Re: Difference between yum and git

2017-05-10 Thread Jonathan Haddad
Where are you getting Cassandra 2.2 built from yum?


Re: Node containing all data of the cluster

2017-05-10 Thread Varun Gupta
Hi Igor,

You can set up the cluster with the following configuration.

Replication: DC1: 3 and DC2: 1.

If you are using the DataStax Java driver, use the DC-aware load balancing
policy and pass DC1 as the local DC. Also add the DC2 node to the ignored
nodes, so requests never go to that node.
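The arithmetic behind this suggestion, as a small Python sketch. It assumes the 4 existing nodes form DC1, the extra node is the only node in DC2, and token ownership is evenly balanced, so each node in a DC holds roughly RF / nodes of the data; a one-node DC with RF 1 therefore holds everything. It also computes quorum sizes: QUORUM counts replicas across all DCs (sum of RFs / 2 + 1), while LOCAL_QUORUM counts only the local DC's replicas. The topology dict and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical topology from this thread: the 4 existing nodes form DC1,
# and the extra "snapshot" node is the only node in DC2.
TOPOLOGY = {"DC1": {"nodes": 4, "rf": 3}, "DC2": {"nodes": 1, "rf": 1}}


def share_per_node(rf, nodes):
    """Approximate fraction of all data on each node of a DC, assuming evenly
    balanced token ownership (each row has `rf` replicas in that DC)."""
    if rf > nodes:
        # This sketch does not model an RF larger than the DC's node count.
        raise ValueError("rf exceeds the number of nodes in the DC")
    return Fraction(rf, nodes)


def quorum(replicas):
    """Majority of `replicas`: replicas // 2 + 1."""
    return replicas // 2 + 1


def describe(topology):
    """Return (per-DC share per node, QUORUM size, per-DC LOCAL_QUORUM size)."""
    total_rf = sum(dc["rf"] for dc in topology.values())
    shares = {name: share_per_node(dc["rf"], dc["nodes"]) for name, dc in topology.items()}
    local_quorums = {name: quorum(dc["rf"]) for name, dc in topology.items()}
    return shares, quorum(total_rf), local_quorums
```

For this topology, each DC1 node holds about 3/4 of the data, the DC2 node holds a full copy, QUORUM needs 3 replicas, and LOCAL_QUORUM in DC1 needs 2.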

Thanks,
Varun



Nodetool cleanup doesn't work

2017-05-10 Thread Jai Bheemsen Rao Dhanwada
Hello,

I am running into an issue where *nodetool cleanup* fails to clean up data.
We are running Cassandra 2.1.16.


[user@host ~]$ nodetool cleanup
Aborted cleaning up atleast one column family in keyspace user, check
server logs for more information.
Aborted cleaning up atleast one column family in keyspace org, check server
logs for more information.
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)

*Logs:*

INFO  [RMI TCP Connection(17)-x.x.x.x] 2017-05-05 04:04:07,987 CompactionManager.java:415 - Cleanup cannot run before a node has joined the ring
INFO  [RMI TCP Connection(17)-x.x.x.x] 2017-05-05 04:04:08,010 CompactionManager.java:415 - Cleanup cannot run before a node has joined the ring

All the nodes in the cluster are up and running. We tried a rolling
restart of all nodes, with no luck.

Looking at Cassandra JIRA
(https://issues.apache.org/jira/browse/CASSANDRA-10991), it looks like the
issue is fixed in 2.2.6 and 3.0.
While we plan to upgrade to the latest versions (which might take
a while), does anyone know of a workaround to mitigate the issue?


DSE 5.0 Upgrade

2017-05-10 Thread cass savy
Team,

1. What is the stable version of DSE 5.0 to upgrade to from DSE 4.8.x?

2. Has anybody switched to the DSE Unified Authentication model, which
enforces using one auth scheme as primary and the others as secondary?

3. Do I need to use multi-auth/DSE Unified Authentication to upgrade to
DSE 5.0 or higher?
Our old clusters use internal authentication, and we were told that we
have to move to the DSE Unified Authentication model as part of the
upgrade. Is that the case?

4. Has anybody recently migrated from the Java driver to the DataStax
Enterprise Java driver 1.1 or 1.2, which will be the only driver that
supports DSE 5.0 and above and the new auth policies added by DSE?

Or

Has anybody made Java driver version 3.2 work with DSE 5.0 or 5.1?

5. For AWS, what is the recommended production AMI with CentOS for DSE 5.x
versions?


Node containing all data of the cluster

2017-05-10 Thread Igor Leão
Hey everyone,

Imagine I have a Cassandra cluster with 4 nodes.

Is it possible to have a separate node which would not receive requests but
would be in sync with the rest of the cluster? Ideally this super node
would have all data of the cluster.

I want to take a snapshot of this node from time to time in order to
reproduce scenarios that are happening in production.

Thanks in advance!


Re: Difference between yum and git

2017-05-10 Thread Joaquin Casares
Hi Yuji,

Were both tests run from the same machine at close to the same time? If not,
noisy neighbors may be affecting your performance on different AWS
instances.

You should verify that you're using the same version of Java during both
tests.

Also, ensure that you're using the same test instance (that is not running
Cassandra) to connect to both Cassandra clusters.

Cheers,

Joaquin

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com



RE: NoSE: Automated schema design for Cassandra

2017-05-10 Thread Jacques-Henri Berthemet
Hi,

This is interesting. I’d just advise adding full examples and more
documentation on how to use it (the articles are a bit too detailed).
Also, you should refer to “tables” rather than “column families”.

Has this been used to generate a schema for production?
Do you think it’s possible to generate test code to validate the workload?

--
Jacques-Henri Berthemet

From: michael.m...@gmail.com [mailto:michael.m...@gmail.com] On Behalf Of 
Michael Mior
Sent: Tuesday, 9 May 2017 17:30
To: user 
Subject: NoSE: Automated schema design for Cassandra

Hi all,

I wanted to share a tool I've been working on that tries to help automate the 
schema design process for Cassandra. The short description is that you provide 
information on the kind of data you want to store and the queries and updates 
you want to issue, and NoSE will perform a cost-based analysis to suggest an 
optimal schema.
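To illustrate what "cost-based" means here, a deliberately tiny brute-force toy in Python. It is not NoSE's algorithm; the workload, candidate tables, and costs are all hypothetical numbers, and it assumes every update has to be applied to every chosen table. Each query is charged the read cost of the cheapest chosen table that answers it, and the toy picks the combination of tables with the lowest total.

```python
from itertools import combinations

# Hypothetical workload: query name -> executions per second.
QUERIES = {"user_by_id": 100.0, "posts_by_user": 50.0}
# Hypothetical update rate; this toy charges it to every chosen table.
UPDATE_RATE = 10.0

# Hypothetical candidate tables: name -> (queries answered, read cost, write cost).
CANDIDATES = {
    "users": ({"user_by_id"}, 1.0, 1.0),
    "posts_by_user": ({"posts_by_user"}, 2.0, 1.5),
    "users_with_posts": ({"user_by_id", "posts_by_user"}, 3.0, 4.0),
}


def schema_cost(tables):
    """Weighted read cost plus write cost; infinite if some query is unanswerable."""
    cost = 0.0
    for query, freq in QUERIES.items():
        read_costs = [CANDIDATES[t][1] for t in tables if query in CANDIDATES[t][0]]
        if not read_costs:
            return float("inf")
        cost += freq * min(read_costs)
    cost += sum(UPDATE_RATE * CANDIDATES[t][2] for t in tables)
    return cost


def best_schema():
    """Try every non-empty subset of candidates and keep the cheapest."""
    names = sorted(CANDIDATES)
    best, best_cost = None, float("inf")
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            cost = schema_cost(combo)
            if cost < best_cost:
                best, best_cost = combo, cost
    return best, best_cost
```

With these made-up numbers the two narrow tables win over the single denormalized one, because the wide table's higher read and write costs outweigh storing fewer tables. Exhaustive search only works for a handful of candidates; a real tool needs a smarter optimizer.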

There's a lot of room for improvement, and many Cassandra features are not
currently supported, but hopefully some in the community will still find it
useful as a starting point.

Link to more details and the source code below:

https://michael.mior.ca/projects/nose/

If you're interested in trying it out, don't hesitate to reach out; I'm
happy to help!

Cheers,
--
Michael Mior
mm...@uwaterloo.ca