Re: reducing disk space consumption

2016-02-11 Thread Romain Hardouin
As Mohammed said, "nodetool clearsnapshot" will do the trick.
Cassandra takes a snapshot by default before keyspace/table dropping or
truncation.
You can disable this feature if it's a dev node (see auto_snapshot in
cassandra.yaml), but if it's a production node it is a good thing to keep auto
snapshots.
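
For reference, a rough sketch of the command and setting involved (the
keyspace name below is just an example):

nodetool clearsnapshot              # removes all snapshots on the node
nodetool clearsnapshot my_keyspace  # or only the snapshots of one keyspace

# cassandra.yaml - dev nodes only, keep the default (true) in production
auto_snapshot: false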

Best,

Romain


Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-11 Thread Fabrice Facorat
Are your commitlog and data on the same disk? If yes, you should put
the commitlog on a separate disk which doesn't have a lot of IO.

Other IO may have a great impact on your commitlog writes and
may even block them.

An example of the impact IO may have, even for async writes:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
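
For example (the mount points below are just an illustration - adjust them to
your own disks), cassandra.yaml would point the two at different devices:

data_file_directories:
    - /mnt/data-disk/cassandra/data
commitlog_directory: /mnt/commitlog-disk/cassandra/commitlog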

2016-02-11 0:31 GMT+01:00 Mike Heffner :
> Jeff,
>
> We have both commitlog and data on a 4TB EBS with 10k IOPS.
>
> Mike
>
> On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa 
> wrote:
>>
>> What disk size are you using?
>>
>>
>>
>> From: Mike Heffner
>> Reply-To: "user@cassandra.apache.org"
>> Date: Wednesday, February 10, 2016 at 2:24 PM
>> To: "user@cassandra.apache.org"
>> Cc: Peter Norton
>> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>
>> Paulo,
>>
>> Thanks for the suggestion, we ran some tests against CMS and saw the same
>> timeouts. On that note though, we are going to try doubling the instance
>> sizes and testing with double the heap (even though current usage is low).
>>
>> Mike
>>
>> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta 
>> wrote:
>>>
>>> Are you using the same GC settings as the staging 2.0 cluster? If not,
>>> could you try using the default GC settings (CMS) and see if that changes
>>> anything? This is just a wild guess, but there were reports before of
>>> G1-caused instabilities with small heap sizes (< 16GB - see CASSANDRA-10403
>>> for more context). Please ignore if you already tried reverting back to CMS.
>>>
>>> 2016-02-10 16:51 GMT-03:00 Mike Heffner :

 Hi all,

 We've recently embarked on a project to update our Cassandra
 infrastructure running on EC2. We are long time users of 2.0.x and are
 testing out a move to version 2.2.5 running on VPC with EBS. Our test setup
 is a 3 node, RF=3 cluster supporting a small write load (mirror of our
 staging load).

 We are writing at QUORUM and while p95's look good compared to our
 staging 2.0.x cluster, we are seeing frequent write operations that time out
 at the max write_request_timeout_in_ms (10 seconds). CPU across the cluster
 is < 10% and EBS write load is < 100 IOPS. Cassandra is running with the
 Oracle JDK 8u60 and we're using G1GC and any GC pauses are less than 500ms.

 We run on c4.2xl instances with GP2 EBS attached storage for data and
 commitlog directories. The nodes are using EC2 enhanced networking and have
 the latest Intel network driver module. We are running on HVM instances
 using Ubuntu 14.04.2.

 Our schema is 5 tables, all with COMPACT STORAGE. Each table is similar
 to the definition here:
 https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a

 This is our cassandra.yaml:
 https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml

 Like I mentioned we use 8u60 with G1GC and have used many of the GC
 settings in Al Tobey's tuning guide. This is our upstart config with JVM and
 other CPU settings: https://gist.github.com/mheffner/dc44613620b25c4fa46d

 We've used several of the sysctl settings from Al's guide as well:
 https://gist.github.com/mheffner/ea40d58f58a517028152

 Our client application is able to write using either Thrift batches
 using the Astyanax driver or CQL async INSERTs using the Datastax Java driver.

 For testing against Thrift (our legacy infra uses this) we write batches
 of anywhere from 6 to 1500 rows at a time. Our p99 for batch execution is
 around 45ms but our maximum (p100) sits below 150ms except when it
 periodically spikes to the full 10 seconds.

 Testing the same write path using CQL writes instead demonstrates
 similar behavior. Low p99s except for periodic full timeouts. We enabled
 tracing for several operations but were unable to get a trace that completed
 successfully -- Cassandra started logging many messages as:

 INFO  [ScheduledTasks:1] - MessagingService.java:946 - _TRACE messages
 were dropped in last 5000 ms: 52499 for internal timeout and 0 for cross
 node timeout

 And all the traces contained rows with a "null" source_elapsed row:
 https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out


 We've exhausted as many configuration option permutations as we can
 think of. This cluster does not appear to be under any significant load and
 latencies seem to largely fall in two bands: low normal or max timeout. This
 seems to imply that something is getting stuck and timing out at the max
 write timeout.

 Any suggestions on what to look for? We had debug enabled for a while but
 we didn't see any message that pointed to something 

Re: Keyspaces not found in cqlsh

2016-02-11 Thread Sebastian Estevez
If it's a tarball then root should be fine, but there were some files owned
by the Cassandra user so you may want to chown those back to root.

I haven't seen your exact issue before but you have two schema versions
from your describe cluster so a rolling restart should help.

all the best,

Sebastián
On Feb 11, 2016 9:28 AM, "kedar"  wrote:

> Thanks Sebastian,
>
> Cassandra installation in our case is simply an untar.
>
> Cassandra is started using supervisord and user as root, would you still
> recommend I try using Cassandra user.
>
>  ./nodetool describecluster
> Cluster Information:
> Name: Test Cluster
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> cd361577-6947-3390-a787-be28fe499787: [Ip1]
>
> 9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [Ip2]
>
> Interestingly
> ./nodetool cfstats shows all the tables
>
> Thanks,
> Kedar Parikh
>
> On Thursday 11 February 2016 07:34 PM, Sebastian Estevez wrote:
>
> Keep this on the user list, it's not appropriate for the dev list.
>
> 1) I noticed that some of your files are owned by root and others by
> Cassandra. If this is a package install you should always start C* as a
> service and chown your files and directories so they are owned by the
> Cassandra user, not root.  Never start Cassandra directly as root.
>
> 2) Once you have fixed your file ownerships, restart Cassandra on each
> node one at a time. You should see your sstables and commitlog get picked
> up by Cassandra in the system.log on startup. Share the output of
> 'nodetool describecluster' before and after.
>
> all the best,
>
> Sebastián
> On Feb 11, 2016 6:30 AM, "kedar"  wrote:
>
>> Thanks,
>>
>> kindly refer the following:
>>
>> https://gist.github.com/anonymous/3dddbe728a52c07d7c52
>> https://gist.github.com/anonymous/302ade0875dd6410087b
>>
>> Thanks,
>> Kedar Parikh
>>
>> 
>>
>> On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:
>>
>>> Would you mind pasting the output for both nodes in gist/paste/whatever?
>>> https://gist.github.com http://paste.debian.net
>>>
>>>
>>>
>>> Le Jeudi 11 février 2016 11h57, kedar  a
>>> écrit :
>>> Thanks for the reply.
>>>
>>> ls -l cassandra/data/* lists various *.db files
>>>
>>> This problem is on both nodes.
>>>
>>> Thanks,
>>> Kedar Parikh
>>>
>>> 
>>>
>>>
>>>
>>>
>>
>>
>>
>


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Ryan Svihla
Kedar,

I recommend asking the user list (user@cassandra.apache.org); this list is for the
development of Cassandra and you're more likely to find someone on the user
list who may have hit this issue.

Curious issue though, I haven't seen that myself.

Regards,
Ryan Svihla

> On Feb 11, 2016, at 7:56 AM, kedar  wrote:
> 
> Dev Team,
> 
> Need some help with a burning cqlsh issue
> 
> I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see
> / desc keyspaces and query tables through cqlsh on either of the two nodes
> 
> cqlsh> desc keyspaces
> 
> 
> 
> cqlsh> use user_index;
> cqlsh:user_index> desc table list_1_10;
> 
> Keyspace 'user_index' not found.
> cqlsh:user_index>
> cqlsh>  select * from system.schema_keyspaces;
> Keyspace 'system' not found.
> cqlsh>
> We are running a 2 node cluster. The Python - Django app that inserts
> data is running without any failure and system logs show nothing abnormal.
> 
> ./nodetool repair on one node hasn't helped ./nodetool cfstats shows all
> the tables too
> 
> ls -l cassandra/data/*  on each node:
> 
> https://gist.github.com/anonymous/3dddbe728a52c07d7c52
> https://gist.github.com/anonymous/302ade0875dd6410087b
> 
> 
> 
> 
> --
> Thanks,
> Kedar Parikh
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: stefania.alborghe...@datastax.com

2016-02-11 Thread Sebastian Estevez
The monitoring UI is called DataStax OpsCenter and it has its own install
process.

Check out our documentation on the subject:

http://docs.datastax.com/en/opscenter/5.2/opsc/install/opscInstallOpsc_g.html

all the best,

Sebastián
On Feb 9, 2016 8:01 PM, "Ted Yu"  wrote:

> Hi,
> I am using DSE 4.8.4
> Here are the ports Cassandra daemon listens on:
>
> tcp  0  0 xx.yy:9042       0.0.0.0:*  LISTEN  30773/java
> tcp  0  0 127.0.0.1:56498  0.0.0.0:*  LISTEN  30773/java
> tcp  0  0 xx.yy:7000       0.0.0.0:*  LISTEN  30773/java
> tcp  0  0 127.0.0.1:7199   0.0.0.0:*  LISTEN  30773/java
> tcp  0  0 xx.yy:9160       0.0.0.0:*  LISTEN  30773/java
>
> Can you tell me how I can get to the DSE monitoring UI ?
>
> Thanks
>


Re: Security labels

2016-02-11 Thread oleg yusim
Hi Dani,

As promised, I sort of put all my questions under the "one roof". I would
really appreciate your opinion on them.

https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM

Thanks,

Oleg

On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen  wrote:

> ​Hi Oleg,
>
> Thanks that helped clear things up! This sounds like a daunting task. I
> wish you all the best with it.
>
> Cheers,
> Dani​
>
> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim  wrote:
>
>> Dani,
>>
>> I really appreciate your response. Actually, session timeouts and security
>> labels are two different topics (the first is about an attack where somebody
>> opened, say, an ssh window to the DB, left his machine unattended and somebody
>> else stole his session; the second is about enabling the DB to support what is
>> called the MAC access model - MAC stands for mandatory access control. It is
>> widely used in the government and military, but not outside of it; we are all
>> used to the DAC access control model). However, I think you are right and I
>> should move all my queries under one big roof and call this thread "Security".
>> I will do this today.
>>
>> Now, about what you have said, I just answered the same to Jon, in
>> Session Timeout thread, but would quickly re-cap here. I understand that
>> Cassandra's architecture was aimed and tailored for completely different
>> type of scenario. However, unfortunately, that doesn't mean that Cassandra
>> is not vulnerable to the same very set of attacks relational database would
>> be vulnerable to. It just means Cassandra is not protected against those
>> attacks, because protection against them was not thought of, when database
>> was created. I already gave the AAA and session's timeout example in Jon's
>> thread, and those are just one of many.
>>
>> Now what I'm trying to do, I'm trying to create a STIG - security federal
>> compliance document, which will assess Cassandra against SRG concepts
>> (security federal compliance recommendations for databases overall) and
>> will highlight what is not met, and can't be in current design (i.e. what
>> system architects should keep in mind and what they need to compensate for
>> with other controls on different layers of system model) and  what can be
>> met either with configuration or with little enhancement (and how).
>>
>> That document would be of great help for Cassandra as a product because
>> it would allow it to be marketed as a product with existing security
>> assessment and guidelines, performed according to DoD standards. It would
>> also allow to move product in the general direction of improving its
>> security posture. Finally, the document would be posted on DISA site (
>> http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every security
>> architect to utilize, which would greatly reduce the risk for Cassandra
>> product to be hacked in a field.
>>
>> To clear things out - what I ask about are not my expectations. I really
>> do not expect developers of Cassandra to run and start implementing
>> security labels, just because I asked about it. :) My questions are to
>> build my internal knowledge of DB current design, so that I can build my
>> security assessment based of it, not more, not less.
>>
>> I guess, summarizing what I said on top, from what I'm doing Cassandra as
>> a product would end up benefiting quite a bit. That is why I think it would
>> make sense for Cassandra community to help me with my questions even if
>> they sound completely off the traditional "grid".
>>
>> Thanks again, I really appreciate your response and conversation overall.
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
>> dani.trapha...@datastax.com> wrote:
>>
>>> Also -- it looks like you're really asking questions about session
>>> timeouts and security labels as they associate, would be more helpful to
>>> keep in one thread. :)
>>>
>>>
>>> On Friday, January 29, 2016, Dani Traphagen 
>>> wrote:
>>>
 Hi Oleg,

 I understand your frustration but unfortunately, in the terms of your
 security assessment, you have fallen into a mismatch for Cassandra's
 utility.

 The eventuality of having multiple sockets open without the query input
 for long durations of time isn't something that was
 architected...because...Cassandra was built to take massive quantities
 of queries both in volume and velocity.

 Your expectation of the database isn't in line with how or why it was
 designed. Generally, security solutions are architected
 around Cassandra, baked into the data model, many solutions
 are home-brewed, written into the application or provided by using another
 security client.

 DSE has different security aspects rolling out in the next release,
 as addressed earlier by Jack, like commit log and hint encryption, as well
 as unified authentication...but security labels aren't on anyone's radar
 

Re: Security labels

2016-02-11 Thread Dani Traphagen
Hi Oleg,

I'm happy to take a look. Will update after review.

Thanks,
Dani

On Thu, Feb 11, 2016 at 12:23 PM, oleg yusim  wrote:

> Hi Dani,
>
> As promised, I sort of put all my questions under the "one roof". I would
> really appreciate your opinion on them.
>
> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen <
> dani.trapha...@datastax.com> wrote:
>
>> ​Hi Oleg,
>>
>> Thanks that helped clear things up! This sounds like a daunting task. I
>> wish you all the best with it.
>>
>> Cheers,
>> Dani​
>>
>> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim  wrote:
>>
>>> Dani,
>>>
>>> I really appreciate your response. Actually, session timeouts and
>>> security labels are two different topics (first is about attack when
>>> somebody opened, say, ssh window to DB, left his machine unattended and
>>> somebody else stole his session, second - to enable the DB to support what is
>>> called the MAC access model - MAC stands for mandatory access control. It is widely
>>> used in the government and military, but not outside of it, we all are used
>>> to DAC access control model). However, I think you are right and I should
>>> move all my queries under the one big roof and call this thread "Security".
>>> I will do this today.
>>>
>>> Now, about what you have said, I just answered the same to Jon, in
>>> Session Timeout thread, but would quickly re-cap here. I understand that
>>> Cassandra's architecture was aimed and tailored for completely different
>>> type of scenario. However, unfortunately, that doesn't mean that Cassandra
>>> is not vulnerable to the same very set of attacks relational database would
>>> be vulnerable to. It just means Cassandra is not protected against those
>>> attacks, because protection against them was not thought of, when database
>>> was created. I already gave the AAA and session's timeout example in Jon's
>>> thread, and those are just one of many.
>>>
>>> Now what I'm trying to do, I'm trying to create a STIG - security
>>> federal compliance document, which will assess Cassandra against SRG
>>> concepts (security federal compliance recommendations for databases
>>> overall) and will highlight what is not met, and can't be in current design
>>> (i.e. what system architects should keep in mind and what they need to
>>> compensate for with other controls on different layers of system model) and
>>>  what can be met either with configuration or with little enhancement (and
>>> how).
>>>
>>> That document would be of great help for Cassandra as a product because
>>> it would allow it to be marketed as a product with existing security
>>> assessment and guidelines, performed according to DoD standards. It would
>>> also allow to move product in the general direction of improving its
>>> security posture. Finally, the document would be posted on DISA site (
>>> http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every security
>>> architect to utilize, which would greatly reduce the risk for Cassandra
>>> product to be hacked in a field.
>>>
>>> To clear things out - what I ask about are not my expectations. I really
>>> do not expect developers of Cassandra to run and start implementing
>>> security labels, just because I asked about it. :) My questions are to
>>> build my internal knowledge of DB current design, so that I can build my
>>> security assessment based of it, not more, not less.
>>>
>>> I guess, summarizing what I said on top, from what I'm doing Cassandra
>>> as a product would end up benefiting quite a bit. That is why I think it
>>> would make sense for Cassandra community to help me with my questions even
>>> if they sound completely off the traditional "grid".
>>>
>>> Thanks again, I really appreciate your response and conversation overall.
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
>>> dani.trapha...@datastax.com> wrote:
>>>
 Also -- it looks like you're really asking questions about session
 timeouts and security labels as they associate, would be more helpful to
 keep in one thread. :)


 On Friday, January 29, 2016, Dani Traphagen <
 dani.trapha...@datastax.com> wrote:

> Hi Oleg,
>
> I understand your frustration but unfortunately, in the terms of your
> security assessment, you have fallen into a mismatch for Cassandra's
> utility.
>
> The eventuality of having multiple sockets open without the query
> input for long durations of time isn't something that was
> architected...because...Cassandra was built to take massive quantities
> of queries both in volume and velocity.
>
> Your expectation of the database isn't in line with how or why it was
> designed. Generally, security solutions are architected
> around Cassandra, baked into the data model, many solutions
> are home-brewed, written into the application or provided by using 

Re: Session timeout

2016-02-11 Thread oleg yusim
Robert, Jack, Bryan,

As you suggested, I put together a document, titled
Cassandra_Security_Topics_to_Discuss, put it on Google Drive and shared it
with everybody on this list. The document contains a list of questions I have
on Cassandra, my take on them, and has a place for notes the community would
like to make on them.

Please, review. Any help would be appreciated greatly.

https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM

Oleg

On Fri, Jan 29, 2016 at 6:30 PM, Bryan Cheng  wrote:

> To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a
> well-funded and fairly large company. You are certainly free to continue
> using the list and asking for community support (I am definitely not in any
> position to tell you otherwise, anyway), but that community support is by
> definition ad-hoc and best effort. Furthermore, your questions range from
> trivial to, as Jonathan has mentioned earlier, concepts that many of us have
> no reason to consider at this time (perhaps your work will convince us
> otherwise- but you'll need to finish it first ;) )
>
> What I'm getting at here is that perhaps, if you need faster, deeper
> level, and more elaborate support than this list can provide, you should
> look into the services of a paid Cassandra support company like Datastax.
>
> On Fri, Jan 29, 2016 at 3:34 PM, Robert Coli  wrote:
>
>> On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky > > wrote:
>>
>>> One last time, I'll simply renew my objection to the way you are abusing
>>> this list.
>>>
>>
>> FWIW, while I appreciate that OP (Oleg) is attempting to do a service for
>> the community, I agree that the flood of single topic, context-lacking
>> posts regarding deep internals of Cassandra is likely to inspire the
>> opposite of a helpful response.
>>
>> This is important work, however, so hopefully we can collectively find a
>> way through the meta and can discuss this topic without acrimony! :D
>>
>> =Rob
>>
>>
>
>


Security assessment of Cassandra

2016-02-11 Thread oleg yusim
Greetings,

Performing a security assessment of Cassandra with the goal of generating a
STIG for Cassandra (iase.disa.mil/stigs/Pages/a-z.aspx), I ran across some
questions regarding the way certain security features are implemented (or
not) in Cassandra.

I composed a list of questions on these topics, to which I wasn't able to
find a definitive answer anywhere else, and posted it here:

https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM

It is shared with all the members of that list, and any of the members of
this list is welcome to comment on this document (there is a place for
community comments specially reserved near each of the questions and my
take on it).

I would greatly appreciate Cassandra community help here.

Thanks,

Oleg


Re: Increase compaction performance

2016-02-11 Thread Michał Łowicki
On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ  wrote:

> Also, are you using incremental repairs (not sure about the available
> options in Spotify Reaper) what command did you run ?
>
>
No.


> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ :
>
>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>
>>
>>
>> What is your current compaction throughput ?  The current value of
>> 'concurrent_compactors' (cassandra.yaml or through JMX) ?
>>
>

Throughput was initially set to 1024 and I've gradually increased it to
2048, 4K and 16K but haven't seen any changes. Tried to change it both from
`nodetool` and also cassandra.yaml (with restart after changes).


>
>> nodetool getcompactionthroughput
>>
>> How to speed up compaction? Increased compaction throughput and
>>> concurrent compactors but no change. Seems there is plenty idle
>>> resources but can't force C* to use it.
>>>
>>
>> You might want to try un-throttling the compaction throughput through:
>>
>> nodetool setcompactionthroughput 0
>>
>> Choose a canary node. Monitor pending compactions and disk throughput
>> (make sure the server is ok too - CPU...)
>>
>

Yes, I'll try it out but if increasing it 16 times didn't help I'm a bit
sceptical about it.


>
>> Some other information could be useful:
>>
>> What is your number of cores per machine and the compaction strategies
>> for the 'most compacting' tables. What are write/update patterns, any TTL
>> or tombstones ? Do you use a high number of vnodes ?
>>
>
I'm using bare-metal boxes, 40 CPUs, 64GB RAM and 2 SSDs each. num_tokens is
set to 256.

Using LCS for all tables. Write/update heavy. No warnings about a large
number of tombstones, but we're removing items frequently.



>
>> Also what is your repair routine and your values for gc_grace_seconds ?
>> When was your last repair and do you think your cluster is suffering of a
>> high entropy ?
>>
>
We've been having problems with repair for months (CASSANDRA-9935).
gc_grace_seconds is set to 345600 now. Yes, as we haven't run it
successfully for a long time I guess the cluster is suffering from high entropy.


>
>> You can lower the stream throughput to make sure nodes can cope with what
>> repairs are feeding them.
>>
>> nodetool getstreamthroughput
>> nodetool setstreamthroughput X
>>
>
Yes, this sounds interesting. As we've been having problems with repair for
months, it could be that lots of data is being transferred between nodes.

Thanks!


>
>> C*heers,
>>
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki :
>>
>>> Hi,
>>>
>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>>> using Cassandra Reaper but nodes after couple of hours are full of pending
>>> compaction tasks (regular not the ones about validation)
>>>
>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>>
>>> How to speed up compaction? Increased compaction throughput and
>>> concurrent compactors but no change. Seems there is plenty idle
>>> resources but can't force C* to use it.
>>>
>>> Any clue where there might be a bottleneck?
>>>
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>>
>>
>


-- 
BR,
Michał Łowicki


Re: Rows with same key

2016-02-11 Thread Kai Wang
Are you supplying timestamps from the client side? Are clocks in sync across
your nodes?
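
If you suspect timestamps, one quick check (the keyspace/table/column names
below are placeholders) is to look at the write times of the cells:

SELECT id, some_col, WRITETIME(some_col) FROM my_ks.my_cf WHERE id = 'problem-key';

If the "empty" row carries newer write timestamps than the row holding the
values, that would explain why the old data keeps winning on reads.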


On Thu, Feb 11, 2016 at 11:52 AM, Yulian Oifa  wrote:

> Hello to all
> I have multiple rows with the same id on one of my CFs; one row is completely
> empty, another one has values.
> Values are written into the new row, however they are retrieved from the old
> row...
> I guess one row was created due to removed values, and got stuck somehow.
> I am trying to remove it with no luck (compact, flush, repair, etc.).
> I have set gc grace on this CF, however I believe the old row has an old
> value.
> How can I get rid of this row?
> Best regards
> Yulian Oifa
>


Keyspaces not found in cqlsh

2016-02-11 Thread kedar
I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see 
/ desc keyspaces and query tables through cqlsh on either of the two nodes


cqlsh> desc keyspaces



cqlsh> use user_index;
cqlsh:user_index> desc table list_1_10;

Keyspace 'user_index' not found.
cqlsh:user_index>
cqlsh>  select * from system.schema_keyspaces;
Keyspace 'system' not found.
cqlsh>
We are running a 2 node cluster. The Python - Django app that inserts 
data is running without any failure and system logs show nothing abnormal.


./nodetool repair on one node hasn't helped ./nodetool cfstats shows all 
the tables too


--
Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in





Re: Keyspaces not found in cqlsh

2016-02-11 Thread kedar

Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in

On Thursday 11 February 2016 03:57 PM, Romain Hardouin wrote:

What is the output on both nodes of the following command?
ls -l /var/lib/cassandra/data/system/*
If one node seems odd you can try "nodetool resetlocalschema" but the other 
node must be in clean state.

Best,
Romain


Le Jeudi 11 février 2016 11h10, kedar  a écrit :
I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see
/ desc keyspaces and query tables through cqlsh on either of the two nodes

cqlsh> desc keyspaces



cqlsh> use user_index;
cqlsh:user_index> desc table list_1_10;

Keyspace 'user_index' not found.
cqlsh:user_index>
cqlsh>  select * from system.schema_keyspaces;
Keyspace 'system' not found.
cqlsh>
We are running a 2 node cluster. The Python - Django app that inserts
data is running without any failure and system logs show nothing abnormal.

./nodetool repair on one node hasn't helped ./nodetool cfstats shows all
the tables too







Re: 3k sstables during a repair incremental !!

2016-02-11 Thread Jean Carlo
Hi !

@Paulo:
Yes we are using vnodes, 256 per node and we have 6 nodes in the cluster.
RF=3. The data is inserted using jmeter with consistency=LOCAL_ONE.
Because this is a test, we generate our data and we insert them using
jmeter.

After the repair finished, all the nodes seemed to be frozen, no compactions
were running, even though nodetool tpstats said:
CompactionExecutor    283   1424   0   0

After I restarted Cassandra, the compactions started to run.




Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Thu, Feb 11, 2016 at 8:42 AM, Marcus Eriksson  wrote:

> The reason for this is probably
> https://issues.apache.org/jira/browse/CASSANDRA-10831 (which only affects
> 2.1)
>
> So, if you had problems with incremental repair and LCS before, upgrade to
> 2.1.13 and try again
>
> /Marcus
>
> On Wed, Feb 10, 2016 at 2:59 PM, horschi  wrote:
>
>> Hi Jean,
>>
>> we had the same issue, but on SizeTieredCompaction. During repair the
>> number of SSTables and pending compactions were exploding.
>>
>> It not only affected latencies, at some point Cassandra ran out of heap.
>>
>> After the upgrade to 2.2 things got much better.
>>
>> regards,
>> Christian
>>
>>
>> On Wed, Feb 10, 2016 at 2:46 PM, Jean Carlo 
>> wrote:
>> > Hi Horschi !!!
>> >
>> > I have 2.1.12. But I think it is something related to the Level
>> > compaction strategy. It is impressive that we went from 6 sstables to 3k
>> > sstables. I think this will affect the latency in production because of
>> > the number of compactions going on.
>> >
>> >
>> >
>> > Best regards
>> >
>> > Jean Carlo
>> >
>> > "The best way to predict the future is to invent it" Alan Kay
>> >
>> > On Wed, Feb 10, 2016 at 2:37 PM, horschi  wrote:
>> >>
>> >> Hi Jean,
>> >>
>> >> which Cassandra version do you use?
>> >>
>> >> Incremental repair got much better in 2.2 (for us at least).
>> >>
>> >> kind regards,
>> >> Christian
>> >>
>> >> On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo > >
>> >> wrote:
>> >> > Hello guys!
>> >> >
>> >> > I am testing incremental repair in my Cassandra cluster. I am doing my
>> >> > test over these tables:
>> >> >
>> >> > CREATE TABLE pns_nonreg_bench.cf3 (
>> >> > s text,
>> >> > sp int,
>> >> > d text,
>> >> > dp int,
>> >> > m map,
>> >> > t timestamp,
>> >> > PRIMARY KEY (s, sp, d, dp)
>> >> > ) WITH CLUSTERING ORDER BY (sp ASC, d ASC, dp ASC)
>> >> >
>> >> > AND compaction = {'class':
>> >> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> >> > AND compression = {'sstable_compression':
>> >> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>> >> >
>> >> > CREATE TABLE pns_nonreg_bench.cf1 (
>> >> > ise text PRIMARY KEY,
>> >> > int_col int,
>> >> > text_col text,
>> >> > ts_col timestamp,
>> >> > uuid_col uuid
>> >> > ) WITH bloom_filter_fp_chance = 0.01
>> >> >  AND compaction = {'class':
>> >> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> >> > AND compression = {'sstable_compression':
>> >> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>> >> >
>> >> > table cf1
>> >> > Space used (live): 665.7 MB
>> >> > table cf2
>> >> > Space used (live): 697.03 MB
>> >> >
>> >> > It happens that when I do repair -inc -par on these tables, cf2 got a
>> >> > peak of 3k sstables. When the repair finishes, it takes 30 min or more
>> >> > to finish all the compactions and return to 6 sstables.
>> >> >
>> >> > I am a little concerned about whether this will happen in production.
>> >> > Is it normal?
>> >> >
>> >> > Saludos
>> >> >
>> >> > Jean Carlo
>> >> >
>> >> > "The best way to predict the future is to invent it" Alan Kay
>> >
>> >
>>
>
>


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Romain Hardouin
What is the output on both nodes of the following command? 
ls -l /var/lib/cassandra/data/system/* 
If one node seems odd you can try "nodetool resetlocalschema" but the other 
node must be in clean state.

Best,
Romain


Le Jeudi 11 février 2016 11h10, kedar  a écrit :
I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see 
/ desc keyspaces and query tables through cqlsh on either of the two nodes

cqlsh> desc keyspaces



cqlsh> use user_index;
cqlsh:user_index> desc table list_1_10;

Keyspace 'user_index' not found.
cqlsh:user_index>
cqlsh>  select * from system.schema_keyspaces;
Keyspace 'system' not found.
cqlsh>
We are running a 2 node cluster. The Python - Django app that inserts 
data is running without any failure and system logs show nothing abnormal.

./nodetool repair on one node hasn't helped ./nodetool cfstats shows all 
the tables too

-- 
Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in


Re: reducing disk space consumption

2016-02-11 Thread Ted Yu
Thanks, Mohammed and Romain.

On Thu, Feb 11, 2016 at 12:54 AM, Romain Hardouin 
wrote:

> As Mohammed said, "nodetool clearsnapshot" will do the trick.
> Cassandra takes a snapshot by default before keyspace/table dropping or
> truncation.
> You can disable this feature if it's a dev node (see auto_snapshot in
> cassandra.yaml), but if it's a production node it is a good thing to keep auto
> snapshots.
>
> Best,
>
> Romain
>


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Sebastian Estevez
Keep this on the user list, it's not appropriate for the dev list.

1) I noticed that some of your files are owned by root and others by
Cassandra. If this is a package install you should always start C* as a
service and chown your files and directories so they are owned by the
Cassandra user, not root.  Never start Cassandra directly as root.

2) Once you have fixed your file ownerships, restart Cassandra on each node
one at a time. You should see your sstables and commitlog get picked up by
Cassandra in the system.log on startup. Share the output of 'nodetool
describecluster' before and after.
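
For a package install the fix is roughly (default package paths assumed):

sudo chown -R cassandra:cassandra /var/lib/cassandra /var/log/cassandra
sudo service cassandra restart
nodetool describecluster   # schema versions should converge to one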

all the best,

Sebastián
On Feb 11, 2016 6:30 AM, "kedar"  wrote:

> Thanks,
>
> kindly refer the following:
>
> https://gist.github.com/anonymous/3dddbe728a52c07d7c52
> https://gist.github.com/anonymous/302ade0875dd6410087b
>
> Thanks,
> Kedar Parikh
>
> Ext : 2224
> Dir : +91 22 61782224
> Mob : +91 9819634734
> Email : kedar.par...@netcore.co.in
> Web : www.netcore.co.in
>
> On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:
>
>> Would you mind pasting the output for both nodes in gist/paste/whatever?
>> https://gist.github.com http://paste.debian.net
>>
>>
>>
>> Le Jeudi 11 février 2016 11h57, kedar  a
>> écrit :
>> Thanks for the reply.
>>
>> ls -l cassandra/data/* lists various *.db files
>>
>> This problem is on both nodes.
>>
>> Thanks,
>> Kedar Parikh
>>
>> Ext : 2224
>> Dir : +91 22 61782224
>> Mob : +91 9819634734
>> Email : kedar.par...@netcore.co.in
>> Web : www.netcore.co.in
>>
>>
>>
>>
>
>
>


Re: Keyspaces not found in cqlsh

2016-02-11 Thread kedar

Thanks Sebastian,

Cassandra installation in our case is simply an untar.

Cassandra is started using supervisord with the user as root; would you still
recommend I try using the Cassandra user?


 ./nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
cd361577-6947-3390-a787-be28fe499787: [Ip1]

9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [Ip2]

Interestingly
./nodetool cfstats shows all the tables

Thanks,
Kedar Parikh

On Thursday 11 February 2016 07:34 PM, Sebastian Estevez wrote:


Keep this on the user list, it's not appropriate for the dev list.

1) I noticed that some of your files are owned by root and others by 
Cassandra. If this is a package install you should always start C* as 
a service and chown your files and directories so they are owned by 
the Cassandra user, not root. Never start Cassandra directly as root.


2) Once you have fixed your file ownerships, restart Cassandra on each 
node one at a time. You should see your sstables and commitlog get 
picked up by Cassandra in the system.log on startup. Share the
output of 'nodetool describecluster' before and after.


all the best,

Sebastián

On Feb 11, 2016 6:30 AM, "kedar" > wrote:


Thanks,

kindly refer the following:

https://gist.github.com/anonymous/3dddbe728a52c07d7c52
https://gist.github.com/anonymous/302ade0875dd6410087b

Thanks,
Kedar Parikh



On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:

Would you mind pasting the output for both nodes in
gist/paste/whatever? https://gist.github.com
http://paste.debian.net



Le Jeudi 11 février 2016 11h57, kedar
> a écrit :
Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh














Re: Keyspaces not found in cqlsh

2016-02-11 Thread kedar

Thanks a ton Sebastian.

On restart of one node I could see repeated errors like "Mutation of
22076203 bytes is too large for the maxiumum size of 16777216".

So I increased commitlog_segment_size_in_mb from 32 to 64 MB.
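
For reference, that is just this line in cassandra.yaml (in 2.1 the commit log
rejects any single mutation larger than half the segment size, which is where
the 16777216-byte limit came from with the default 32 MB segments):

commitlog_segment_size_in_mb: 64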

Followed by a rolling restart again. And now there is a single version 
and keyspaces are back in cqlsh.


Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [IP1, IP2]

On the flip side, what could be the implications of increasing
commitlog_segment_size_in_mb?


Thanks,
Kedar Parikh

On Thursday 11 February 2016 08:18 PM, Sebastian Estevez wrote:


If it's a tarball then root should be fine, but there were some files
owned by the Cassandra user so you may want to chown those back to root.


I haven't seen your exact issue before but you have two schema 
versions from your describe cluster so a rolling restart should help.


all the best,

Sebastián

On Feb 11, 2016 9:28 AM, "kedar" > wrote:


Thanks Sebastian,

Cassandra installation in our case is simply an untar.

Cassandra is started using supervisord and user as root, would you
still recommend I try using Cassandra user.

 ./nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
cd361577-6947-3390-a787-be28fe499787: [Ip1]

9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [Ip2]

Interestingly
./nodetool cfstats shows all the tables

Thanks,
Kedar Parikh

On Thursday 11 February 2016 07:34 PM, Sebastian Estevez wrote:


Keep this on the user list, it's not appropriate for the dev list.

1) I noticed that some of your files are owned by root and others
by Cassandra. If this is a package install you should always
start C* as a service and chown your files and directories so
they are owned by the Cassandra user, not root.  Never start
Cassandra directly as root.

2) Once you have fixed your file ownerships, restart Cassandra on
each node one at a time. You should see your sstables and
commitlog get picked up by Cassandra in the system.log on
startup. Share the output of 'nodetool describecluster' before
and after.

all the best,

Sebastián

On Feb 11, 2016 6:30 AM, "kedar" > wrote:

Thanks,

kindly refer the following:

https://gist.github.com/anonymous/3dddbe728a52c07d7c52
https://gist.github.com/anonymous/302ade0875dd6410087b

Thanks,
Kedar Parikh



On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:

Would you mind pasting the output for both nodes in
gist/paste/whatever? https://gist.github.com
http://paste.debian.net



Le Jeudi 11 février 2016 11h57, kedar
> a écrit :
Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh
















Re: Cassandra Collections performance issue

2016-02-11 Thread Clint Martin
I have experienced excessive performance issues while using collections as
well. Mostly my issue was due to the excessive number of cells per
partition that having a modest map size requires.

Since you are reading and writing the entire map, you can probably gain
some performance the same way I did: convert your map to be a frozen map.
This essentially puts you in the same place as folks who migrate to a blob
of JSON, but it puts the onus on Cassandra to manage serializing and
deserializing the map. It does have limitations compared to a regular map: you
can't append values, you can't selectively TTL, and reading single keys requires
deserializing the whole collection. Basically anything besides reading and
writing the whole collection becomes a little harder. But it is
considerably faster due to the lower cell count and management overhead.
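
A sketch of what that looks like (keyspace, table and column names here are
made up):

CREATE TABLE my_ks.my_table (
    id    text PRIMARY KEY,
    attrs frozen<map<text, text>>
);

-- the whole map is written and read as a single value
UPDATE my_ks.my_table SET attrs = {'a': '1', 'b': '2', 'c': '3'} WHERE id = 'k1';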

Clint
On Feb 8, 2016 5:11 PM, "Agrawal, Pratik"  wrote:

> Hello all,
>
> Recently we changed one of the table fields to a Map in *Cassandra
> 2.1.11*. Currently we read every field from the Map and overwrite the map
> values. The Map is of size 3. We saw that writes are 30-40% slower while
> reads are 70-80% slower. Please find below some metrics that can help.
>
> My question is: are there any known issues with Cassandra map performance?
> As I understand it, each CQL3 Map entry maps to a column in
> Cassandra; with that assumption we are just creating 3 columns, right? Any
> insight on this issue would be helpful.
>
> Datastax Java Driver 2.1.6.
> Machine: Amazon C3 2x large
> CPU – pretty much same as before (around 30%)
> Memory – max around 4.8 GB
>
> CFSTATS:
>
> Keyspace: Keyspace
> Read Count: 28359044
> Read Latency: 2.847392469259542 ms.
> Write Count: 1152765
> Write Latency: 0.14778018590085576 ms.
> Pending Flushes: 0
> Table: table1
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 4119699
> Space used (total): 4119699
> Space used by snapshots (total): 90323640
> Off heap memory used (total): 2278
> SSTable Compression Ratio: 0.23172161124142604
> Number of keys (estimate): 14
> Memtable cell count: 6437
> Memtable data size: 872912
> Memtable off heap memory used: 0
> Memtable switch count: 7626
> Local read count: 27754634
> Local read latency: 1.921 ms
> Local write count: 1113668
> Local write latency: 0.142 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 96
> Bloom filter off heap memory used: 88
> Index summary off heap memory used: 46
> Compression metadata off heap memory used: 2144
> Compacted partition minimum bytes: 315853
> Compacted partition maximum bytes: 4055269
> Compacted partition mean bytes: 2444011
> Average live cells per slice (last five minutes): 17.536775249005437
> Maximum live cells per slice (last five minutes): 1225.0
> Average tombstones per slice (last five minutes): 34.99979575985972
> Maximum tombstones per slice (last five minutes): 3430.0
>
> Table: table2
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 869900
> Space used (total): 869900
> Space used by snapshots (total): 17279824
> Off heap memory used (total): 387
> SSTable Compression Ratio: 0.3999013540551859
> Number of keys (estimate): 2
> Memtable cell count: 1958
> Memtable data size: 8
> Memtable off heap memory used: 0
> Memtable switch count: 7484
> Local read count: 604412
> Local read latency: 45.421 ms
> Local write count: 39097
> Local write latency: 0.337 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 96
> Bloom filter off heap memory used: 88
> Index summary off heap memory used: 35
> Compression metadata off heap memory used: 264
> Compacted partition minimum bytes: 1955667
> Compacted partition maximum bytes: 2346799
> Compacted partition mean bytes: 2346799
> Average live cells per slice (last five minutes): 1963.0632242863855
> Maximum live cells per slice (last five minutes): 5001.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
> *NETSTATS:*
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 2853996
> Mismatch (Blocking): 67386
> Mismatch (Background): 9233
> Pool Name    Active  Pending  Completed
> Commands     n/a     0        33953165
> Responses    n/a     0        370301
>
> *IOSTAT*
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>   15.200.830.560.100.04   83.27
>
> Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> xvda  2.79 0.4769.86 553719   82619304
> xvdb 14.49 3.39   775.564009600  917227536
> xvdc 15.13 2.98   819.933522250  969708944
> dm-0 49.67 6.36   

Increase compaction performance

2016-02-11 Thread Michał Łowicki
Hi,

Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair using
Cassandra Reaper, but after a couple of hours nodes are full of pending
compaction tasks (regular ones, not the validation ones).

CPU load is fine, SSD disks below 30% utilization, no long GC pauses.

How to speed up compaction? I increased compaction throughput and concurrent
compactors but saw no change. Seems there are plenty of idle resources but I
can't force C* to use them.

Any clue where there might be a bottleneck?


-- 
BR,
Michał Łowicki


Re: Cassandra Collections performance issue

2016-02-11 Thread Jack Krupansky
Just to help other users reading along here, what is your access pattern
with maps? I mean, do you typically have a large or small number of keys
set, are you typically mostly adding keys or deleting keys a lot, adding
one at a time or adding and deleting a lot in a single request, or... what?
And are you indexing map columns, keys or values?

-- Jack Krupansky

On Thu, Feb 11, 2016 at 10:44 AM, Clint Martin <
clintlmar...@coolfiretechnologies.com> wrote:

> I have experienced excessive performance issues while using collections as
> well. Mostly my issue was due to the excessive number of cells per
> partition that having a modest map size requires.
>
> Since you are reading and writing the entire map, you can probably gain
> some performance the same way I did: convert your map to be a frozen map.
> This essentially puts you in the same place as folks who migrate to a blob
> of JSON, but it puts the onus on Cassandra to manage serializing and
> deserializing the map. It does have limitations compared to a regular map: you
> can't append values, you can't selectively TTL, and reading single keys requires
> deserializing the whole collection. Basically anything besides reading and
> writing the whole collection becomes a little harder. But it is
> considerably faster due to the lower cell count and management overhead.
>
> Clint
> On Feb 8, 2016 5:11 PM, "Agrawal, Pratik"  wrote:
>
>> Hello all,
>>
>> Recently we added one of the table fields from as Map in 
>> *Cassandra
>> 2.1.11*. Currently we read every field from Map and overwrite map
>> values. Map is of size 3. We saw that writes are 30-40% slower while reads
>> are 70-80% slower. Please find below some metrics that can help.
>>
>> My question is, Are there any known issues in Cassandra map performance?
>> As I understand it each of the CQL3 Map entry, maps to a column in
>> cassandra, with that assumption we are just creating 3 columns right? Any
>> insight on this issue would be helpful.
>>
>> Datastax Java Driver 2.1.6.
>> Machine: Amazon C3 2x large
>> CPU – pretty much same as before (around 30%)
>> Memory – max around 4.8 GB
>>
>> CFSTATS:
>>
>> Keyspace: Keyspace
>> Read Count: 28359044
>> Read Latency: 2.847392469259542 ms.
>> Write Count: 1152765
>> Write Latency: 0.14778018590085576 ms.
>> Pending Flushes: 0
>> Table: table1
>> SSTable count: 1
>> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
>> Space used (live): 4119699
>> Space used (total): 4119699
>> Space used by snapshots (total): 90323640
>> Off heap memory used (total): 2278
>> SSTable Compression Ratio: 0.23172161124142604
>> Number of keys (estimate): 14
>> Memtable cell count: 6437
>> Memtable data size: 872912
>> Memtable off heap memory used: 0
>> Memtable switch count: 7626
>> Local read count: 27754634
>> Local read latency: 1.921 ms
>> Local write count: 1113668
>> Local write latency: 0.142 ms
>> Pending flushes: 0
>> Bloom filter false positives: 0
>> Bloom filter false ratio: 0.0
>> Bloom filter space used: 96
>> Bloom filter off heap memory used: 88
>> Index summary off heap memory used: 46
>> Compression metadata off heap memory used: 2144
>> Compacted partition minimum bytes: 315853
>> Compacted partition maximum bytes: 4055269
>> Compacted partition mean bytes: 2444011
>> Average live cells per slice (last five minutes): 17.536775249005437
>> Maximum live cells per slice (last five minutes): 1225.0
>> Average tombstones per slice (last five minutes): 34.99979575985972
>> Maximum tombstones per slice (last five minutes): 3430.0
>>
>> Table: table2
>> SSTable count: 1
>> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
>> Space used (live): 869900
>> Space used (total): 869900
>> Space used by snapshots (total): 17279824
>> Off heap memory used (total): 387
>> SSTable Compression Ratio: 0.3999013540551859
>> Number of keys (estimate): 2
>> Memtable cell count: 1958
>> Memtable data size: 8
>> Memtable off heap memory used: 0
>> Memtable switch count: 7484
>> Local read count: 604412
>> Local read latency: 45.421 ms
>> Local write count: 39097
>> Local write latency: 0.337 ms
>> Pending flushes: 0
>> Bloom filter false positives: 0
>> Bloom filter false ratio: 0.0
>> Bloom filter space used: 96
>> Bloom filter off heap memory used: 88
>> Index summary off heap memory used: 35
>> Compression metadata off heap memory used: 264
>> Compacted partition minimum bytes: 1955667
>> Compacted partition maximum bytes: 2346799
>> Compacted partition mean bytes: 2346799
>> Average live cells per slice (last five minutes): 1963.0632242863855
>> Maximum live cells per slice (last five minutes): 5001.0
>> Average tombstones per slice (last five minutes): 0.0
>> Maximum tombstones per slice (last five minutes): 0.0
>>
>> *NETSTATS:*
>> Mode: NORMAL
>> Not sending any streams.
>> Read Repair Statistics:
>> Attempted: 2853996
>> Mismatch (Blocking): 67386
>> Mismatch (Background): 9233
>> Pool NameActive   Pending  

Re: Guava version check in 3.0

2016-02-11 Thread Andrew Jorgensen
To answer my own question: I was able to shade the dependencies in my jar,
which fixed the issue and allowed the job to run on Hadoop.
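
The shade plugin relocates Guava under com.foo inside the fat jar built by
mvn package, so the driver uses its own (relocated, newer) Guava while Hadoop
keeps its provided 11.0.2. The relevant plugin section of the pom: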


<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>com.foo.com.google.common</shadedPattern>
      </relocation>
    </relocations>
    <finalName>${project.artifactId}-${project.version}-jar-with-dependencies</finalName>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>${main.class}</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>


-- 
Andrew Jorgensen
@ajorgensen

On Thu, Feb 11, 2016, at 03:40 PM, Andrew Jorgensen wrote:
> Hello,
> 
> I am trying to get a cassandra v3.0 cluster up and running with the
> v3.0.0 of the datastax client. I am hitting a number of cases where I am
> running into the following exception:
> 
> java.lang.IllegalStateException: Detected Guava issue #1635 which
> indicates that a version of Guava less than 16.01 is in use.  This
> introduces codec resolution issues and potentially other incompatibility
> issues in the driver.  Please upgrade to Guava 16.01 or later.
> 
> I was wondering if there are any potential workarounds. There are some
> cases where I can work around this issue because I am able to control
> the dependency, but there are a number of cases (Hadoop) where Guava is a
> provided dependency and currently set at v11.0.2. As such I cannot work
> around the version of Guava and therefore cannot use the datastax driver
> in this context. Are there any workarounds to get the driver working
> with older versions of Guava, or would it be possible to turn off the
> sanity check?
> 
> Thanks,
> -- 
> Andrew Jorgensen
> @ajorgensen


Re: Security labels

2016-02-11 Thread oleg yusim
Thanks Dani.

Oleg

On Thu, Feb 11, 2016 at 2:27 PM, Dani Traphagen  wrote:

> Hi Oleg,
>
> I'm happy to take a look. Will update after review.
>
> Thanks,
> Dani
>
> On Thu, Feb 11, 2016 at 12:23 PM, oleg yusim  wrote:
>
>> Hi Dani,
>>
>> As promised, I sort of put all my questions under the "one roof". I would
>> really appreciate your opinion on them.
>>
>> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen <
>> dani.trapha...@datastax.com> wrote:
>>
>>> ​Hi Oleg,
>>>
>>> Thanks that helped clear things up! This sounds like a daunting task. I
>>> wish you all the best with it.
>>>
>>> Cheers,
>>> Dani​
>>>
>>> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim 
>>> wrote:
>>>
 Dani,

 I really appreciate your response. Actually, session timeouts and
 security labels are two different topics (first is about attack when
 somebody opened, say, ssh window to DB, left his machine unattended and
 somebody else stole his session, second - to enable the DB to support what is
 called the MAC access model - MAC stands for mandatory access control. It is widely
 used in the government and military, but not outside of it, we all are used
 to DAC access control model). However, I think you are right and I should
 move all my queries under the one big roof and call this thread "Security".
 I will do this today.

 Now, about what you have said, I just answered the same to Jon, in
 Session Timeout thread, but would quickly re-cap here. I understand that
 Cassandra's architecture was aimed and tailored for completely different
 type of scenario. However, unfortunately, that doesn't mean that Cassandra
 is not vulnerable to the same very set of attacks relational database would
 be vulnerable to. It just means Cassandra is not protected against those
 attacks, because protection against them was not thought of, when database
 was created. I already gave the AAA and session's timeout example in Jon's
 thread, and those are just one of many.

 Now what I'm trying to do, I'm trying to create a STIG - security
 federal compliance document, which will assess Cassandra against SRG
 concepts (security federal compliance recommendations for databases
 overall) and will highlight what is not met, and can't be in current design
 (i.e. what system architects should keep in mind and what they need to
 compensate for with other controls on different layers of system model) and
  what can be met either with configuration or with little enhancement (and
 how).

 That document would be of great help for Cassandra as a product because
 it would allow it to be marketed as a product with existing security
 assessment and guidelines, performed according to DoD standards. It would
 also allow to move product in the general direction of improving its
 security posture. Finally, the document would be posted on DISA site (
 http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every
 security architect to utilize, which would greatly reduce the risk for
 Cassandra product to be hacked in a field.

 To clear things out - what I ask about are not my expectations. I
 really do not expect developers of Cassandra to run and start implementing
 security labels, just because I asked about it. :) My questions are to
 build my internal knowledge of DB current design, so that I can build my
 security assessment based of it, not more, not less.

 I guess, summarizing what I said on top, from what I'm doing Cassandra
 as a product would end up benefiting quite a bit. That is why I think it
 would make sense for Cassandra community to help me with my questions even
 if they sound completely off the traditional "grid".

 Thanks again, I really appreciate your response and conversation
 overall.

 Oleg

 On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
 dani.trapha...@datastax.com> wrote:

> Also -- it looks like you're really asking questions about session
> timeouts and security labels as they associate, would be more helpful to
> keep in one thread. :)
>
>
> On Friday, January 29, 2016, Dani Traphagen <
> dani.trapha...@datastax.com> wrote:
>
>> Hi Oleg,
>>
>> I understand your frustration but unfortunately, in the terms of your
>> security assessment, you have fallen into a mismatch for Cassandra's
>> utility.
>>
>> The eventuality of having multiple sockets open without the query
>> input for long durations of time isn't something that was
>> architected...because...Cassandra was built to take massive quantities
>> of queries both in volume and velocity.
>>
>> Your expectation of the database isn't in 

Re: Security labels

2016-02-11 Thread Jack Krupansky
Thanks for putting the items together in a list. This allows people to see
things with more context. Give people in the user community a little time
to respond. A week, maybe. Hopefully some of the senior Cassandra
committers will take a look as well.

Will the final assessment become a public document or is it strictly
internal for your employer? I know there is a database of these
assessments, but I don't know who controls what becomes public and when.

-- Jack Krupansky

On Thu, Feb 11, 2016 at 3:23 PM, oleg yusim  wrote:

> Hi Dani,
>
> As promised, I sort of put all my questions under the "one roof". I would
> really appreciate you opinion on them.
>
> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen <
> dani.trapha...@datastax.com> wrote:
>
>> ​Hi Oleg,
>>
>> Thanks that helped clear things up! This sounds like a daunting task. I
>> wish you all the best with it.
>>
>> Cheers,
>> Dani​
>>
>> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim  wrote:
>>
>>> Dani,
>>>
>>> I really appreciate you response. Actually, session timeouts and
>>> security labels are two different topics (first is about attack when
>>> somebody opened, say, ssh window to DB, left his machine unattended and
>>> somebody else stole his session, second - to enable DB to support what
>>> called MAC access model - stays for mandatory access control. It is widely
>>> used in the government and military, but not outside of it, we all are used
>>> to DAC access control model). However, I think you are right and I should
>>> move all my queries under the one big roof and call this thread "Security".
>>> I will do this today.
>>>
>>> Now, about what you have said, I just answered the same to Jon, in
>>> Session Timeout thread, but would quickly re-cap here. I understand that
>>> Cassandra's architecture was aimed and tailored for completely different
>>> type of scenario. However, unfortunately, that doesn't mean that Cassandra
>>> is not vulnerable to the same very set of attacks relational database would
>>> be vulnerable to. It just means Cassandra is not protected against those
>>> attacks, because protection against them was not thought of, when database
>>> was created. I already gave the AAA and session's timeout example in Jon's
>>> thread, and those are just one of many.
>>>
>>> Now what I'm trying to do, I'm trying to create a STIG - security
>>> federal compliance document, which will assess Cassandra against SRG
>>> concepts (security federal compliance recommendations for databases
>>> overall) and will highlight what is not met, and can't be in current design
>>> (i.e. what system architects should keep in mind and what they need to
>>> compensate for with other controls on different layers of system model) and
>>>  what can be met either with configuration or with little enhancement (and
>>> how).
>>>
>>> That document would be of great help for Cassandra as a product because
>>> it would allow it to be marketed as a product with existing security
>>> assessment and guidelines, performed according to DoD standards. It would
>>> also allow to move product in the general direction of improving its
>>> security posture. Finally, the document would be posted on DISA site (
>>> http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every security
>>> architect to utilize, which would greatly reduce the risk for Cassandra
>>> product to be hacked in a field.
>>>
>>> To clear things out - what I ask about are not my expectations. I really
>>> do not expect developers of Cassandra to run and start implementing
>>> security labels, just because I asked about it. :) My questions are to
>>> build my internal knowledge of DB current design, so that I can build my
>>> security assessment based of it, not more, not less.
>>>
>>> I guess, summarizing what I said on top, from what I'm doing Cassandra
>>> as a product would end up benefiting quite a bit. That is why I think it
>>> would make sense for Cassandra community to help me with my questions even
>>> if they sound completely of the traditional "grid".
>>>
>>> Thanks again, I really appreciate your response and conversation overall.
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
>>> dani.trapha...@datastax.com> wrote:
>>>
 Also -- it looks like you're really asking questions about session
 timeouts and security labels as they associate, would be more helpful to
 keep in one thread. :)


 On Friday, January 29, 2016, Dani Traphagen <
 dani.trapha...@datastax.com> wrote:

> Hi Oleg,
>
> I understand your frustration but unfortunately, in the terms of your
> security assessment, you have fallen into a mismatch for Cassandra's
> utility.
>
> The eventuality of having multiple sockets open without the query
> input for long durations of time isn't something that was
> 

Re: Session timeout

2016-02-11 Thread Jack Krupansky
Thanks! A useful contribution, no matter what the outcome. I trust your
reading of the doc, so I don't expect a lot of change to the responses, but
we'll see. At a minimum, it will probably be good to have a doc that
highlights areas where users will need to engage in explicit mitigation
efforts if their infrastructure does not implicitly effect mitigation for
various security exposures.

-- Jack Krupansky

On Thu, Feb 11, 2016 at 3:21 PM, oleg yusim  wrote:

> Robert, Jack, Bryan,
>
> As you suggested, I put together document, titled
> Cassandra_Security_Topics_to_Discuss, put it on Google Drive and shared it
> with everybody on this list. The document contains list of questions I have
> on Cassandra, my take on it, and has a place for notes Community would like
> to make on it.
>
> Please, review. Any help would be appreciated greatly.
>
> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>
> Oleg
>
> On Fri, Jan 29, 2016 at 6:30 PM, Bryan Cheng 
> wrote:
>
>> To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a
>> well-funded and fairly large company. You are certainly free to continue
>> using the list and asking for community support (I am definitely not in any
>> position to tell you otherwise, anyway), but that community support is by
>> definition ad-hoc and best effort. Furthermore, your questions range from
>> trivial to, as Jonathan as mentioned earlier, concepts that many of us have
>> no reason to consider at this time (perhaps your work will convince us
>> otherwise- but you'll need to finish it first ;) )
>>
>> What I'm getting at here is that perhaps, if you need faster, deeper
>> level, and more elaborate support than this list can provide, you should
>> look into the services of a paid Cassandra support company like Datastax.
>>
>> On Fri, Jan 29, 2016 at 3:34 PM, Robert Coli 
>> wrote:
>>
>>> On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
 One last time, I'll simply renew my objection to the way you are
 abusing this list.

>>>
>>> FWIW, while I appreciate that OP (Oleg) is attempting to do a service
>>> for the community, I agree that the flood of single topic, context-lacking
>>> posts regarding deep internals of Cassandra is likely to inspire the
>>> opposite of a helpful response.
>>>
>>> This is important work, however, so hopefully we can collectively find a
>>> way through the meta and can discuss this topic without acrimony! :D
>>>
>>> =Rob
>>>
>>>
>>
>>
>


Re: Rows with same key

2016-02-11 Thread Jack Krupansky
(Note to self... check docs to see if they give this troubleshooting tip. I
didn't see it at first glance.)
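
(For reference, the kind of timestamp check suggested below can be run
directly from cqlsh; the keyspace/table/column names here are placeholders,
not the ones from the original report:)

cqlsh -e "SELECT key, value, WRITETIME(value) FROM my_ks.my_cf WHERE key = 'dup-key';"
# comparing WRITETIME() on the conflicting rows shows which values win under
# Cassandra's last-write-wins timestamp resolution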

-- Jack Krupansky

On Thu, Feb 11, 2016 at 2:45 PM, Kai Wang  wrote:

> Are you supplying timestamps from the client side? Are clocks in sync
> across your nodes?
>
>
> On Thu, Feb 11, 2016 at 11:52 AM, Yulian Oifa 
> wrote:
>
>> Hello to all
>> I have multiple rows with same id on one of cfs, one row is completely
>> empty ,another one has vaues.
>> Values are wrotten into new row , however they are retreived from old
>> row...
>> I guess one row is created due to removed values, and stucked somehow.
>> I am trying to remove it with no luck ( compact , flush , repair , etc ).
>> I have set gc grace to this CF , however i beleive the old row has old
>> value.
>> How can i get rid of this row?
>> Best regards
>> Yulian Oifa
>>
>
>


Re: Session timeout

2016-02-11 Thread oleg yusim
Jack,

This document doesn't cover all the areas where users will need to engage in
explicit mitigation; it only covers those I wasn't sure about. But you are
making a good point here. Let me update the document with the rest of the
gaps, so the community has a complete list.

Thanks,

Oleg

On Thu, Feb 11, 2016 at 3:38 PM, Jack Krupansky 
wrote:

> Thanks! A useful contribution, no matter what the outcome. I trust your
> ability to read of the doc, so I don't expect a lot of change to the
> responses, but we'll see. At a minimum, it will probably be good to have
> doc to highlight areas where users will need to engage in explicit
> mitigation efforts if their infrastructure does not implicitly effect
> mitigation for various security exposures.
>
> -- Jack Krupansky
>
> On Thu, Feb 11, 2016 at 3:21 PM, oleg yusim  wrote:
>
>> Robert, Jack, Bryan,
>>
>> As you suggested, I put together document, titled
>> Cassandra_Security_Topics_to_Discuss, put it on Google Drive and shared it
>> with everybody on this list. The document contains list of questions I have
>> on Cassandra, my take on it, and has a place for notes Community would like
>> to make on it.
>>
>> Please, review. Any help would be appreciated greatly.
>>
>> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 6:30 PM, Bryan Cheng 
>> wrote:
>>
>>> To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a
>>> well-funded and fairly large company. You are certainly free to continue
>>> using the list and asking for community support (I am definitely not in any
>>> position to tell you otherwise, anyway), but that community support is by
>>> definition ad-hoc and best effort. Furthermore, your questions range from
>>> trivial to, as Jonathan as mentioned earlier, concepts that many of us have
>>> no reason to consider at this time (perhaps your work will convince us
>>> otherwise- but you'll need to finish it first ;) )
>>>
>>> What I'm getting at here is that perhaps, if you need faster, deeper
>>> level, and more elaborate support than this list can provide, you should
>>> look into the services of a paid Cassandra support company like Datastax.
>>>
>>> On Fri, Jan 29, 2016 at 3:34 PM, Robert Coli 
>>> wrote:
>>>
 On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky <
 jack.krupan...@gmail.com> wrote:

> One last time, I'll simply renew my objection to the way you are
> abusing this list.
>

 FWIW, while I appreciate that OP (Oleg) is attempting to do a service
 for the community, I agree that the flood of single topic, context-lacking
 posts regarding deep internals of Cassandra is likely to inspire the
 opposite of a helpful response.

 This is important work, however, so hopefully we can collectively find
 a way through the meta and can discuss this topic without acrimony! :D

 =Rob


>>>
>>>
>>
>


Re: Security labels

2016-02-11 Thread oleg yusim
Jack,

I asked my management if I can share my assessment spreadsheet with the
community (the whole thing, with gaps and desired configurations). Let's wait
for their answer. Either way, I will definitely update the document I shared
with the rest of the gaps, so you guys will have that for sure.

Now, in case my management says no:

1) The document titled vRealize Operations STIG will be published here:
http://iase.disa.mil/stigs/Pages/a-z.aspx. As part of it, there will be a
Cassandra STIG (Cassandra is part of VMware's vRealize Operations product).
This STIG will contain only suggestions on the right configuration (from the
security point of view), where configuration is possible.
2) The community will have a full list of gaps (things which are needed but
can't be configured) after I update my document.
3) The rest of the assessment consists of Not Applicable and Applicable -
Inherently Meets items, which nobody is interested in.
4) Also, when the STIG for vRealize Operations is published, look at the
VMware site for the Security Guidelines for vRealize Operations. They will be
posted publicly and you will be able to download them free of charge. They
will include the mitigations VMware implemented for some of the Cassandra
gaps.

Thanks,

Oleg

On Thu, Feb 11, 2016 at 2:55 PM, Jack Krupansky 
wrote:

> Thanks for putting the items together in a list. This allows people to see
> things with more context. Give people in the user community a little time
> to respond. A week, maybe. Hopefully some of the senior Cassandra
> committers will take a look as well.
>
> Will the final assessment become a public document or is it strictly
> internal for your employer? I know there is a database of these
> assessments, but I don't know who controls what becomes public and when.
>
> -- Jack Krupansky
>
> On Thu, Feb 11, 2016 at 3:23 PM, oleg yusim  wrote:
>
>> Hi Dani,
>>
>> As promised, I sort of put all my questions under the "one roof". I would
>> really appreciate you opinion on them.
>>
>> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen <
>> dani.trapha...@datastax.com> wrote:
>>
>>> ​Hi Oleg,
>>>
>>> Thanks that helped clear things up! This sounds like a daunting task. I
>>> wish you all the best with it.
>>>
>>> Cheers,
>>> Dani​
>>>
>>> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim 
>>> wrote:
>>>
 Dani,

 I really appreciate you response. Actually, session timeouts and
 security labels are two different topics (first is about attack when
 somebody opened, say, ssh window to DB, left his machine unattended and
 somebody else stole his session, second - to enable DB to support what
 called MAC access model - stays for mandatory access control. It is widely
 used in the government and military, but not outside of it, we all are used
 to DAC access control model). However, I think you are right and I should
 move all my queries under the one big roof and call this thread "Security".
 I will do this today.

 Now, about what you have said, I just answered the same to Jon, in
 Session Timeout thread, but would quickly re-cap here. I understand that
 Cassandra's architecture was aimed and tailored for completely different
 type of scenario. However, unfortunately, that doesn't mean that Cassandra
 is not vulnerable to the same very set of attacks relational database would
 be vulnerable to. It just means Cassandra is not protected against those
 attacks, because protection against them was not thought of, when database
 was created. I already gave the AAA and session's timeout example in Jon's
 thread, and those are just one of many.

 Now what I'm trying to do, I'm trying to create a STIG - security
 federal compliance document, which will assess Cassandra against SRG
 concepts (security federal compliance recommendations for databases
 overall) and will highlight what is not met, and can't be in current design
 (i.e. what system architects should keep in mind and what they need to
 compensate for with other controls on different layers of system model) and
  what can be met either with configuration or with little enhancement (and
 how).

 That document would be of great help for Cassandra as a product because
 it would allow it to be marketed as a product with existing security
 assessment and guidelines, performed according to DoD standards. It would
 also allow to move product in the general direction of improving its
 security posture. Finally, the document would be posted on DISA site (
 http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every
 security architect to utilize, which would greatly reduce the risk for
 Cassandra product to be hacked in a field.

 To clear things out - what I ask about are 

Guava version check in 3.0

2016-02-11 Thread Andrew Jorgensen
Hello,

I am trying to get a Cassandra 3.0 cluster up and running with v3.0.0 of
the DataStax client. I am hitting a number of cases where I run into the
following exception:

java.lang.IllegalStateException: Detected Guava issue #1635 which
indicates that a version of Guava less than 16.01 is in use.  This
introduces codec resolution issues and potentially other incompatibility
issues in the driver.  Please upgrade to Guava 16.01 or later.

I was wondering if there are any potential workarounds. In some cases I can
work around this issue because I control the dependency, but in a number of
cases (Hadoop) Guava is a provided dependency currently pinned at v11.0.2.
In those cases I cannot change the Guava version and therefore cannot use
the DataStax driver in this context. Are there any workarounds for getting
the driver working with older versions of Guava, or would it be possible to
turn off the sanity check?
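
For reference, a quick way to confirm which Guava the Hadoop distribution
actually puts on the classpath (standard Hadoop tooling; output will vary by
distribution):

hadoop classpath | tr ':' '\n' | grep -i guava
# typically shows something like guava-11.0.2.jar; a common (if unofficial)
# way around a provided Guava 11 is to shade/relocate com.google.common in
# the application jar so the driver sees its own Guava >= 16.0.1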

Thanks,
-- 
Andrew Jorgensen
@ajorgensen


Re: Increase compaction performance

2016-02-11 Thread Alain RODRIGUEZ
Also, are you using incremental repairs (I'm not sure about the available
options in Spotify Reaper)? What command did you run?

2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ :

> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>
>
>
> What is your current compaction throughput ?  The current value of
> 'concurrent_compactors' (cassandra.yaml or through JMX) ?
>
> nodetool getcompactionthroughput
>
> How to speed up compaction? Increased compaction throughput and concurrent
>> compactors but no change. Seems there is plenty idle resources but can't
>> force C* to use it.
>>
>
> You might want to try un-throttle the compaction throughput through:
>
> nodetool setcompactionthroughput 0
>
> Choose a canari node. Monitor compaction pending and disk throughput (make
> sure server is ok too - CPU...)
>
> Some other information could be useful:
>
> What is your number of cores per machine and the compaction strategies for
> the 'most compacting' tables. What are write/update patterns, any TTL or
> tombstones ? Do you use a high number of vnodes ?
>
> Also what is your repair routine and your values for gc_grace_seconds ?
> When was your last repair and do you think your cluster is suffering of a
> high entropy ?
>
> You can lower the stream throughput to make sure nodes can cope with what
> repairs are feeding them.
>
> nodetool getstreamthroughput
> nodetool setstreamthroughput X
>
> C*heers,
>
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-11 16:55 GMT+01:00 Michał Łowicki :
>
>> Hi,
>>
>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>> using Cassandra Reaper but nodes after couple of hours are full of pending
>> compaction tasks (regular not the ones about validation)
>>
>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>
>> How to speed up compaction? Increased compaction throughput and
>> concurrent compactors but no change. Seems there is plenty idle
>> resources but can't force C* to use it.
>>
>> Any clue where there might be a bottleneck?
>>
>>
>> --
>> BR,
>> Michał Łowicki
>>
>>
>


Re: Increase compaction performance

2016-02-11 Thread Alain RODRIGUEZ
>
> CPU load is fine, SSD disks below 30% utilization, no long GC pauses



What is your current compaction throughput ?  The current value of
'concurrent_compactors' (cassandra.yaml or through JMX) ?

nodetool getcompactionthroughput

> How to speed up compaction? Increased compaction throughput and concurrent
> compactors but no change. Seems there is plenty idle resources but can't
> force C* to use it.
>

You might want to try un-throttle the compaction throughput through:

nodetool setcompactionthroughput 0

Choose a canary node. Monitor pending compactions and disk throughput (make
sure the server is ok too - CPU...)
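
For example, a minimal way to follow the effect on the canary node (standard
nodetool / OS tools):

nodetool getcompactionthroughput    # confirm the current throttle
nodetool setcompactionthroughput 0  # un-throttle on this node only
nodetool compactionstats            # pending tasks should start to drain
iostat -x 5                         # keep an eye on disk utilization meanwhile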

Some other information could be useful:

What is your number of cores per machine and the compaction strategies for
the 'most compacting' tables. What are write/update patterns, any TTL or
tombstones ? Do you use a high number of vnodes ?

Also, what is your repair routine and your values for gc_grace_seconds?
When was your last repair, and do you think your cluster is suffering from
high entropy?

You can lower the stream throughput to make sure nodes can cope with what
repairs are feeding them.

nodetool getstreamthroughput
nodetool setstreamthroughput X

C*heers,

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-11 16:55 GMT+01:00 Michał Łowicki :

> Hi,
>
> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair using
> Cassandra Reaper but nodes after couple of hours are full of pending
> compaction tasks (regular not the ones about validation)
>
> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>
> How to speed up compaction? Increased compaction throughput and concurrent
> compactors but no change. Seems there is plenty idle resources but can't
> force C* to use it.
>
> Any clue where there might be a bottleneck?
>
>
> --
> BR,
> Michał Łowicki
>
>


Rows with same key

2016-02-11 Thread Yulian Oifa
Hello to all,
I have multiple rows with the same id in one of my CFs; one row is
completely empty, another one has values.
Values are written into the new row, however they are retrieved from the old
row...
I guess one row was created due to removed values and got stuck somehow.
I am trying to remove it with no luck (compact, flush, repair, etc.).
I have set gc grace on this CF, however I believe the old row has the old
value.
How can I get rid of this row?
Best regards
Yulian Oifa


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Sebastian Estevez
>
> On restart of one node 1 could see repeated errors like " Mutation of
> 22076203 bytes is too large for the maxiumum size of 16777216"


Commitlog segment size is the right lever to get C* to accept larger writes
(the maximum mutation size is half the commitlog segment size, which is
where the 16777216-byte limit with 32 MB segments comes from), but this is
not a traditional use for Cassandra. Cassandra is built to handle lots and
lots of small writes, not huge ones. I expect you will have other pains as a
result of your large mutations. If you want to write huge values, consider
chunking them up into smaller writes, along the lines of the sketch below.
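
A minimal sketch of that kind of chunked layout (keyspace/table/column names
are placeholders, not from this thread):

# split one large value across many rows of the same partition, each chunk
# kept well under the mutation limit, instead of one huge INSERT
cqlsh -e "
  CREATE TABLE IF NOT EXISTS my_ks.blob_chunks (
      object_id text,
      chunk_id  int,
      data      blob,
      PRIMARY KEY (object_id, chunk_id)
  );"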

> Followed by a rolling restart again. And now there is a single version and
> keyspaces are back in cqlsh.


Good, your original problem was due to the schema disagreement issue and
the rolling restart solved it.



All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Thu, Feb 11, 2016 at 10:41 AM, kedar  wrote:

> Thanks a ton Sebastian.
>
> On restart of one node 1 could see repeated errors like " Mutation of
> 22076203 bytes is too large for the maxiumum size of 16777216"
>
> So I increased commitlog_segment_size_in_mb from 32 to 64mb.
>
> Followed by a rolling restart again. And now there is a single version and
> keyspaces are back in cqlsh.
>
> Cluster Information:
> Name: Test Cluster
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [IP1, IP2]
>
> On the flip side what could be implications of increasing
> commitlog_segment_size_in_mb
>
> Thanks,
> Kedar Parikh
>
> On Thursday 11 February 2016 08:18 PM, Sebastian Estevez wrote:
>
> If its a tarball then root should be fine but there were some files owned
> by the Cassandra user so you may want to chown those back to root.
>
> I haven't seen your exact issue before but you have two schema versions
> from your describe cluster so a rolling restart should help.
>
> all the best,
>
> Sebastián
> On Feb 11, 2016 9:28 AM, "kedar" < 
> kedar.par...@netcore.co.in> wrote:
>
>> Thanks Sebastian,
>>
>> Cassandra installation in our case is simply an untar.
>>
>> Cassandra is started using supervisord and user as root, would you still
>> recommend I try using Cassandra user.
>>
>>  ./nodetool describecluster
>> Cluster Information:
>> Name: Test Cluster
>> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Schema versions:
>> cd361577-6947-3390-a787-be28fe499787: [Ip1]
>>
>> 9f5b5675-c9e7-3ae3-8ad6-6654fa4fb3e7: [Ip2]
>>
>> Interestingly
>> ./nodetool cfstats shows all the tables
>>
>> Thanks,
>> Kedar Parikh
>>
>> On Thursday 11 February 2016 07:34 PM, Sebastian Estevez wrote:
>>
>> Keep this on. The user list, it's not appropriate for the dev list.
>>
>> 1) I noticed that some of your files are owned by root and others by
>> Cassandra. If this is a package install you should always start C* as a
>> service and chown your files and directories so they are owned by the
>> Cassandra user, not root.  Never start Cassandra directly as root.
>>
>> 2) Once you have fixed your file ownerships, restart Cassandra on each
>> node one at a time. You should see your sstables and commitlog get picked
>> up by Cassandra in the the system.log on startup. Share the output of
>> 'nodetool describecluster' before and after.
>>
>> all the best,
>>
>> Sebastián
>> On Feb 11, 2016 6:30 AM, "kedar"  wrote:
>>
>>> Thanks,
>>>
>>> kindly refer the following:
>>>
>>> https://gist.github.com/anonymous/3dddbe728a52c07d7c52
>>> https://gist.github.com/anonymous/302ade0875dd6410087b
>>>
>>> Thanks,
>>> Kedar Parikh
>>>
>>>
>>>
>>> On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:
>>>
 Would you mind pasting the ouput for both nodes in gist/paste/whatever?
 https://gist.github.com 
 http://paste.debian.net



 Le Jeudi 11 février 2016 11h57, kedar < 
 kedar.par...@netcore.co.in> a écrit :
 Thanks 

Re: OpsCenter 5.2

2016-02-11 Thread Ted Yu
The installation partially failed.

Please let me know how I can resume the installation, or whether the error
below can be ignored.
I verified on the Linux host that setuptools-0.6c11-py2.7.egg can be
downloaded.

Python 2.7.11 (default, Jan  5 2016, 11:21:51)

--

python ez_setup.py
Downloading
http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg
Traceback (most recent call last):
  File "ez_setup.py", line 278, in 
main(sys.argv[1:])
  File "ez_setup.py", line 210, in main
egg = download_setuptools(version, delay=0)
  File "ez_setup.py", line 158, in download_setuptools
src = urllib2.urlopen(url)
  File "/usr/local/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
  File "/usr/local/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python2.7/urllib2.py", line 469, in error
Press [Enter] to continue:
result = self._call_chain(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 656, in http_error_302
return self.parent.open(new, timeout=req.timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
  File "/usr/local/lib/python2.7/urllib2.py", line 454, in _open
'unknown_open', req)
  File "/usr/local/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 1265, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib2.URLError: 
ERROR: Unable to install ez_setup.py
=
easy_install pip
Searching for pip
Best match: pip 8.0.2
Adding pip 8.0.2 to easy-install.pth file
Installing pip script to /usr/bin
Installing pip3.5 script to /usr/bin
Installing pip3 script to /usr/bin
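
One possible cause worth ruling out (a guess based on the redirect in the
traceback): a locally built Python without SSL support cannot open the https
URL that pypi redirects to, which surfaces as "unknown url type". A quick
check:

python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# an ImportError here means this Python was built without SSL support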

On Thu, Feb 11, 2016 at 9:45 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Confirmed.
>
> all the best,
>
> Sebastián
> On Feb 11, 2016 12:44 PM, "Ted Yu"  wrote:
>
>> Thanks for the pointer.
>>
>> Just want to confirm that OpsCenter 5.2 is compatible with DSE 4.8.4
>> which I have deployed.
>>
>> Cheers
>>
>> On Thu, Feb 11, 2016 at 7:00 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> The monitoring UI is called DataStax OpsCenter and it has its own
>>> install process.
>>>
>>> Check out our documentation on the subject:
>>>
>>>
>>> http://docs.datastax.com/en/opscenter/5.2/opsc/install/opscInstallOpsc_g.html
>>>
>>> all the best,
>>>
>>> Sebastián
>>> On Feb 9, 2016 8:01 PM, "Ted Yu"  wrote:
>>>
 Hi,
 I am using DSE 4.8.4
 Here are the ports Cassandra daemon listens on:

 tcp0  0 xx.yy:9042  0.0.0.0:*
 LISTEN  30773/java
 tcp0  0 127.0.0.1:56498 0.0.0.0:*
   LISTEN  30773/java
 tcp0  0 xx.yy:7000  0.0.0.0:*
 LISTEN  30773/java
 tcp0  0 127.0.0.1:7199  0.0.0.0:*
   LISTEN  30773/java
 tcp0  0 xx.yy:9160  0.0.0.0:*
 LISTEN  30773/java

 Can you tell me how I can get to the DSE monitoring UI ?

 Thanks

>>>
>>


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Romain Hardouin
Would you mind pasting the output for both nodes in gist/paste/whatever?
https://gist.github.com http://paste.debian.net



Le Jeudi 11 février 2016 11h57, kedar  a écrit :
Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in


Re: Keyspaces not found in cqlsh

2016-02-11 Thread kedar

Thanks,

kindly refer the following:

https://gist.github.com/anonymous/3dddbe728a52c07d7c52
https://gist.github.com/anonymous/302ade0875dd6410087b

Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in

On Thursday 11 February 2016 04:35 PM, Romain Hardouin wrote:

Would you mind pasting the ouput for both nodes in gist/paste/whatever? 
https://gist.github.com http://paste.debian.net



Le Jeudi 11 février 2016 11h57, kedar  a écrit :
Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in









OpsCenter 5.2

2016-02-11 Thread Ted Yu
Thanks for the pointer.

Just want to confirm that OpsCenter 5.2 is compatible with DSE 4.8.4 which
I have deployed.

Cheers

On Thu, Feb 11, 2016 at 7:00 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> The monitoring UI is called DataStax OpsCenter and it has its own install
> process.
>
> Check out our documentation on the subject:
>
>
> http://docs.datastax.com/en/opscenter/5.2/opsc/install/opscInstallOpsc_g.html
>
> all the best,
>
> Sebastián
> On Feb 9, 2016 8:01 PM, "Ted Yu"  wrote:
>
>> Hi,
>> I am using DSE 4.8.4
>> Here are the ports Cassandra daemon listens on:
>>
>> tcp0  0 xx.yy:9042  0.0.0.0:*
>> LISTEN  30773/java
>> tcp0  0 127.0.0.1:56498 0.0.0.0:*
>> LISTEN  30773/java
>> tcp0  0 xx.yy:7000  0.0.0.0:*
>> LISTEN  30773/java
>> tcp0  0 127.0.0.1:7199  0.0.0.0:*
>> LISTEN  30773/java
>> tcp0  0 xx.yy:9160  0.0.0.0:*
>> LISTEN  30773/java
>>
>> Can you tell me how I can get to the DSE monitoring UI ?
>>
>> Thanks
>>
>


Re: OpsCenter 5.2

2016-02-11 Thread Sebastian Estevez
Confirmed.

all the best,

Sebastián
On Feb 11, 2016 12:44 PM, "Ted Yu"  wrote:

> Thanks for the pointer.
>
> Just want to confirm that OpsCenter 5.2 is compatible with DSE 4.8.4 which
> I have deployed.
>
> Cheers
>
> On Thu, Feb 11, 2016 at 7:00 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> The monitoring UI is called DataStax OpsCenter and it has its own install
>> process.
>>
>> Check out our documentation on the subject:
>>
>>
>> http://docs.datastax.com/en/opscenter/5.2/opsc/install/opscInstallOpsc_g.html
>>
>> all the best,
>>
>> Sebastián
>> On Feb 9, 2016 8:01 PM, "Ted Yu"  wrote:
>>
>>> Hi,
>>> I am using DSE 4.8.4
>>> Here are the ports Cassandra daemon listens on:
>>>
>>> tcp0  0 xx.yy:9042  0.0.0.0:*
>>> LISTEN  30773/java
>>> tcp0  0 127.0.0.1:56498 0.0.0.0:*
>>> LISTEN  30773/java
>>> tcp0  0 xx.yy:7000  0.0.0.0:*
>>> LISTEN  30773/java
>>> tcp0  0 127.0.0.1:7199  0.0.0.0:*
>>> LISTEN  30773/java
>>> tcp0  0 xx.yy:9160  0.0.0.0:*
>>> LISTEN  30773/java
>>>
>>> Can you tell me how I can get to the DSE monitoring UI ?
>>>
>>> Thanks
>>>
>>
>


ApacheCon NA 2016 - Important Dates!!!

2016-02-11 Thread Melissa Warnkin
 Hello everyone!
I hope this email finds you well.  I hope everyone is as excited about 
ApacheCon as I am!
I'd like to remind you all of a couple of important dates, as well as ask for 
your assistance in spreading the word! Please use your social media platform(s) 
to get the word out! The more visibility, the better ApacheCon will be for 
all!! :)
CFP Close: February 12, 2016
CFP Notifications: February 29, 2016
Schedule Announced: March 3, 2016
To submit a talk, please visit:  
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp

Link to the main site can be found here:  
http://events.linuxfoundation.org/events/apache-big-data-north-america

Apache: Big Data North America 2016 Registration Fees:
Attendee Registration Fee: US$599 through March 6, US$799 through April 10, US$999 thereafter
Committer Registration Fee: US$275 through April 10, US$375 thereafter
Student Registration Fee: US$275 through April 10, $375 thereafter
Planning to attend ApacheCon North America 2016 May 11 - 13, 2016? There is an 
add-on option on the registration form to join the conference for a discounted 
fee of US$399, available only to Apache: Big Data North America attendees.
So, please tweet away!!
I look forward to seeing you in Vancouver! Have a groovy day!!
~Melissa
on behalf of the ApacheCon Team