Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?

On 14 Jan 2016, at 18:30, Jake Luciani wrote:

Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node and 
perhaps what your schema looks like?

Thanks

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay wrote:

Ok, I will open a ticket.

How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently 
I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.

Thanks

Jean

On 14 Jan 2016, at 18:19, Tyler Hobbs wrote:

I don't think that's a known issue.  Can you open a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along 
with the commitlog files and the mutation that was saved to /tmp?

On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay wrote:
Hi,

I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"

I have been loading a lot of data into this cluster over the last 24 hours. The 
system behaved, I think, very nicely. It was loading very fast and giving 
excellent read times. There were no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

4 nodes out of 5 crashed with this error message. Now when I want to restart 
the first node I have the following error:

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a

Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay

Ok, I will open a ticket.

How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently 
I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.

Thanks

Jean

On 14 Jan 2016, at 18:19, Tyler Hobbs wrote:

I don't think that's a known issue.  Can you open a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along 
with the commitlog files and the mutation that was saved to /tmp?


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jake Luciani
Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node
and perhaps what your schema looks like?
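
(For example, something like the following would do; this assumes nodetool
and cqlsh are on the PATH:)

  nodetool status              # shows the load per node
  cqlsh -e 'DESCRIBE SCHEMA'   # dumps the full schema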

Thanks

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:

>
> Ok, I will open a ticket.
>
> How could I restart my cluster without losing everything?
> Would there be a better memory configuration to select for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
>
> Thanks
>
> Jean

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Zhiyan Shao
Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum,
you can find another thread I sent a while ago. The perf test I did
indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen  wrote:

> Thanks Jeff for your reply. Sorry for delayed response. We were running
> some more tests and wanted to wait for the results.
>
> So basically we saw that CPU usage with 2.1.11 was higher compared to 2.0.9
> (see below) for the exact same load test. Memory spikes were also more
> aggressive on 2.1.11.
>
> We wanted to rule out any of our custom settings, so we ended up doing some
> testing with the Cassandra stress test and a default Cassandra installation.
> Here are the results we saw between 2.0.9 and 2.1.11. Both are default
> installations, and both use the Cassandra stress test with the same params,
> so this is the closest apples-to-apples comparison we can get. As you can
> see, both read and write latencies are 30 to 50% worse in 2.1.11 than in
> 2.0.9.
>
> *Highlights of the test:*
> Load: 2x reads and 1x writes
> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
>
> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>
> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
>
> *One observation: as the number of threads is increased, 2.1.11 read
> latencies get worse compared to 2.0.9 (see the table below for 24
> threads vs 54 threads)*
> Not sure if anyone has done this kind of comparison before and what their
> thoughts are. I am thinking for this same reason
>
> version  threads  type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
> 2.0.9    16       READ   66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
> 2.0.9    16       WRITE  33146      3572   3572   3572   1.3   1.0  2.6   3.3   7.0    206.5  9.3
> 2.0.9    16       total  100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
> 2.1.11   16       READ   67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
> 2.1.11   16       WRITE  32904      3344   3344   3344   1.4   1.3  2.3   3.0   6.5    56.7   9.8
> 2.1.11   16       total  100000     10162  10162  10162  1.6   1.4  2.5   3.2   6.0    61.7   9.8
> 2.0.9    24       READ   66414      8167   8167   8167   2.0   1.6  3.7   7.5   16.7   208    8.1
> 2.0.9    24       WRITE  33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
> 2.0.9    24       total  100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
> 2.1.11   24       READ   66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9.0
> 2.1.11   24       WRITE  33372      3723   3723   3723   2.0   1.9  3.1   3.8   21.9   37.2   9.0
> 2.1.11   24       total  100000     11155  11155  11155  2.1   2.0  3.3   4.1   8.8    38.3   9.0
> 2.0.9    54       READ   67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5.0
> 2.0.9    54       WRITE  32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5.0
> 2.0.9    54       total  100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5.0
> 2.1.11   54       READ   66780      8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
> 2.1.11   54       WRITE  33220      4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68.0   7.5
> 2.1.11   54       total  100000     13404  13404  13404  4.0   3.7  6.6   9.2   48.0   69.9   7.5
>
> From: Jeff Jirsa 
> Date: Thursday, January 7, 2016 at 1:01 AM
> To: "user@cassandra.apache.org" , Peddi
> Praveen 
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
> experience was anywhere from 20-60%, depending on workload).
>
> However, it’s not necessarily true that everything behaves exactly the
> same – in particular, memtables are different, commitlog segment handling
> is different, and GC params may need to be tuned differently for 2.1 than
> 2.0.
>
> When the system is busy, what’s it actually DOING? Cassandra exposes a TON
> of metrics – have you plugged any into a reporting system to see what’s
> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
> pauses?
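
(Quick first checks along those lines, from the shell:)

  nodetool tpstats    # pending/blocked/dropped tasks per thread pool
  iostat -dmx 5       # iowait and disk queue depths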
>
> My colleagues spent a lot of time validating different AWS EBS configs
> (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4), 2.1
> was faster in almost every case, but you’re using an instance size I don’t
> believe we tried (too little RAM to be viable in production).  c3.2xl only
> gives you 15G of ram – most “performance” based systems want 2-4x that
> (people running G1 heaps usually start at 16G heaps and leave another
> 16-30G for page cache), you’re running fairly small hardware – it’s
> possible that 2.1 isn’t “as good” on smaller hardware.
>
> (I do see your domain, presumably you know all of this, but just to be
> sure):
>
> You’re using c3, so presumably you’re using EBS – are you using GP2? Which
> volume sizes? Are they the same between versions? Are you hitting your iops
> limits? Running out 

Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
Hi Jan,
I checked it. There are no old Key Spaces or tables.
Thanks for your pointer, I started looking inside the directories. I see
lots of snapshot directories inside the table directories. These directories
are consuming space.

However, these snapshots are not shown when I issue listsnapshots:
./bin/nodetool listsnapshots
Snapshot Details:
There are no snapshots

Can I safely delete those snapshots? Why is listsnapshots not showing the
snapshots? Also, in future, how can we find out if there are snapshots?
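
(For anyone searching later: two quick ways to spot and clean up snapshots
from the shell, assuming the default data directory:)

  du -sh /var/lib/cassandra/data/*/*/snapshots   # disk used by snapshots, per table
  nodetool clearsnapshot                         # removes all snapshots on this node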

Thanks,
Rahul



On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten  wrote:

> Hi Rahul,
>
> just an idea, did you have a look at the data directories on disk
> (/var/lib/cassandra/data)? It could be that there are some from old
> keyspaces that have been deleted and snapshotted before. Try something like
> "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming
> your space.
>
> Jan
>
> Sent from my iPhone
>
> On 14.01.2016 at 07:25, Rahul Ramesh wrote:
>
> Thanks for your suggestion.
>
> Compaction was happening on one of the large tables. The disk space did
> not decrease much after the compaction, so I ran an external compaction.
> The disk space decreased by around 10%. However, it is still consuming
> close to 750Gb for a load of 250Gb.
>
> I even restarted Cassandra thinking there may be some open files; however,
> it didn't help much.
>
> Is there any way to find out why so much of data is being consumed?
>
> I checked if there are any open files using lsof. There are not any open
> files.
>
> *Recovery:*
> Just a wild thought
> I am using a replication factor of 2 and I have two nodes. If I delete the
> complete data on one of the nodes, will I be able to recover all the data
> from the active node?
> I don't want to pursue this path as I want to find out the root cause of
> the issue!
>
>
> Any help will be greatly appreciated
>
> Thank you,
>
> Rahul
>
>
>
>
>
>
> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo  wrote:
>
>> You can check if the snapshot exists in the snapshot folder.
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase disk space.
>> But I think Carlos Alonso might be correct: running compactions might be
>> the issue.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso 
>> wrote:
>>
>>> I'd have a look also at possible running compactions.
>>>
>>> If you have big column families with STCS then large compactions may be
>>> happening.
>>>
>>> Check it with nodetool compactionstats
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 13 January 2016 at 05:22, Kevin O'Connor  wrote:
>>>
 Have you tried restarting? It's possible there are open file handles to
 sstables that have been compacted away. You can verify by doing lsof and
 grepping for DEL or deleted.

 If it's not that, you can run nodetool cleanup on each node to scan all
 of the sstables on disk and remove anything that it's not responsible for.
 Generally this would only work if you added nodes recently.


 On Tuesday, January 12, 2016, Rahul Ramesh  wrote:

> We have a 2 node Cassandra cluster with a replication factor of 2.
>
> The load factor on the nodes is around 350Gb
>
> Datacenter: Cassandra
> =====================
> Address      Rack   Status  State   Load      Owns     Token
>                                                        -5072018636360415943
> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>
> However,if I use df -h,
>
> /dev/xvdf   252G  223G   17G  94% /HDD1
> /dev/xvdg   493G  456G   12G  98% /HDD2
> /dev/xvdh   197G  167G   21G  90% /HDD3
>
>
> HDD1,2,3 contain only Cassandra data. It amounts to close to 1Tb on one of
> the machines, and on the other machine it is close to 650Gb.
>
> I started a repair 2 days ago; after running the repair, the amount of disk
> space consumed has actually increased.
> I also checked if this is because of snapshots. nodetool listsnapshots
> intermittently lists a snapshot, but it goes away after some time.
>
> Can somebody please help me understand,
> 1. why so much disk space is consumed?
> 2. Why did it increase after repair?
> 3. Is there any way to recover from this state?
>
>
> Thanks,
> Rahul
>
>


Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
One update: I cleared the snapshots using the nodetool clearsnapshot command.
The disk space is recovered now.

Because of this issue, I have mounted one more drive to the server and
there are some data files there. How can I migrate the data so that I can
decommission the drive?
Will it work if I just copy all the contents of the table directories to one
of the other drives?

Thanks for all the help.

Regards,
Rahul

Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-14 Thread Hiroyuki Yamada
Thanks DuyHan !
That's clear and helpful.
(and I realized that we need to call setSerialConsistency for SERIAL and
setConsistency for others.)
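
For what it's worth, a minimal cqlsh illustration of setting the two levels
separately (the table and values here are made up):

  cqlsh> CONSISTENCY LOCAL_QUORUM;          -- CL for the commit of the mutation
  cqlsh> SERIAL CONSISTENCY LOCAL_SERIAL;   -- CL for the Paxos round
  cqlsh> UPDATE users SET email = 'new@example.com' WHERE id = 42 IF email = 'old@example.com';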

Thanks,
Hiro

On Tue, Jan 12, 2016 at 9:34 PM, DuyHai Doan  wrote:

> There are two levels of consistency you can define on your query when
> using Lightweight Transactions:
>
> - one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed
> corresponds to QUORUM/LOCAL_QUORUM but named differently so people do not
> get confused)
>
> - one for the consistency of the mutation itself. In this case you can use
> any CL except SERIAL/LOCAL_SERIAL
>
> Setting the consistency level for Paxos is useful in the context of
> multiple data centers only. SERIAL => requires a majority wrt RF across
> all DCs. LOCAL_SERIAL => a majority wrt RF in the local DC only.
>
> Hope that helps
>
>
>
> On Thu, Jan 7, 2016 at 10:44 AM, Hiroyuki Yamada wrote:
>
>> Hi,
>>
>> I've been doing some POCs of lightweight transactions and
>> I came up with some questions, so please let me ask them here.
>>
>> So the question is:
>> what consistency level should I set when using IF NOT EXIST or UPDATE IF
>> statements ?
>>
>> I used the statements with ONE and QUORUM first, and they seemed fine.
>> But, when I set SERIAL, it gave me the following error.
>>
>> === error message ===
>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>> SERIAL is not supported as conditional update commit consistency. Use ANY
>> if you mean "make sure it is accepted but I don't care how many replicas
>> commit it for non-SERIAL reads"
>> === error message ===
>>
>>
>> So, I'm wondering what's SERIAL for when writing (and reading) and
>> what the differences are in setting ONE, QUORUM and ANY when using IF NOT
>> EXIST or UPDATE IF statements.
>>
>> Could you give me some advice?
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>>
>


Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Jan Kesten
Hi Rahul,

it should work as you would expect - simply copy over the sstables from
your extra disk to the original one. To minimize downtime of the node
you can do something like this:

- rsync the files while the node is still running (sstables are
immutable) to copy most of the data
- edit cassandra.yaml to remove the additional datadir
- shutdown the node
- rsync again (just in case a new sstable got written while the first
rsync was running)
- restart
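
In shell terms, roughly (the extra-disk path below is made up; adjust both
paths to your data_file_directories):

  # while the node is still running: copy the bulk of the immutable sstables
  rsync -av /mnt/extradisk/cassandra/data/ /var/lib/cassandra/data/
  # edit cassandra.yaml to drop the extra dir from data_file_directories,
  # then drain and stop the node (e.g. nodetool drain; service cassandra stop)
  # rsync again to pick up any sstables written during the first pass
  rsync -av /mnt/extradisk/cassandra/data/ /var/lib/cassandra/data/
  # restart the node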

HTH
Jan


Modeling approach to widely used textual information

2016-01-14 Thread I PVP
Hi everyone,

I am new to Cassandra and am moving an existing MySQL application to Cassandra.

As a general rule, what is the recommended approach for keeping textual 
information (like a user_nickname, a company_name, or a product_title) that 
will potentially be updated at some point and is routinely and repeatedly 
displayed across many use cases in the application? For example, when the end 
user sees an employee list, sees a contact list, sends/receives chat messages, 
sees RFQs, sees an order, sees shipping provider information/tracking, sees 
ratings and reviews, sees invites, and so on.

These are situations that MVs alone cannot solve, because they would involve 
multiple tables.

Options:
-

A) The text information is copied and updated on all the CFs/tables that were 
modeled to answer the many queries for the many use cases across the 
application, every time the information changes on the "source" CF/table 
(like the user, product, or company table).

OR

B) Only the ids (person_id/company_id, product_id) are stored across the 
column families/tables, and at the front end the "source" column family/table 
is queried to retrieve that specific text field (the person_name/company_name) 
and display it (potentially leveraging REST HTTP caching).

OR

C) Other approaches ?

-
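
To make the trade-off concrete, here is a minimal CQL sketch of options A and
B (the table and column names are illustrative):

  -- Option A: denormalize the display text into every query table;
  -- every rename must then rewrite it in all of them.
  CREATE TABLE orders_by_user (
      user_id       uuid,
      order_id      timeuuid,
      user_nickname text,                -- copied here; must be updated on rename
      PRIMARY KEY (user_id, order_id)
  );

  -- Option B: store only the id everywhere and resolve the name from a
  -- single source table (cacheable at the REST layer).
  CREATE TABLE users (
      user_id       uuid PRIMARY KEY,
      user_nickname text                 -- single place to update
  );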

I understand that proper modeling is crucial and that "writes are cheap", 
but new tables will come sooner or later, and changing previously written 
business logic every time a new CF/table is created is not cheap.
At this moment option B is the most likely choice, especially with some use 
cases allowing data like a user's contact list ids/names (only), or the 
ids/names (only) of the companies the user is doing business with, to be 
downloaded to the frontend at once and used for a couple of seconds/minutes 
while executing a task, and with other use cases having specific REST services 
(/company/id/name, /product/id/title) to provide these widely used fields, 
potentially leveraging HTTP caching for some time to provide the text data 
across the application.

Any advice and guidance will be appreciated.

Thanks for your help.

--
IPVP




Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
A summary of what we've done this morning:

   - Noted that there are no GCInspector lines in system.log on bad node
   (there are GCInspector logs on other healthy nodes)
   - Turned on GC logging; noted logs stating that the total time for which
   application threads were stopped was high (~10s).
   - Not seeing failures of any kind (promotion or concurrent mark)
   - Attached VisualVM: noted that heap usage was very low (~5% usage and
   stable) and it didn't display the hallmarks of GC activity. PermGen also
   very stable
   - Downloaded GC logs and examined in GC Viewer. Noted that:
   - We had lots of pauses (again around 10s), but no full GC.
  - From a 2,300s sample, just over 2,000s were spent with threads
  paused
  - Spotted many small GCs in the new space - realised that Xmn value
  was very low (200M against a heap size of 3750M). Increased Xmn to 937M -
  no change in server behaviour (high load, high reads/s on disk, high CPU
  wait)

Current output of jstat:

    S0     S1     E      O      P      YGC   YGCT    FGC  FGCT   GCT
2   0.00   45.20  12.82  26.84  76.21  2333  63.684  2    0.039  63.724
3   63.58  0.00   33.68  8.04   75.19  14    1.812   2    0.103  1.915
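
(For reference, snapshots like the above come from running jstat against the
Cassandra PID; the interval and count here are illustrative:)

  jstat -gcutil <cassandra-pid> 5000 10   # sample every 5 s, ten samples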

Correct me if I'm wrong, but it seems node 3 is a lot healthier GC-wise than
node 2 (which has normal load statistics).

Anywhere else you can recommend we look?

Griff

On 14 January 2016 at 01:25, Anuj Wadehra  wrote:

> Ok. I saw dropped mutations on your cluster, and full GC is a common cause
> of that.
> Can you just search for the word GCInspector in system.log and share the
> frequency of minor and full GC? Moreover, are you printing promotion
> failures in the GC logs? Why is full GC getting triggered: promotion
> failures or concurrent mode failures?
>
> If you are on CMS, you need to fine tune your heap options to address full
> gc.
>
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>  wrote:
> I think I was incorrect in assuming GC wasn't an issue due to the lack of
> logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
> differences, though comparing the startup flags on the two machines shows
> the GC config is identical:
>
> $ jstat -gcutil
>    S0     S1     E      O      P      YGC     YGCT       FGC  FGCT    GCT
> 2  5.08   0.00   55.72  18.24  59.90  25986   619.827    28   1.597   621.424
> 3  0.00   0.00   22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
>
> Here's typical output for iostat on nodes 2 & 3 as well:
>
> $ iostat -dmx md0
>
>   Device:  rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> 2 md0      0.00    0.00    339.00   0.00  9.77   0.00   59.00     0.00      0.00   0.00     0.00     0.00   0.00
> 3 md0      0.00    0.00    2069.00  1.00  85.85  0.00   84.94     0.00      0.00   0.00     0.00     0.00   0.00
>
> Griff
>
> On 13 January 2016 at 18:36, Anuj Wadehra  wrote:
>
>> Node 2 has slightly higher data but that should be ok. Not sure how read
>> ops are so high when no IO-intensive activity such as repair or compaction
>> is running on node 3. Maybe you can try investigating the logs to see
>> what's happening.
>>
>> Others on the mailing list could also share their views on the situation.
>>
>> Thanks
>> Anuj
>>
>>
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>  wrote:
>> Hi Anuj,
>>
>> Below is the output of nodetool status. The nodes were replaced following
>> the instructions in the Datastax documentation for replacing running nodes,
>> since the nodes were running fine; it was just that the servers had been
>> incorrectly initialised and thus had less disk space. The status below
>> shows 2 has significantly higher load, however, as I say, 2 is operating
>> normally and is running compactions, so I guess that's not an issue?
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load       Tokens  Owns   Host ID                               Rack
>> UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>> UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>> UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>
>> Griff
>>
>> On 13 January 2016 at 18:12, Anuj Wadehra  wrote:
>>
>>> Hi,
>>>
>>> Revisiting the thread I can see that nodetool status had both good and
>>> bad nodes at the same time. How do you replace nodes? When you say bad
>>> node, I understand that the node is no longer usable even though Cassandra 

Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James,

Can you post the result of "nodetool netstats" on the bad node?


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Tyler Hobbs
I don't think that's a known issue.  Can you open a ticket at
https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
along with the commitlog files and the mutation that was saved to /tmp?


Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
Cassandra supports both client to node and inter-node security. IOW,
Cassandra can also be a client to another Cassandra node.

To repeat (and you seem to keep ignoring this) - the presumption is that
the user, outside of Cassandra, is responsible for securing the system,
including the file system, so in theory there is no way for anyone besides
a system administrator to directly access any of the actual files within
Cassandra, so there is no way for anybody to access even a clear text file.
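
In practice that means locking the files down at the OS level. A minimal
sketch, assuming Cassandra runs as the cassandra user and a default config
path:

  chown cassandra:cassandra /etc/cassandra/conf/.keystore /etc/cassandra/conf/cassandra.yaml
  chmod 600 /etc/cassandra/conf/.keystore /etc/cassandra/conf/cassandra.yaml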

-- Jack Krupansky

On Thu, Jan 14, 2016 at 7:32 PM, oleg yusim  wrote:

> Jack, thank you for the link, but I'm not sure what you are referring to
> by Cassandra API security. If you mean the TLS connection Cassandra
> establishes to clients and between nodes, then the keystore and truststore
> do not seem to participate in it at all, because Cassandra is using certs
> and keys extracted from the keystore during this connection, not those
> which are stored in it (that is what made me so surprised and prompted me
> to start this discussion).
>
> Now, a TLS connection per se would be secure or not secure regardless of
> how you position your keys and certs. What would be important here is the
> ciphers you use (and Cassandra is doing that) and the ability to use CRLs
> (I do not think Cassandra is doing that).
>
> Now if we are talking about whether the positioning of certificates and
> keys matters for Cassandra as a system, then of course it matters.
> Certificates and keys are the credentials Cassandra presents during TLS,
> so the harm is the same as leaving a password in clear text.
>
> So, help me out here, what am I missing?
>
> Thanks,
>
> Oleg
>
> On Thu, Jan 14, 2016 at 6:10 PM, Jack Krupansky 
> wrote:
>
>> Cassandra is definitely assuming that you, the user, are separately
>> assuring that no intruder gets access to the box/root/login. The keystore
>> and truststore in Cassandra have nothing to do with system security; they
>> are solely for Cassandra API security.
>>
>> System security and Cassandra API security are two completely separate
>> issues. The Cassandra doc on (Cassandra, not system) security is here:
>>
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureIntro.html
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 14, 2016 at 5:49 PM, oleg yusim  wrote:
>>
>>> Jack,
>>>
>>> Thanks for your answer. I guess I'm a little confused by the general
>>> architecture choice. It doesn't seem consistent to me. I mean, if we
>>> are building a layer of database-specific security (i.e. we are saying:
>>> let's assume the intruder is on the box and he is root; what can we do?), then
>>> it is perfectly logical to build keystore and truststore, hide our keys and
>>> certificates there, encrypt the file with passwords from these stores and
>>> keep the key of the box. That is great, and as a security architect I
>>> applaud this.
>>>
>>> Now, if we are saying - no, we are banking on the fact nobody will break
>>> into the box, and if root is lost - all bets are off, that is fine too. But
>>> in this case, what is the point to even have keystore and truststore?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Thu, Jan 14, 2016 at 4:38 PM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
 The point of encryption in Cassandra is to protect data in flight
 between the cluster and clients (or between nodes in the cluster.) The
 presumption is that normal system network access control (e.g., remote
 login, etc.) will preclude bad actors from directly accessing the file
 system on a cluster node.

 -- Jack Krupansky



Encryption in cassandra

2016-01-14 Thread oleg yusim
Greetings,

Guys, can you please help me to understand following:

I'm reading through the way the keystore and truststore are implemented, and
it is all fine and great, but at the end the Cassandra documentation instructs
you to extract all the keystore content and leave all certs and keys in the
clear.

Am I missing something here? Why are we doing it? What is the point of even
having a keystore then? It doesn't look very secure to me...

Another item: cassandra.yaml has the passwords for the keystore and truststore
in clear text... what is the point of having these stores then, if the
passwords are out?
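
For reference, the cassandra.yaml block in question looks roughly like this
(the paths and passwords here are illustrative):

  server_encryption_options:
      internode_encryption: all
      keystore: /etc/cassandra/conf/.keystore
      keystore_password: myKeyPass        # sits here in clear text
      truststore: /etc/cassandra/conf/.truststore
      truststore_password: myTrustPass    # likewise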

Thanks,

Oleg


Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
Hi Jack,

> So, your 1GB input size means roughly 716 thousand rows of data and 128GB 
> means roughly 92 million rows, correct?

Yes, that's correct.

> Are your gets and searches returning single rows, or a significant number of 
> rows?

Like I mentioned in my first email, get always returns a single row, and search 
returns a variable number of rows. The number of rows returned varies from 1 to 4000.

> -- Jack Krupansky
> 
>> On Thu, Jan 14, 2016 at 4:43 PM, Anurag Khandelwal  
>> wrote:
>> To clarify: Input size is the size of the dataset as a CSV file, before 
>> loading it into Cassandra; for each input size, the number of columns is 
>> fixed but the number of rows is different. By 1.5KB record, I meant that 
>> each row, when represented as a CSV entry, occupies 1500 bytes. I've used 
>> the terms "row" and "record" interchangeably, which might have been the 
>> source of some confusion.
>> 
>> I'll run the stress tool and report the results as well; the hardware is 
>> whatever AWS provides for c3.8xlarge EC2 instance.
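
(A baseline of that sort can come from the stock stress tool, run along these
lines; the op counts and thread counts are illustrative:)

  cassandra-stress write n=1000000 -rate threads=54
  cassandra-stress read n=1000000 -rate threads=54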
>> 
>> Anurag
>> 
>>> On Jan 14, 2016, at 1:33 PM, Jack Krupansky  
>>> wrote:
>>> 
>>> What exactly is "input size" here (1GB to 128GB)? I mean, the test spec says
>>> "The dataset used comprises ~1.5KB records...  there are 105 attributes
>>> in each record." Does each test run have exactly the same number of rows
>>> and columns, and you're just making each column bigger, or what?
>>>
>>> Cassandra doesn't have "records", so are you really saying that you show
>>> 1,500 rows? Is it one row per partition or do you have clustering?
>>>
>>> What are you actually trying to measure? (Some more context would help.)
>>>
>>> In any case, a latency of 200ms (5 per second) for your search query seems
>>> rather slow, but we need some clarity on input size.
>>>
>>> If you just run the cassandra stress tool on your hardware, what kinds of
>>> numbers do you get? That should be the starting point for any benchmarking
>>> - how does your hardware perform processing basic requests before you
>>> layer your own data modeling on top of that?
>>> 
>>> -- Jack Krupansky
>>> 
 On Thu, Jan 14, 2016 at 4:02 PM, Jonathan Haddad  
 wrote:
 I think you actually get a really useful metric by benchmarking 1 machine. 
  You understand your cluster's theoretical maximum performance, which 
 would be Nodes * number of queries.  Yes, adding in replication and CL is 
 important, but 1 machine lets you isolate certain performance metrics. 
 
> On Thu, Jan 14, 2016 at 12:23 PM Robert Wille  wrote:
> I disagree. I think that you can extrapolate very little information 
> about RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
> 
>> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal  
>> wrote:
>> 
>> Hi John,
>> 
>> Thanks for responding!
>> 
>> The aim of this benchmark was not to benchmark Cassandra as an 
>> end-to-end distributed system, but to understand a break down of the 
>> performance. For instance, if we understand the performance 
>> characteristics that we can expect from a single machine cassandra 
>> instance with RF=Consistency=1, we can have a good estimate of what the 
>> distributed performance with higher replication factors and consistency 
>> are going to look like. Even in the ideal case, the performance 
>> improvement would scale at most linearly with more machines and replicas.
>> 
>> That being said, I still want to understand whether this is the 
>> performance I should expect for the setup I described; if the 
>> performance for the current setup can be improved, then clearly the 
>> performance for a production setup (with multiple nodes, replicas) would 
>> also improve. Does that make sense?
>> 
>> Thanks!
>> Anurag
>> 
>>> On Jan 6, 2016, at 9:31 AM, John Schulz  wrote:
>>> 
>>> Anurag,
>>> 
 Unless you are planning on continuing to use only one machine with RF=1,
 benchmarking a single system using RF=Consistency=1 is mostly a waste
 of time. If you are going to use RF=1 and a single host, then why use
 Cassandra at all? Plain old relational dbs should do the job just fine.
>>> Cassandra is designed to be distributed. You won't get the full impact 
>>> of how it scales and the limits on scaling unless you benchmark a 
>>> distributed system. For example the scaling impact of secondary indexes 
>>> will not be visible on a single node.
>>> 
>>> John
>>> 
>>> 
>>> 
 On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal 
  wrote:
 Hi,
 
 I’ve been benchmarking Cassandra to get an idea of how the performance 
 scales with more data on a single machine. I just wanted to 

Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Jack,

Thanks for your answer. I guess I'm a little confused by the general
architecture choice. It doesn't seem consistent to me. I mean, if we
are building a layer of database-specific security (i.e. we are saying:
let's assume the intruder is on the box, and he is root - what can we do?),
then it is perfectly logical to build a keystore and truststore, hide our
keys and certificates there, encrypt the files with passwords from these
stores, and keep the key off the box. That is great, and as a security
architect I applaud this.

Now, if we are saying - no, we are banking on the fact that nobody will break
into the box, and if root is lost all bets are off - that is fine too. But
in this case, what is the point of even having a keystore and truststore?

Thanks,

Oleg

On Thu, Jan 14, 2016 at 4:38 PM, Jack Krupansky 
wrote:

> The point of encryption in Cassandra is to protect data in flight between
> the cluster and clients (or between nodes in the cluster.) The presumption
> is that normal system network access control (e.g., remote login, etc.)
> will preclude bad actors from directly accessing the file system on a
> cluster node.
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 5:16 PM, oleg yusim  wrote:
>
>> Greetings,
>>
>> Guys, can you please help me to understand the following:
>>
>> I'm reading through the way the keystore and truststore are implemented, and
>> it is all fine and great, but at the end the Cassandra documentation
>> instructs us to extract all the keystore content and leave all certs and
>> keys in the clear.
>>
>> Am I missing something here? Why are we doing it? What is the point of even
>> having a keystore then? It doesn't look very secure to me...
>>
>> Another item - cassandra.yaml has the passwords for the keystore and
>> truststore in clear text... what is the point of having these stores then,
>> if the passwords are out?
>>
>> Thanks,
>>
>> Oleg
>>
>
>


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Sebastian Estevez
Try starting the other nodes. You may have to delete or mv the commitlog
segment referenced in the error message for the node to come up, since it is
apparently corrupted.
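Something along these lines, assuming the default commitlog location (the
actual segment name comes from your error message; the one below is just a
placeholder):

# stop cassandra first, then move the corrupted segment aside
mv /var/lib/cassandra/commitlog/CommitLog-<version>-<id>.log /tmp/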

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Thu, Jan 14, 2016 at 1:00 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

> How can I restart?
> It blocks with the error listed below.
> Are my memory settings good for my configuration?
>
> On 14 Jan 2016, at 18:30, Jake Luciani  wrote:
>
> Yes you can restart without data loss.
>
> Can you please include info about how much data you have loaded per node
> and perhaps what your schema looks like?
>
> Thanks
>
> On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
>>
>> Ok, I will open a ticket.
>>
>> How could I restart my cluster without losing everything?
>> Would there be a better memory configuration to select for my nodes?
>> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
>>
>> Thanks
>>
>> Jean
>>
>> On 14 Jan 2016, at 18:19, Tyler Hobbs  wrote:
>>
>> I don't think that's a known issue.  Can you open a ticket at
>> https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
>> along with the commitlog files and the mutation that was saved to /tmp?
>>
>> On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
>> jean.tremb...@zen-innovations.com> wrote:
>>
>>> Hi,
>>>
>>> I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
>>> I use Cassandra 3.1.1.
>>> I use the following setup for the memory:
>>>   MAX_HEAP_SIZE="6G"
>>> HEAP_NEWSIZE="496M"
>>>
>>> I have been loading a lot of data into this cluster over the last 24
>>> hours. The system behaved, I think, very nicely. It was loading very fast,
>>> and giving excellent read times. There were no error messages until this one:
>>>
>>>
>>> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
>>> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
>>> Exiting forcefully due to:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_65]
>>> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
>>> at
>>> org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>> at
>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>>> 

Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
Cassandra is definitely assuming that you, the user, are separately
assuring that no intruder gets access to the box/root/login. The keystore
and truststore in Cassandra have nothing to do with system security; they
are solely for Cassandra API security.

System security and Cassandra API security are two completely separate
issues. The Cassandra doc on (Cassandra, not system) security is here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureIntro.html



-- Jack Krupansky
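For concreteness, this is roughly how the two stores are wired up for
client-to-node encryption in cassandra.yaml (a sketch; the paths and
passwords are illustrative):

client_encryption_options:
    enabled: true
    keystore: conf/.keystore            # the node's private key + certificate
    keystore_password: myKeyPass        # note: sits in cassandra.yaml in clear text
    require_client_auth: true
    truststore: conf/.truststore        # certificates this node will trust
    truststore_password: myTrustPass

The same pattern applies to server_encryption_options for node-to-node traffic.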

On Thu, Jan 14, 2016 at 5:49 PM, oleg yusim  wrote:

> Jack,
>
> Thanks for your answer. I guess I'm a little confused by the general
> architecture choice. It doesn't seem consistent to me. I mean, if we
> are building a layer of database-specific security (i.e. we are saying:
> let's assume the intruder is on the box, and he is root - what can we do?),
> then it is perfectly logical to build a keystore and truststore, hide our
> keys and certificates there, encrypt the files with passwords from these
> stores, and keep the key off the box. That is great, and as a security
> architect I applaud this.
>
> Now, if we are saying - no, we are banking on the fact that nobody will break
> into the box, and if root is lost all bets are off - that is fine too. But
> in this case, what is the point of even having a keystore and truststore?
>
> Thanks,
>
> Oleg
>
> On Thu, Jan 14, 2016 at 4:38 PM, Jack Krupansky 
> wrote:
>
>> The point of encryption in Cassandra is to protect data in flight between
>> the cluster and clients (or between nodes in the cluster.) The presumption
>> is that normal system network access control (e.g., remote login, etc.)
>> will preclude bad actors from directly accessing the file system on a
>> cluster node.
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 14, 2016 at 5:16 PM, oleg yusim  wrote:
>>
>>> Greetings,
>>>
>>> Guys, can you please help me to understand the following:
>>>
>>> I'm reading through the way the keystore and truststore are implemented, and
>>> it is all fine and great, but at the end the Cassandra documentation
>>> instructs us to extract all the keystore content and leave all certs and
>>> keys in the clear.
>>>
>>> Am I missing something here? Why are we doing it? What is the point of even
>>> having a keystore then? It doesn't look very secure to me...
>>>
>>> Another item - cassandra.yaml has the passwords for the keystore and
>>> truststore in clear text... what is the point of having these stores then,
>>> if the passwords are out?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>
>>
>


Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
The point of encryption in Cassandra is to protect data in flight between
the cluster and clients (or between nodes in the cluster.) The presumption
is that normal system network access control (e.g., remote login, etc.)
will preclude bad actors from directly accessing the file system on a
cluster node.

-- Jack Krupansky

On Thu, Jan 14, 2016 at 5:16 PM, oleg yusim  wrote:

> Greetings,
>
> Guys, can you please help me to understand the following:
>
> I'm reading through the way the keystore and truststore are implemented, and
> it is all fine and great, but at the end the Cassandra documentation
> instructs us to extract all the keystore content and leave all certs and
> keys in the clear.
>
> Am I missing something here? Why are we doing it? What is the point of even
> having a keystore then? It doesn't look very secure to me...
>
> Another item - cassandra.yaml has the passwords for the keystore and
> truststore in clear text... what is the point of having these stores then,
> if the passwords are out?
>
> Thanks,
>
> Oleg
>


Re: Encryption in cassandra

2016-01-14 Thread daemeon reiydelle
The keys don't have to be on the box. You do need a login/password for C*.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Jan 14, 2016 5:16 PM, "oleg yusim"  wrote:

> Greetings,
>
> Guys, can you please help me to understand the following:
>
> I'm reading through the way the keystore and truststore are implemented, and
> it is all fine and great, but at the end the Cassandra documentation
> instructs us to extract all the keystore content and leave all certs and
> keys in the clear.
>
> Am I missing something here? Why are we doing it? What is the point of even
> having a keystore then? It doesn't look very secure to me...
>
> Another item - cassandra.yaml has the passwords for the keystore and
> truststore in clear text... what is the point of having these stores then,
> if the passwords are out?
>
> Thanks,
>
> Oleg
>


Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Daemeon,

Can you please flesh out that idea a bit? I'm not sure I'm fully
on board here.

Thanks,

Oleg

On Thu, Jan 14, 2016 at 4:52 PM, daemeon reiydelle 
wrote:

> The keys don't have to be on the box. You do need a login/password for C*.
>
> sent from my mobile
> Daemeon C.M. Reiydelle
> USA 415.501.0198
> London +44.0.20.8144.9872
> On Jan 14, 2016 5:16 PM, "oleg yusim"  wrote:
>
>> Greetings,
>>
>> Guys, can you please help me to understand the following:
>>
>> I'm reading through the way the keystore and truststore are implemented, and
>> it is all fine and great, but at the end the Cassandra documentation
>> instructs us to extract all the keystore content and leave all certs and
>> keys in the clear.
>>
>> Am I missing something here? Why are we doing it? What is the point of even
>> having a keystore then? It doesn't look very secure to me...
>>
>> Another item - cassandra.yaml has the passwords for the keystore and
>> truststore in clear text... what is the point of having these stores then,
>> if the passwords are out?
>>
>> Thanks,
>>
>> Oleg
>>
>


Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Jack, thank you for the link, but I'm not sure what you are referring to by
Cassandra API security. If you mean the TLS connections Cassandra establishes
to clients and between nodes, then the keystore and truststore do not seem to
participate in them at all, because Cassandra uses the certs and keys
extracted from the keystore during these connections, not those stored
in it (which is what surprised me and prompted me to start this
discussion).

Now, a TLS connection per se would be secure or not secure regardless of how
you position your keys and certs. What matters here is the ciphers
you use (and Cassandra is doing that) and the ability to use CRLs (I do not
think Cassandra is doing that).

Now, if we are asking whether the positioning of certificates and keys matters
for Cassandra as a system, then of course it matters. Certificates and keys
are the credentials Cassandra presents during TLS, so the harm is the same as
leaving a password in clear text.

So, help me out here: what am I missing?

Thanks,

Oleg

On Thu, Jan 14, 2016 at 6:10 PM, Jack Krupansky 
wrote:

> Cassandra is definitely assuming that you, the user, are separately
> assuring that no intruder gets access to the box/root/login. The keystore
> and truststore in Cassandra have nothing to do with system security; they
> are solely for Cassandra API security.
>
> System security and Cassandra API security are two completely separate
> issues. The Cassandra doc on (Cassandra, not system) security is here:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureIntro.html
>
>
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 5:49 PM, oleg yusim  wrote:
>
>> Jack,
>>
>> Thanks for your answer. I guess I'm a little confused by the general
>> architecture choice. It doesn't seem consistent to me. I mean, if we
>> are building a layer of database-specific security (i.e. we are saying:
>> let's assume the intruder is on the box, and he is root - what can we do?),
>> then it is perfectly logical to build a keystore and truststore, hide our
>> keys and certificates there, encrypt the files with passwords from these
>> stores, and keep the key off the box. That is great, and as a security
>> architect I applaud this.
>>
>> Now, if we are saying - no, we are banking on the fact that nobody will break
>> into the box, and if root is lost all bets are off - that is fine too. But
>> in this case, what is the point of even having a keystore and truststore?
>>
>> Thanks,
>>
>> Oleg
>>
>> On Thu, Jan 14, 2016 at 4:38 PM, Jack Krupansky > > wrote:
>>
>>> The point of encryption in Cassandra is to protect data in flight
>>> between the cluster and clients (or between nodes in the cluster.) The
>>> presumption is that normal system network access control (e.g., remote
>>> login, etc.) will preclude bad actors from directly accessing the file
>>> system on a cluster node.
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Jan 14, 2016 at 5:16 PM, oleg yusim  wrote:
>>>
 Greetings,

 Guys, can you please help me to understand the following:

 I'm reading through the way the keystore and truststore are implemented,
 and it is all fine and great, but at the end the Cassandra documentation
 instructs us to extract all the keystore content and leave all certs and
 keys in the clear.

 Am I missing something here? Why are we doing it? What is the point of
 even having a keystore then? It doesn't look very secure to me...

 Another item - cassandra.yaml has the passwords for the keystore and
 truststore in clear text... what is the point of having these stores then,
 if the passwords are out?

 Thanks,

 Oleg

>>>
>>>
>>
>


Re: max connection per user

2016-01-14 Thread oleg yusim
Let me revive this thread a little.

I see, it is possible to limit concurrent connections based on IP or client:

# The maximum number of concurrent client connections.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections: -1

# The maximum number of concurrent client connections per source ip.
# The default is -1, which means unlimited.
# native_transport_max_concurrent_connections_per_ip: -1

My question would be: what are the current recommendations the Cassandra team
has for these? How many connections can the database handle for sure?

Thanks,

Oleg
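For reference, the client-side connection cap discussed below (the driver
"configuration switches") looks roughly like this with the DataStax Java
driver 2.x - a sketch; the numbers are illustrative, and note this caps
connections per client process, it is not per-user enforcement inside the
database:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class CappedClient {
    public static void main(String[] args) {
        // cap the connection pool this client may open to each Cassandra host
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 1)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 2);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPoolingOptions(pooling)
                .build();
        Session session = cluster.connect();  // all requests share the capped pool
        // ... run queries ...
        session.close();
        cluster.close();
    }
}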

On Wed, Jan 13, 2016 at 9:04 PM, oleg yusim  wrote:

> Brian - absolutely.
>
> To give you a brief description of what I'm doing: I'm working for
> VMware as a security architect, and they tasked me with creating a STIG
> (working with DISA) for Cassandra DB. To create a STIG I walk
> through the Database SRG security controls and assess them against
> the Cassandra DB configuration. As a result, I have to address all the
> security controls in the SRG, proposing mitigations where Cassandra can't
> meet them through configuration, and specifying the desired configuration
> where possible.
>
> At this particular place, I'm dealing with following security control:
>
> The DBMS must limit the number of concurrent sessions to an
> organization-defined number per user for all accounts and/or account types.
>
> Here is the brief dive into why it is needed:
>
>
> Database management includes the ability to control the number of users
> and user sessions utilizing a DBMS. Unlimited concurrent connections to the
> DBMS could allow a successful Denial of Service (DoS) attack by exhausting
> connection resources; and a system can also fail or be degraded by an
> overload of legitimate users. Limiting the number of concurrent sessions
> per user is helpful in reducing these risks.
>
> This requirement addresses concurrent session control for a single
> account. It does not address concurrent sessions by a single user via
> multiple system accounts; and it does not deal with the total number of
> sessions across all accounts.
>
> The capability to limit the number of concurrent sessions per user must be
> configured in or added to the DBMS (for example, by use of a logon
> trigger), when this is technically feasible. Note that it is not sufficient
> to limit sessions via a web server or application server alone, because
> legitimate users and adversaries can potentially connect to the DBMS by
> other means.
>
> The organization will need to define the maximum number of concurrent
> sessions by account type, by account, or a combination thereof. In deciding
> on the appropriate number, it is important to consider the work
> requirements of the various types of users. For example, 2 might be an
> acceptable limit for general users accessing the database via an
> application; but 10 might be too few for a database administrator using a
> database management GUI tool, where each query tab and navigation pane may
> count as a separate session.
>
> (Sessions may also be referred to as connections or logons, which for the
> purposes of this requirement are synonyms.)
>
>
> Now, with that in mind, the typical way to DoS a database is to open more
> connections than the database can support, bringing the server to its knees.
> The typical way to counter it is to limit the number of concurrent user
> sessions to two and the number of concurrent administrator sessions to 10.
>
> With the answer Rob provided, I'm reduced to searching for a mitigating
> control. That might be limiting the maximum number of connections to the
> database to a number the database can reliably support. I know the JDBC
> driver has configuration switches for that. The question now
> is - how many? What is the number of simultaneous connections Cassandra
> would be able to bear?
>
> Thanks,
>
> Oleg
>
> On Wed, Jan 13, 2016 at 8:40 PM, Bryan Cheng 
> wrote:
>
>> Are you actively exposing your database to users outside of your
>> organization, or are you just asking about security best practices?
>>
>> If you mean the former, this isn't really a common use case and there
>> isn't a huge amount out of the box that Cassandra will do to help.
>>
>> If you're just asking about security best-practices,
>> http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf
>> has a brief blurb, and there are many resources online for securing
>> Cassandra specifically and databases in general- the approaches are going
>> to be largely the same.
>>
>> Can you describe through what avenues you're expecting either intrusion or DoS?
>>
>> On Wed, Jan 13, 2016 at 6:01 PM, oleg yusim  wrote:
>>
>>> OK Rob, I see what you're saying. Well, let's dive into the long questions
>>> and answers in this case a bit:
>>>
>>> 1) Is there any other approach Cassandra currently utilizes to mitigate
>>> DoS attacks?
>>> 

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Jeff Jirsa
This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / 
https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is 
really the case depends on how much of your data is in page cache, and whether 
or not you’re using mmap. Since the original question was asked by someone 
using small RAM instances, it’s possible. 

We mitigate this by dropping compression_chunk_size in order to force a smaller 
buffer on reads, so we don’t over read very small blocks. This has other side 
effects (lower compression ratio, more garbage during streaming), but 
significantly speeds up read workloads for us.
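For example (2.1 syntax; the keyspace and table names are illustrative):

ALTER TABLE myks.mytable
  WITH compression = {'sstable_compression': 'LZ4Compressor',
                      'chunk_length_kb': 4};

-- existing sstables keep their old chunk size until rewritten, e.g. with:
-- nodetool upgradesstables -a myks mytable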


From:  Zhiyan Shao
Date:  Thursday, January 14, 2016 at 9:49 AM
To:  "user@cassandra.apache.org"
Cc:  Jeff Jirsa, "Agrawal, Pratik"
Subject:  Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you 
can find another thread I sent a while ago. The perf test I did indicated that 
read is slower for 2.1.6 than 2.0.14 so we stayed with 2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen  wrote:
Thanks Jeff for your reply. Sorry for delayed response. We were running some 
more tests and wanted to wait for the results.

So basically CPU with 2.1.11 was higher compared to 2.0.9 (see below) for the
same exact load test. Memory spikes were also more aggressive on 2.1.11.

So, to rule out any of our custom settings, we ended up doing some testing
with the Cassandra stress test and a default Cassandra installation. Here are
the results we saw between 2.0.9 and 2.1.11. Both are default installations
and both use the Cassandra stress test with the same params, so this is the
closest apples-to-apples comparison we can get. As you can see, both read and
write latencies are 30 to 50% worse in 2.1.11 than 2.0.9.

Highlights of the test:
Load: 2x reads and 1x writes
CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11

Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11

One observation: as the number of threads is increased, 2.1.11 read
latencies get worse compared to 2.0.9 (see the table below for 24 threads
vs 54 threads).

Not sure if anyone has done this kind of comparison before and what their 
thoughts are. I am thinking for this same reason 

2.0.9 Plain      type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
 16 threadCount  READ   66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
 16 threadCount  WRITE  33146      3572   3572   3572   1.3   1.0  2.6   3.3   7.0    206.5  9.3
 16 threadCount  total  100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
2.1.11 Plain
 16 threadCount  READ   67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
 16 threadCount  WRITE  32904      3344   3344   3344   1.4   1.3  2.3   3.0   6.5    56.7   9.8
 16 threadCount  total  100000     10162  10162  10162  1.6   1.4  2.5   3.2   6.0    61.7   9.8
2.0.9 Plain
 24 threadCount  READ   66414      8167   8167   8167   2.0   1.6  3.7   7.5   16.7   208    8.1
 24 threadCount  WRITE  33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
 24 threadCount  total  100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
2.1.11 Plain
 24 threadCount  READ   66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9.0
 24 threadCount  WRITE  33372      3723   3723   3723   2.0   1.9  3.1   3.8   21.9   37.2   9.0
 24 threadCount  total  100000     11155  11155  11155  2.1   2.0  3.3   4.1   8.8    38.3   9.0
2.0.9 Plain
 54 threadCount  READ   67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5.0
 54 threadCount  WRITE  32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5.0
 54 threadCount  total  100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5.0
2.1.11 Plain
 54 threadCount  READ   66780      8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
 54 threadCount  WRITE  33220      4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68.0   7.5
 54 threadCount  total  100000     13404  13404  13404  4.0   3.7  6.6   9.2   48.0   69.9   7.5

From: Jeff Jirsa 
Date: Thursday, January 7, 2016 at 1:01 AM
To: "user@cassandra.apache.org" , Peddi Praveen 

Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our experience 
was anywhere from 20-60%, depending on workload).

However, it’s not necessarily true that everything behaves exactly the same – 
in particular, memtables are different, commitlog segment handling is 
different, and GC params may need to be tuned differently for 2.1 than 2.0.

When the system is busy, what’s it actually DOING? Cassandra exposes a TON of 
metrics – have you plugged any into a reporting system to see what’s going on? 
Is your latency due to pegged cpu, iowait/disk queues or gc pauses? 

My colleagues spent a lot of time validating different AWS EBS configs (video 
from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4), 2.1 was faster 
in almost every case, but you’re using an instance size I don’t believe we 
tried (too little RAM to be viable in production).  c3.2xl only gives you 15G 
of ram – most “performance” based systems want 2-4x that (people running G1 
heaps 

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Peddi, Praveen
Hi,
We will try reducing “rar_buffer_size” to 4KB. However,
CASSANDRA-10249 says
"this only affects users who have 1. disabled compression, 2. switched to 
buffered i/o from mmap’d”. None of this is true for us I believe. We use 
default disk_access_mode which should be mmap. We also used LZ4Compressor when 
created table.

We will let you know if this property has any effect. We were testing with
2.1.11, and this was only fixed in 2.1.12, so we need to test with the latest
version.

Praveen





From: Jeff Jirsa >
Reply-To: >
Date: Thursday, January 14, 2016 at 1:29 PM
To: Zhiyan Shao >, 
"user@cassandra.apache.org" 
>
Cc: "Agrawal, Pratik" >
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / 
https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is 
really the case depends on how much of your data is in page cache, and whether 
or not you’re using mmap. Since the original question was asked by someone 
using small RAM instances, it’s possible.

We mitigate this by dropping compression_chunk_size in order to force a smaller 
buffer on reads, so we don’t over read very small blocks. This has other side 
effects (lower compression ratio, more garbage during streaming), but 
significantly speeds up read workloads for us.


From: Zhiyan Shao
Date: Thursday, January 14, 2016 at 9:49 AM
To: "user@cassandra.apache.org"
Cc: Jeff Jirsa, "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you 
can find another thread I sent a while ago. The perf test I did indicated that 
read is slower for 2.1.6 than 2.0.14 so we stayed with 2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen 
> wrote:
Thanks Jeff for your reply. Sorry for delayed response. We were running some 
more tests and wanted to wait for the results.

So basically CPU with 2.1.11 was higher compared to 2.0.9 (see below) for the
same exact load test. Memory spikes were also more aggressive on 2.1.11.

So, to rule out any of our custom settings, we ended up doing some testing
with the Cassandra stress test and a default Cassandra installation. Here are
the results we saw between 2.0.9 and 2.1.11. Both are default installations
and both use the Cassandra stress test with the same params, so this is the
closest apples-to-apples comparison we can get. As you can see, both read and
write latencies are 30 to 50% worse in 2.1.11 than 2.0.9.

Highlights of the test:
Load: 2x reads and 1x writes
CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
One observation: as the number of threads is increased, 2.1.11 read
latencies get worse compared to 2.0.9 (see the table below for 24 threads
vs 54 threads).
Not sure if anyone has done this kind of comparison before and what their 
thoughts are. I am thinking for this same reason

2.0.9 Plain      type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
 16 threadCount  READ   66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
 16 threadCount  WRITE  33146      3572   3572   3572   1.3   1.0  2.6   3.3   7.0    206.5  9.3
 16 threadCount  total  100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
2.1.11 Plain
 16 threadCount  READ   67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
 16 threadCount  WRITE  32904      3344   3344   3344   1.4   1.3  2.3   3.0   6.5    56.7   9.8
 16 threadCount  total  100000     10162  10162  10162  1.6   1.4  2.5   3.2   6.0    61.7   9.8
2.0.9 Plain
 24 threadCount  READ   66414      8167   8167   8167   2.0   1.6  3.7   7.5   16.7   208    8.1
 24 threadCount  WRITE  33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
 24 threadCount  total  100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
2.1.11 Plain
 24 threadCount  READ   66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9.0
 24 threadCount  WRITE  33372      3723   3723   3723   2.0   1.9  3.1   3.8   21.9   37.2   9.0
 24 threadCount

Re: Sorting & pagination in apache cassandra 2.1

2016-01-14 Thread anuja jain
@Jonathan
what do you mean by "you'll need to maintain your own materialized view
tables"?
Does it mean we have to create a new table for each query?

On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma 
wrote:

> In the example you gave, the primary key user_name is the row key. Since
> the default partitioner is random, you are getting rows in random order.
>
> Since each row has no clustering column, there is no further grouping of
> data. Or, in simple terms, each row has one record and is returned ordered
> by column name.
>
> To see some meaningful ordering, there should be some clustering column
> defined.
>
> You can create additional column families to maintain ordering (see the
> sketch at the end of this message), or use external solutions like
> elasticsearch.
> On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:
>
>> I understand the meaning of SSTable, but what's the reason behind sorting
>> the table on the basis of int columns first?
>> Is there any data type preference in Cassandra?
>> Also, what is the alternative to creating materialized views if my
>> Cassandra version is prior to 3.0 (specifically 2.1) and is already
>> in production?
>>
>>
>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
>> wrote:
>>
>>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>>> wrote:
>>>
 One more question: what is meant by "cassandra inherently sorts data"?

>>>
>>> SSTable = Sorted Strings Table.
>>>
>>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>>
>>> =Rob
>>>
>>
>>
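For the archives, the "maintain your own view tables" pattern means one extra
table per query shape, kept in sync by the application. A sketch with
illustrative names:

CREATE TABLE users (
    user_name text PRIMARY KEY,
    age int,
    city text
);

-- hand-maintained "materialized view" for the by-city query (pre-3.0 pattern)
CREATE TABLE users_by_city (
    city text,
    user_name text,
    age int,
    PRIMARY KEY (city, user_name)   -- rows within a city sort by user_name
);

-- the application writes to both, e.g. atomically in a logged batch
BEGIN BATCH
    INSERT INTO users (user_name, age, city) VALUES ('jsmith', 30, 'London');
    INSERT INTO users_by_city (city, user_name, age) VALUES ('London', 'jsmith', 30);
APPLY BATCH;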


Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec says "The
dataset used comprises ~1.5KB records...  there are 105 attributes in
each record." Does each test run have exactly the same number of rows and
columns, and you're just making each column bigger, or what?

Cassandra doesn't have "records", so are you really saying that you show
1,500 rows? Is it one row per partition or do you have clustering?

What are you actually trying to measure? (Some more context would help.)

In any case, a latency of 200ms (5 per second) for your search query seems
rather slow, but we need some clarity on input size.

If you just run the cassandra stress tool on your hardware, what kinds of
numbers do you get? That should be the starting point for any benchmarking
- how does your hardware perform processing basic requests before you
layer your own data modeling on top of that?

-- Jack Krupansky
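If it helps, a baseline run can be as simple as this (2.1 stress tool
syntax; the counts and thread counts are illustrative):

tools/bin/cassandra-stress write n=1000000 -rate threads=16
tools/bin/cassandra-stress read n=1000000 -rate threads=16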

On Thu, Jan 14, 2016 at 4:02 PM, Jonathan Haddad  wrote:

> I think you actually get a really useful metric by benchmarking 1
> machine.  You understand your cluster's theoretical maximum performance,
> which would be Nodes * number of queries.  Yes, adding in replication and
> CL is important, but 1 machine lets you isolate certain performance
> metrics.
>
> On Thu, Jan 14, 2016 at 12:23 PM Robert Wille  wrote:
>
>> I disagree. I think that you can extrapolate very little information
>> about RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>>
>> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal 
>> wrote:
>>
>> Hi John,
>>
>> Thanks for responding!
>>
>> The aim of this benchmark was not to benchmark Cassandra as an end-to-end
>> distributed system, but to understand a break down of the performance. For
>> instance, if we understand the performance characteristics that we can
>> expect from a single machine cassandra instance with RF=Consistency=1, we
>> can have a good estimate of what the distributed performance with higher
>> replication factors and consistency are going to look like. Even in the
>> ideal case, the performance improvement would scale at most linearly with
>> more machines and replicas.
>>
>> That being said, I still want to understand whether this is the
>> performance I should expect for the setup I described; if the performance
>> for the current setup can be improved, then clearly the performance for a
>> production setup (with multiple nodes, replicas) would also improve. Does
>> that make sense?
>>
>> Thanks!
>> Anurag
>>
>> On Jan 6, 2016, at 9:31 AM, John Schulz  wrote:
>>
>> Anurag,
>>
>> Unless you are planning on continuing to use only one machine with RF=1,
>> benchmarking a single system using RF=Consistency=1 is mostly a waste of
>> time. If you are going to use RF=1 and a single host, then why use Cassandra
>> at all? Plain old relational dbs should do the job just fine.
>>
>> Cassandra is designed to be distributed. You won't get the full impact of
>> how it scales and the limits on scaling unless you benchmark a distributed
>> system. For example the scaling impact of secondary indexes will not be
>> visible on a single node.
>>
>> John
>>
>>
>>
>>
>> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal 
>> wrote:
>>
>>> Hi,
>>>
>>> I’ve been benchmarking Cassandra to get an idea of how the performance
>>> scales with more data on a single machine. I just wanted to get some
>>> feedback as to whether these are the numbers I should expect.
>>>
>>> The benchmarks are quite simple — I measure the latency and throughput
>>> for two kinds of queries:
>>>
>>> 1. get() queries - These fetch an entire row for a given primary key.
>>> 2. search() queries - These fetch all the primary keys for rows where a
>>> particular column matches a particular value (e.g., “name” is “John
>>> Smith”).
>>>
>>> Indexes are constructed for all columns that are queried.
>>>
>>> *Dataset*
>>>
>>> The dataset used comprises ~1.5KB records (on average) when
>>> represented as CSV; there are 105 attributes in each record.
>>>
>>> *Queries*
>>>
>>> For get() queries, randomly generated primary keys are used.
>>>
>>> For search() queries, column values are selected such that their total
>>> number of occurrences in the dataset is between 1 - 4000. For example, a
>>> query for “name” = “John Smith” would only be performed if the number of
>>> rows containing that value lies between 1-4000.
>>>
>>> The results for the benchmarks are provided below:
>>>
>>> *Latency Measurements*
>>>
>>> The latency measurements are an average of 1 queries.
>>>
>>>
>>>
>>>
>>>
>>> *Throughput Measurements*
>>>
>>> The throughput measurements were repeated for 1-16 client threads, and
>>> the numbers reported for each input size are for the configuration (i.e., #
>>> client threads) with the highest throughput.
>>>
>>>
>>>
>>>
>>>
>>> Any feedback here would be greatly appreciated!
>>>
>>> Thanks!
>>> Anurag
>>>
>>>
>>
>>
>> 

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
To clarify: Input size is the size of the dataset as a CSV file, before loading 
it into Cassandra; for each input size, the number of columns is fixed but the 
number of rows is different. By 1.5KB record, I meant that each row, when 
represented as a CSV entry, occupies 1500 bytes. I've used the terms "row" and 
"record" interchangeably, which might have been the source of some confusion.

I'll run the stress tool and report the results as well; the hardware is
whatever AWS provides for a c3.8xlarge EC2 instance.

Anurag

> On Jan 14, 2016, at 1:33 PM, Jack Krupansky  wrote:
> 
> What exactly is "input size" here (1GB to 128GB)? I mean, the test spec says "The
> dataset used comprises ~1.5KB records...  there are 105 attributes in each
> record." Does each test run have exactly the same number of rows and columns,
> and you're just making each column bigger, or what?
> 
> Cassandra doesn't have "records", so are you really saying that you show
> 1,500 rows? Is it one row per partition or do you have clustering?
> 
> What are you actually trying to measure? (Some more context would help.)
> 
> In any case, a latency of 200ms (5 per second) for your search query seems
> rather slow, but we need some clarity on input size.
> 
> If you just run the cassandra stress tool on your hardware, what kinds of
> numbers do you get? That should be the starting point for any benchmarking -
> how does your hardware perform processing basic requests before you layer
> your own data modeling on top of that?
> 
> -- Jack Krupansky
> 
>> On Thu, Jan 14, 2016 at 4:02 PM, Jonathan Haddad  wrote:
>> I think you actually get a really useful metric by benchmarking 1 machine.  
>> You understand your cluster's theoretical maximum performance, which would 
>> be Nodes * number of queries.  Yes, adding in replication and CL is 
>> important, but 1 machine lets you isolate certain performance metrics. 
>> 
>>> On Thu, Jan 14, 2016 at 12:23 PM Robert Wille  wrote:
>>> I disagree. I think that you can extrapolate very little information about 
>>> RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>>> 
 On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal  
 wrote:
 
 Hi John,
 
 Thanks for responding!
 
 The aim of this benchmark was not to benchmark Cassandra as an end-to-end 
 distributed system, but to understand a break down of the performance. For 
 instance, if we understand the performance characteristics that we can 
 expect from a single machine cassandra instance with RF=Consistency=1, we 
 can have a good estimate of what the distributed performance with higher 
 replication factors and consistency are going to look like. Even in the 
 ideal case, the performance improvement would scale at most linearly with 
 more machines and replicas.
 
 That being said, I still want to understand whether this is the 
 performance I should expect for the setup I described; if the performance 
 for the current setup can be improved, then clearly the performance for a 
 production setup (with multiple nodes, replicas) would also improve. Does 
 that make sense?
 
 Thanks!
 Anurag
 
> On Jan 6, 2016, at 9:31 AM, John Schulz  wrote:
> 
> Anurag,
> 
> Unless you are planning on continuing to use only one machine with RF=1,
> benchmarking a single system using RF=Consistency=1 is mostly a waste of
> time. If you are going to use RF=1 and a single host, then why use
> Cassandra at all? Plain old relational dbs should do the job just fine.
> Cassandra is designed to be distributed. You won't get the full impact of 
> how it scales and the limits on scaling unless you benchmark a 
> distributed system. For example the scaling impact of secondary indexes 
> will not be visible on a single node.
> 
> John
> 
> 
> 
>> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal  
>> wrote:
>> Hi,
>> 
>> I’ve been benchmarking Cassandra to get an idea of how the performance 
>> scales with more data on a single machine. I just wanted to get some 
>> feedback as to whether these are the numbers I should expect.
>> 
>> The benchmarks are quite simple — I measure the latency and throughput 
>> for two kinds of queries:
>> 
>> 1. get() queries - These fetch an entire row for a given primary key.
>> 2. search() queries - These fetch all the primary keys for rows where a 
>> particular column matches a particular value (e.g., “name” is “John 
>> Smith”). 
>> 
>> Indexes are constructed for all columns that are queried.
>> 
>> Dataset
>> 
>> The dataset used comprises ~1.5KB records (on average) when
>> represented as CSV; there are 105 attributes in each record.
>> 
>> Queries

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine.
You understand your cluster's theoretical maximum performance, which would
be Nodes * number of queries.  Yes, adding in replication and CL is
important, but 1 machine lets you isolate certain performance metrics.

On Thu, Jan 14, 2016 at 12:23 PM Robert Wille  wrote:

> I disagree. I think that you can extrapolate very little information about
> RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>
> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal 
> wrote:
>
> Hi John,
>
> Thanks for responding!
>
> The aim of this benchmark was not to benchmark Cassandra as an end-to-end
> distributed system, but to understand a break down of the performance. For
> instance, if we understand the performance characteristics that we can
> expect from a single machine cassandra instance with RF=Consistency=1, we
> can have a good estimate of what the distributed performance with higher
> replication factors and consistency are going to look like. Even in the
> ideal case, the performance improvement would scale at most linearly with
> more machines and replicas.
>
> That being said, I still want to understand whether this is the
> performance I should expect for the setup I described; if the performance
> for the current setup can be improved, then clearly the performance for a
> production setup (with multiple nodes, replicas) would also improve. Does
> that make sense?
>
> Thanks!
> Anurag
>
> On Jan 6, 2016, at 9:31 AM, John Schulz  wrote:
>
> Anurag,
>
> Unless you are planning on continuing to use only one machine with RF=1,
> benchmarking a single system using RF=Consistency=1 is mostly a waste of
> time. If you are going to use RF=1 and a single host, then why use Cassandra
> at all? Plain old relational dbs should do the job just fine.
>
> Cassandra is designed to be distributed. You won't get the full impact of
> how it scales and the limits on scaling unless you benchmark a distributed
> system. For example the scaling impact of secondary indexes will not be
> visible on a single node.
>
> John
>
>
>
>
> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal 
> wrote:
>
>> Hi,
>>
>> I’ve been benchmarking Cassandra to get an idea of how the performance
>> scales with more data on a single machine. I just wanted to get some
>> feedback as to whether these are the numbers I should expect.
>>
>> The benchmarks are quite simple — I measure the latency and throughput
>> for two kinds of queries:
>>
>> 1. get() queries - These fetch an entire row for a given primary key.
>> 2. search() queries - These fetch all the primary keys for rows where a
>> particular column matches a particular value (e.g., “name” is “John
>> Smith”).
>>
>> Indexes are constructed for all columns that are queried.
>>
>> *Dataset*
>>
>> The dataset used comprises ~1.5KB records (on average) when
>> represented as CSV; there are 105 attributes in each record.
>>
>> *Queries*
>>
>> For get() queries, randomly generated primary keys are used.
>>
>> For search() queries, column values are selected such that their total
>> number of occurrences in the dataset is between 1 - 4000. For example, a
>> query for “name” = “John Smith” would only be performed if the number of
>> rows containing that value lies between 1-4000.
>>
>> The results for the benchmarks are provided below:
>>
>> *Latency Measurements*
>>
>> The latency measurements are an average of 1 queries.
>>
>>
>>
>>
>>
>> *Throughput Measurements*
>>
>> The throughput measurements were repeated for 1-16 client threads, and
>> the numbers reported for each input size are for the configuration (i.e., #
>> client threads) with the highest throughput.
>>
>>
>>
>>
>>
>> Any feedback here would be greatly appreciated!
>>
>> Thanks!
>> Anurag
>>
>>
>
>
> --
>
> John H. Schulz
>
> Principal Consultant
>
> Pythian - Love your data
>
>
> sch...@pythian.com |  Linkedin
> www.linkedin.com/pub/john-schulz/13/ab2/930/
>
> Mobile: 248-376-3380
>
> www.pythian.com
>
> --
>
>
>
>
>
>
>


Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Jeff Jirsa
Sorry, I wasn't as explicit as I should have been.

The same buffer size is used by compressed reads as well, but it is tuned with
the compression_chunk_size table property. It's likely true that if you lower
compression_chunk_size, you'll see improved read performance.

This was covered in the AWS re:Invent youtube link I sent in my original reply.



From:  "Peddi, Praveen"
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, January 14, 2016 at 11:36 AM
To:  "user@cassandra.apache.org", Zhiyan Shao
Cc:  "Agrawal, Pratik"
Subject:  Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Hi,
We will try reducing “rar_buffer_size” to 4KB. However, CASSANDRA-10249 says
"this only affects users who have 1. disabled compression, 2. switched to 
buffered i/o from mmap’d”. None of this is true for us I believe. We use 
default disk_access_mode which should be mmap. We also used LZ4Compressor when 
created table.

We will let you know if this property has any effect. We were testing with
2.1.11, and this was only fixed in 2.1.12, so we need to test with the latest
version.

Praveen



From: Jeff Jirsa 
Reply-To: 
Date: Thursday, January 14, 2016 at 1:29 PM
To: Zhiyan Shao , "user@cassandra.apache.org" 

Cc: "Agrawal, Pratik" 
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / 
https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is 
really the case depends on how much of your data is in page cache, and whether 
or not you’re using mmap. Since the original question was asked by someone 
using small RAM instances, it’s possible. 

We mitigate this by dropping compression_chunk_size in order to force a smaller 
buffer on reads, so we don’t over read very small blocks. This has other side 
effects (lower compression ratio, more garbage during streaming), but 
significantly speeds up read workloads for us.


From: Zhiyan Shao
Date: Thursday, January 14, 2016 at 9:49 AM
To: "user@cassandra.apache.org"
Cc: Jeff Jirsa, "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you 
can find another thread I sent a while ago. The perf test I did indicated that 
read is slower for 2.1.6 than 2.0.14 so we stayed with 2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen  wrote:
Thanks Jeff for your reply. Sorry for delayed response. We were running some 
more tests and wanted to wait for the results.

So basically CPU with 2.1.11 was higher compared to 2.0.9 (see below) for the
same exact load test. Memory spikes were also more aggressive on 2.1.11.

So, to rule out any of our custom settings, we ended up doing some testing
with the Cassandra stress test and a default Cassandra installation. Here are
the results we saw between 2.0.9 and 2.1.11. Both are default installations
and both use the Cassandra stress test with the same params, so this is the
closest apples-to-apples comparison we can get. As you can see, both read and
write latencies are 30 to 50% worse in 2.1.11 than 2.0.9.

Highlights of the test:
Load: 2x reads and 1x writes
CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11

Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11

One observation: as the number of threads is increased, 2.1.11 read
latencies get worse compared to 2.0.9 (see the table below for 24 threads
vs 54 threads).

Not sure if anyone has done this kind of comparison before and what their 
thoughts are. I am thinking for this same reason 

2.0.9 Plain      type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
 16 threadCount  READ   66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
 16 threadCount  WRITE  33146      3572   3572   3572   1.3   1.0  2.6   3.3   7.0    206.5  9.3
 16 threadCount  total  100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
2.1.11 Plain
 16 threadCount  READ   67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
 16 threadCount  WRITE  32904      3344   3344   3344   1.4   1.3  2.3   3.0   6.5    56.7   9.8
 16 threadCount  total  100000     10162  10162  10162  1.6   1.4  2.5   3.2   6.0    61.7   9.8
2.0.9 Plain
 24 threadCount  READ   66414      8167   8167   8167   2.0   1.6  3.7   7.5   16.7   208    8.1
 24 threadCount  WRITE  33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
 24 threadCount  total  100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
2.1.11 Plain
 24 threadCount  READ   66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9.0
 24 threadCount  WRITE  33372      3723   3723   3723   2.0   1.9  3.1   3.8   21.9   37.2   9.0
 24 threadCount  total  100000     11155  11155  11155  2.1   2.0  3.3   4.1   8.8    38.3   9.0
2.0.9 Plain
 54 threadCount  READ   67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5.0
 54 threadCount  WRITE  32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5.0
 54 threadCount  total  100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5.0
2.1.11 Plain

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 
and CL>1 by benchmarking with RF=1 and CL=1.

On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal 
> wrote:

Hi John,

Thanks for responding!

The aim of this benchmark was not to benchmark Cassandra as an end-to-end 
distributed system, but to understand a break down of the performance. For 
instance, if we understand the performance characteristics that we can expect 
from a single machine cassandra instance with RF=Consistency=1, we can have a 
good estimate of what the distributed performance with higher replication 
factors and consistency are going to look like. Even in the ideal case, the 
performance improvement would scale at most linearly with more machines and 
replicas.

That being said, I still want to understand whether this is the performance I 
should expect for the setup I described; if the performance for the current 
setup can be improved, then clearly the performance for a production setup 
(with multiple nodes, replicas) would also improve. Does that make sense?

Thanks!
Anurag

On Jan 6, 2016, at 9:31 AM, John Schulz 
> wrote:

Anurag,

Unless you are planning on continuing to use only one machine with RF=1,
benchmarking a single system using RF=Consistency=1 is mostly a waste of time.
If you are going to use RF=1 and a single host, then why use Cassandra at all?
Plain old relational dbs should do the job just fine.

Cassandra is designed to be distributed. You won't get the full impact of how 
it scales and the limits on scaling unless you benchmark a distributed system. 
For example the scaling impact of secondary indexes will not be visible on a 
single node.

John



On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal 
> wrote:
Hi,

I’ve been benchmarking Cassandra to get an idea of how the performance scales 
with more data on a single machine. I just wanted to get some feedback as to
whether these are the numbers I should expect.

The benchmarks are quite simple — I measure the latency and throughput for two 
kinds of queries:

1. get() queries - These fetch an entire row for a given primary key.
2. search() queries - These fetch all the primary keys for rows where a 
particular column matches a particular value (e.g., “name” is “John Smith”).

Indexes are constructed for all columns that are queried.

Dataset

The dataset used comprises ~1.5KB records (on average) when represented
as CSV; there are 105 attributes in each record.

Queries

For get() queries, randomly generated primary keys are used.

For search() queries, column values are selected such that their total number 
of occurrences in the dataset is between 1 - 4000. For example, a query for  
“name” = “John Smith” would only be performed if the number of rows
containing that value lies between 1-4000.

The results for the benchmarks are provided below:

Latency Measurements

The latency measurements are an average of 1 queries.





Throughput Measurements

The throughput measurements were repeated for 1-16 client threads, and the
numbers reported for each input size are for the configuration (i.e., # client
threads) with the highest throughput.





Any feedback here would be greatly appreciated!

Thanks!
Anurag




--

John H. Schulz

Principal Consultant

Pythian - Love your data


sch...@pythian.com |  Linkedin 
www.linkedin.com/pub/john-schulz/13/ab2/930/

Mobile: 248-376-3380

www.pythian.com


Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James,

I may be missing something. You mentioned your cluster has RF=3. Then why does
"nodetool status" show each node owning 1/3 of the data, especially after a
full repair?

On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
james.grif...@idioplatform.com> wrote:

> Hi Kai,
>
> Below - nothing going on that I can see
>
> $ nodetool netstats
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0           6326
> Responses                       n/a         0         219356
>
>
>
> Best wishes,
>
> Griff
>
>
> On 14 January 2016 at 14:22, Kai Wang  wrote:
>
>> James,
>>
>> Can you post the result of "nodetool netstats" on the bad node?
>>
>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>> james.grif...@idioplatform.com> wrote:
>>
>>> A summary of what we've done this morning:
>>>
>>>- Noted that there are no GCInspector lines in system.log on the bad
>>>node (there are GCInspector logs on other healthy nodes)
>>>- Turned on GC logging; noted logs stating that the total time for
>>>which application threads were stopped was high - ~10s
>>>- Not seeing failures of any kind (promotion or concurrent mark)
>>>- Attached VisualVM: noted that heap usage was very low (~5% usage
>>>and stable) and it didn't display the hallmarks of GC activity. PermGen
>>>also very stable
>>>- Downloaded GC logs and examined in GCViewer. Noted that:
>>>   - We had lots of pauses (again around 10s), but no full GCs
>>>   - From a 2,300s sample, just over 2,000s were spent with threads
>>>   paused
>>>   - Spotted many small GCs in the new space - realised that the Xmn
>>>   value was very low (200M against a heap size of 3750M). Increased Xmn
>>>   to 937M - no change in server behaviour (high load, high reads/s on
>>>   disk, high CPU wait)
>>>
>>> Current output of jstat:
>>>
>>>      S0     S1     E      O      P    YGC    YGCT  FGC   FGCT     GCT
>>> 2  0.00  45.20  12.82  26.84  76.21   2333  63.684    2  0.039  63.724
>>> 3 63.58   0.00  33.68   8.04  75.19     14   1.812    2  0.103   1.915
>>>
>>> Correct me if I'm wrong, but it seems node 3 is a lot more healthy GC-wise
>>> than node 2 (which has the normal load statistics).
>>>
>>> Anywhere else you can recommend we look?
>>>
>>> Griff
>>>
>>> On 14 January 2016 at 01:25, Anuj Wadehra 
>>> wrote:
>>>
 Ok. I saw dropped mutations on your cluster, and full GC is a common
 cause for that.
 Can you just search for the word GCInspector in system.log and share the
 frequency of minor and full GCs? Moreover, are you printing promotion
 failures in the GC logs? Why is full GC getting triggered - promotion
 failures or concurrent mode failures?

 If you are on CMS, you need to fine tune your heap options to address
 full gc.



 Thanks
 Anuj

 Sent from Yahoo Mail on Android
 

 On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
  wrote:
 I think I was incorrect in assuming GC wasn't an issue due to the lack
 of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
 differences, though comparing the startup flags on the two machines shows
 the GC config is identical:

 $ jstat -gcutil
     S0     S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
 2  5.08   0.00  55.72  18.24  59.90   25986    619.827   28   1.597    621.424
 3  0.00   0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361

 Here's typical output for iostat on nodes 2 & 3 as well:

 $ iostat -dmx md0

   Device:  rrqm/s  wrqm/s     r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
 2 md0         0.00    0.00  339.00  0.00   9.77 

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai,

Below - nothing going on that I can see

$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0           6326
Responses                       n/a         0         219356



Best wishes,

Griff


On 14 January 2016 at 14:22, Kai Wang  wrote:

> James,
>
> Can you post the result of "nodetool netstats" on the bad node?
>
> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
> james.grif...@idioplatform.com> wrote:
>
>> A summary of what we've done this morning:
>>
>>- Noted that there are no GCInspector lines in system.log on the bad
>>node (there are GCInspector logs on other healthy nodes)
>>- Turned on GC logging (typical flags are sketched after this list); noted
>>logs stating that the total time for which application threads were stopped
>>was high - ~10s
>>- Not seeing failures of any kind (promotion or concurrent mark)
>>- Attached VisualVM: noted that heap usage was very low (~5% usage
>>and stable) and it didn't display the hallmarks of GC activity. PermGen
>>also very stable
>>- Downloaded GC logs and examined in GCViewer. Noted that:
>>   - We had lots of pauses (again around 10s), but no full GCs
>>   - From a 2,300s sample, just over 2,000s were spent with threads
>>   paused
>>   - Spotted many small GCs in the new space - realised that the Xmn
>>   value was very low (200M against a heap size of 3750M). Increased Xmn
>>   to 937M - no change in server behaviour (high load, high reads/s on
>>   disk, high CPU wait)
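
(Side note: "turned on GC logging" above typically means adding JVM flags
along these lines to cassandra-env.sh; the gc.log path is illustrative:)

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"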
>>
>> Current output of jstat:
>>
>>      S0     S1     E      O      P    YGC    YGCT  FGC   FGCT     GCT
>> 2  0.00  45.20  12.82  26.84  76.21   2333  63.684    2  0.039  63.724
>> 3 63.58   0.00  33.68   8.04  75.19     14   1.812    2  0.103   1.915
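
(The jstat snippets in this thread omit jstat's pid and interval arguments; a
typical invocation that samples a running node every 5000 ms would be:)

$ jstat -gcutil $(pgrep -f CassandraDaemon) 5000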
>>
>> Correct me if I'm wrong, but it seems node 3 is a lot more healthy GC-wise
>> than node 2 (which has the normal load statistics).
>>
>> Anywhere else you can recommend we look?
>>
>> Griff
>>
>> On 14 January 2016 at 01:25, Anuj Wadehra  wrote:
>>
>>> Ok. I saw dropped mutations on your cluster, and full GC is a common
>>> cause for that.
>>> Can you just search for the word GCInspector in system.log and share the
>>> frequency of minor and full GCs? Moreover, are you printing promotion
>>> failures in the GC logs? Why is full GC getting triggered - promotion
>>> failures or concurrent mode failures?
>>>
>>> If you are on CMS, you need to fine tune your heap options to address
>>> full gc.
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> 
>>>
>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>  wrote:
>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>> of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
>>> differences, though comparing the startup flags on the two machines shows
>>> the GC config is identical:
>>>
>>> $ jstat -gcutil
>>>     S0     S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
>>> 2  5.08   0.00  55.72  18.24  59.90   25986    619.827   28   1.597    621.424
>>> 3  0.00   0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
>>>
>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>
>>> $ iostat -dmx md0
>>>
>>>   Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>> 2 md0         0.00    0.00   339.00  0.00   9.77   0.00     59.00      0.00   0.00     0.00     0.00   0.00   0.00
>>> 3 md0         0.00    0.00  2069.00  1.00  85.85   0.00     84.94      0.00   0.00     0.00     0.00   0.00   0.00
>>>
>>> Griff
>>>
>>> On 13 January 2016 at 18:36, Anuj Wadehra 
>>> wrote:
>>>
 Node 2 has slightly higher data but that should be ok. Not sure how
 read ops are so high when no IO 

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai,

Well observed - running `nodetool status` without specifying a keyspace does
report ~33% on each node. We have two keyspaces on this cluster - if I
specify either of them, the ownership reported by each node is 100%, so I
believe the repair completed successfully.
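
(For reference: ownership is computed per keyspace against its replication
settings, hence the per-keyspace form - the keyspace name here is
illustrative. With RF=3 on a three-node cluster, each node owning 100% is
the expected result.)

$ nodetool status my_keyspace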

Best wishes,

Griff


On 14 January 2016 at 15:08, Kai Wang  wrote:

> James,
>
> I may be missing something. You mentioned your cluster has RF=3. Then why does
> "nodetool status" show each node owning 1/3 of the data, especially after a
> full repair?
>
> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
> james.grif...@idioplatform.com> wrote:
>
>> Hi Kai,
>>
>> Below - nothing going on that I can see
>>
>> $ nodetool netstats
>> Mode: NORMAL
>> Not sending any streams.
>> Read Repair Statistics:
>> Attempted: 0
>> Mismatch (Blocking): 0
>> Mismatch (Background): 0
>> Pool Name                    Active   Pending      Completed
>> Commands                        n/a         0           6326
>> Responses                       n/a         0         219356
>>
>>
>>
>> Best wishes,
>>
>> Griff
>>
>>
>> On 14 January 2016 at 14:22, Kai Wang  wrote:
>>
>>> James,
>>>
>>> Can you post the result of "nodetool netstats" on the bad node?
>>>
>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>> james.grif...@idioplatform.com> wrote:
>>>
 A summary of what we've done this morning:

- Noted that there are no GCInspector lines in system.log on the bad
node (there are GCInspector logs on other healthy nodes)
- Turned on GC logging; noted logs stating that the total time for which
application threads were stopped was high - ~10s
- Not seeing failures of any kind (promotion or concurrent mark)
- Attached VisualVM: noted that heap usage was very low (~5% usage
and stable) and it didn't display the hallmarks of GC activity. PermGen also
very stable
- Downloaded GC logs and examined in GCViewer. Noted that:
   - We had lots of pauses (again around 10s), but no full GCs
   - From a 2,300s sample, just over 2,000s were spent with threads
   paused
   - Spotted many small GCs in the new space - realised that the Xmn
   value was very low (200M against a heap size of 3750M). Increased Xmn to
   937M - no change in server behaviour (high load, high reads/s on disk,
   high CPU wait)

 Current output of jstat:

      S0     S1     E      O      P    YGC    YGCT  FGC   FGCT     GCT
 2  0.00  45.20  12.82  26.84  76.21   2333  63.684    2  0.039  63.724
 3 63.58   0.00  33.68   8.04  75.19     14   1.812    2  0.103   1.915

 Correct me if I'm wrong, but it seems node 3 is a lot more healthy GC-wise
 than node 2 (which has the normal load statistics).

 Anywhere else you can recommend we look?

 Griff

 On 14 January 2016 at 01:25, Anuj Wadehra 
 wrote:

> Ok. I saw dropped mutations on your cluster, and full GC is a common
> cause for that.
> Can you just search for the word GCInspector in system.log and share the
> frequency of minor and full GCs? Moreover, are 

Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay
Hi,

I have a small Cassandra Cluster with 5 nodes, having 16GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"
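
(These variables are picked up from conf/cassandra-env.sh; one way to confirm
what a node is actually configured with:)

$ grep -E '^(MAX_HEAP_SIZE|HEAP_NEWSIZE)=' conf/cassandra-env.sh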

I have been loading a lot of data in this cluster over the last 24 hours. The 
system behaved I think very nicely. It was loading very fast, and giving 
excellent read time. There was no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 
JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting 
forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at 
org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) 
~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) 
~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_65]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
 ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

4 nodes out of 5 crashed with this error message. Now when I want to restart 
the first node I get the following error:

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation7465380878750576105dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a map
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633)
 [apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556)
 [apache-cassandra-3.1.1.jar:3.1.1]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509)
 [apache-cassandra-3.1.1.jar:3.1.1]
at 
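
(A common - and lossy - way past a commit log segment that cannot be replayed
is to move the segments aside before restarting; writes that only existed in
those segments are lost, so keep the copies. Paths are the package defaults
and may differ:)

$ mkdir -p /var/tmp/commitlog-backup
$ mv /var/lib/cassandra/commitlog/CommitLog-* /var/tmp/commitlog-backup/
$ service cassandra start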

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
Thanks for that clarification.

So, your 1GB input size means roughly 716 thousand rows of data and 128GB
means roughly 92 million rows, correct?

FWIW, a best practice recommendation is that you avoid using secondary
indexes in favor of "query tables" - store the same data in multiple
tables, each with a primary key that includes the data column you wish to
query by. In general, avoid secondary indexes on columns with either very
high or very low cardinality.
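
(A sketch of the query-table idea with illustrative names - the same rows
stored a second time, keyed by the searched column, in place of a secondary
index:)

$ cqlsh -e "CREATE TABLE bench.records_by_name (
              name text,
              id   bigint,
              PRIMARY KEY (name, id)
            );"
$ cqlsh -e "SELECT id FROM bench.records_by_name WHERE name = 'John Smith';"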

Are your gets and searches returning single rows, or a significant number
of rows?


-- Jack Krupansky

On Thu, Jan 14, 2016 at 4:43 PM, Anurag Khandelwal 
wrote:

> To clarify: Input size is the size of the dataset as a CSV file, before
> loading it into Cassandra; for each input size, the number of columns is
> fixed but the number of rows is different. By 1.5KB record, I meant that
> each row, when represented as a CSV entry, occupies 1500 bytes. I've used
> the terms "row" and "record" interchangeably, which might have been the
> source of some confusion.
>
> I'll run the stress tool and report the results as well; the hardware is
> whatever AWS provides for a c3.8xlarge EC2 instance.
>
> Anurag
>
> On Jan 14, 2016, at 1:33 PM, Jack Krupansky 
> wrote:
>
> What exactly is "input size" here (1GB to 128GB)? I mean, the test spec "The
> dataset used comprises of ~1.5KB records...  there are 105 attributes in
> each record." Does each test run have exactly the same number of rows and
> columns and you're just making each column bigger, or what?
>
> Cassandra doesn't have "records", so are you really saying that each row is
> ~1,500 bytes? Is it one row per partition or do you have clustering?
>
> What are you actually trying to measure? (Some more context would help.)
>
> In any case, a latency of 200ms (5 per second) for your search query seems
> rather slow, but we need some clarity on input size.
>
> If you just run the cassandra stress tool on your hardware, what kinds of
> numbers do you get? That should be the starting point for any benchmarking
> - how does your hardware perform processing basic requests, before you
> layer your own data modeling on top of that.
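
(For the archive, basic stress invocations are along these lines; n and the
thread count are arbitrary here:)

$ cassandra-stress write n=1000000 -rate threads=50
$ cassandra-stress read n=1000000 -rate threads=50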
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 4:02 PM, Jonathan Haddad 
> wrote:
>
>> I think you actually get a really useful metric by benchmarking 1
>> machine.  You understand your cluster's theoretical maximum performance,
>> which would be (number of nodes) * (single-node throughput).  Yes, adding
>> in replication and CL is important, but 1 machine lets you isolate certain
>> performance metrics.
>>
>> On Thu, Jan 14, 2016 at 12:23 PM Robert Wille  wrote:
>>
>>> I disagree. I think that you can extrapolate very little information
>>> about RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>>>
>>> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal 
>>> wrote:
>>>
>>> Hi John,
>>>
>>> Thanks for responding!
>>>
>>> The aim of this benchmark was not to benchmark Cassandra as an
>>> end-to-end distributed system, but to understand a breakdown of the
>>> performance. For instance, if we understand the performance characteristics
>>> that we can expect from a single-machine Cassandra instance with
>>> RF=Consistency=1, we can have a good estimate of what the distributed
>>> performance with higher replication factors and consistency levels is
>>> going to look like. Even in the ideal case, the performance improvement
>>> would scale at most linearly with more machines and replicas.
>>>
>>> That being said, I still want to understand whether this is the
>>> performance I should expect for the setup I described; if the performance
>>> for the current setup can be improved, then clearly the performance for a
>>> production setup (with multiple nodes, replicas) would also improve. Does
>>> that make sense?
>>>
>>> Thanks!
>>> Anurag
>>>
>>> On Jan 6, 2016, at 9:31 AM, John Schulz  wrote:
>>>
>>> Anurag,
>>>
>>> Unless you are planning on continuing to use only one machine with RF=1,
>>> benchmarking a single system using RF=Consistency=1 is mostly a waste of
>>> time. If you are going to use RF=1 and a single host, then why use
>>> Cassandra at all? Plain old relational dbs should do the job just fine.
>>>
>>> Cassandra is designed to be distributed. You won't get the full impact
>>> of how it scales and the limits on scaling unless you benchmark a
>>> distributed system. For example the scaling impact of secondary indexes
>>> will not be visible on a single node.
>>>
>>> John
>>>
>>>
>>>
>>>
>>> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal 
>>> wrote:
>>>
 Hi,

 I’ve been benchmarking Cassandra to get an idea of how the performance
 scales with more data on a single machine. I just wanted to get some
 feedback as to whether these are the numbers I should expect.

 The benchmarks are quite simple — I measure the latency and throughput
 for two