Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Regarding mtime, I'm just talking about using something like the following
(assuming you are on linux): "find <path to your datadir> -mtime -1 -ls", which
will find all files in your datadir last modified within the past 24h. You
can compare the increase in your reported nodetool load over the past N days
and then look for files modified over the same period whose sizes could
account for that growth. Not really sure what sort of load that would generate
or how long it would take on 3-4T of data though.
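For example, something along these lines (a rough sketch only -- assumes GNU
find on Linux and the default /var/lib/cassandra/data location; adjust the path
and the -mtime window for your setup):

# files in the datadir modified within the past 24h
find /var/lib/cassandra/data -type f -mtime -1 -ls

# rough total size of those recently written files, to compare against the
# growth in "Load" reported by nodetool status over the same window
find /var/lib/cassandra/data -type f -mtime -1 -printf '%s\n' \
    | awk '{sum+=$1} END {print sum/1024/1024/1024 " GiB"}'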

Regarding compactionstats and tpstats, I would just be interested if there
are increasing "pending" tasks for either. Did you say you observed latency
issues or degraded performance or not?

What version of java/cassandra did you say you were running and what type
of gc are you using?

Regarding a node not creating a "DOWN" entry in its log: if a node
experiences a sufficiently long gc pause (I'm not sure what the threshold
is, maybe somebody more knowledgeable can chime in?), then even though the
node itself still "thinks" it's up, *other* nodes will mark it as DN. Thus
you wouldn't see an "is now DOWN" entry in the system.log of the gc-ing
node, but you *would* see an "is now DOWN" entry in the system.log of the
remote nodes (and a corresponding "is now UP" entry when the node comes out
of its gc pause). Assuming the logs have not been rotated off, if you just
grep system.log for "DOWN" on your nodes, that usually reveals a useful
timestamp from which to start looking in the problematic node's system.log
or gc.log.
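For instance (just a sketch, assuming the default log locations -- adjust paths
for your install):

# on each node: when did this node mark *other* nodes down/up?
grep -E "is now (DOWN|UP)" /var/log/cassandra/system.log

# then, on the node that was marked DN, see what it was doing around that
# timestamp (the gc log name/location depends on your jvm options)
less /var/log/cassandra/system.log /var/log/cassandra/gc.log*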

Do you have persistent cpu/memory/disk io/disk space monitoring in place? You
should think about putting something in place to gather that info if you
don't ... I find myself coming back to Al Tobey's tuning guide

frequently, if nothing else for the tools he mentions and his notes on the java
gc. I want to say a heap size of 15G sounds a little high, but I am starting
to talk a bit out of my depth when it comes to java tuning. See
datastax's official cassandra 2.1 jvm tuning doc

and also this stackoverflow thread.
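If it helps, here is a quick way to double-check what the running JVM was
actually started with (a sketch -- the exact flags present will depend on your
GC choice and cassandra-env settings):

# heap and GC flags of the running Cassandra process
ps -ef | grep '[C]assandraDaemon' | tr ' ' '\n' \
    | grep -Ei '^-Xm[sx]|UseG1GC|UseConcMarkSweepGC|CMSInitiatingOccupancyFraction'

# cassandra and java versions
nodetool version
java -version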


good luck!



On Thu, Jun 1, 2017 at 4:06 PM, Daniel Steuernol 
wrote:

> I'll try to capture answer to questions in the last 2 messages.
>
> Network traffic looks pretty steady overall. About 0.5 up to 2
> megabytes/s. The cluster handles about 100k to 500k operations per minute,
> right now the read/write comparison is about 50/50 right now, eventually
> though it will probably be 70% writes and 30% reads.
>
> There does seem to be some nodes that are affected more frequently than
> others. I haven't captured cpu/memory stats vs other nodes at the time the
> problem is occurring, I will do that next time it happens. Also I will look
> at compaction stats and tpstats, what are some things that I should be
> looking for in tpstats in particular, I'm not exactly sure how to read the
> output from that command.
>
> The heap size is set to 15GB on each node, and each node has 60GB of ram
> available.
>
> In regards to the "... is now DOWN" messages. I'm unable to find one in
> the system.log for a time I know that a node was having problems. I've
> built a system that polls nodetool status and parses the output, and if it
> sees a node reporting as DN it sends a message to a slack channel. Is it
> possible for a node to report as DN, but not have the message show up in the
> log?
> The system polling nodetool status is not running on the node that was
> reported as DN.
>
> I'm a bit unclear about the last point about mtime/size of files and how
> to check, can you provide more information there?
>
> Thanks for all the help, I really appreciate it.
>
>
>
> On Jun 1 2017, at 10:33 am, Victor Chen  wrote:
>
>> Hi Daniel,
>>
>> In my experience when a node shows DN and then comes back up by itself
>> that sounds like some sort of gc pause (especially if nodetool status when
>> run from the "DN" node itself shows it is up-- assuming there isn't a spotty
>> network issue). Perhaps I missed this info due to length of thread but have
>> you shared info about the following?
>>
>>- cpu/memory usage of affected nodes (are all nodes affected
>>comparably, or some more than others?)
>>- nodetool compactionstats and tpstats output (especially as the )
>>- what is your heap size set to?
>>- system.log and gc.logs: for investigating node "DN" symptoms I will
>>usually start by noting the timestamp of the "123.56.78.901 is now DOWN"
>>entries in system.log of other nodes to tell me where to look in 
>> system.log
>>of node in question. Then it's a question of answering "what was this node doing
>>up to that point?"
>>- mtime/size of files in data directory-- which files are growing in
>>size?
>>
>> That will help reduce how much we 

Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I'll try to capture answers to the questions in the last 2 messages.

Network traffic looks pretty steady overall. About 0.5 up to 2 megabytes/s. The
cluster handles about 100k to 500k operations per minute; right now the
read/write comparison is about 50/50, eventually though it will probably be 70%
writes and 30% reads.

There does seem to be some nodes that are affected more frequently than others.
I haven't captured cpu/memory stats vs other nodes at the time the problem is
occurring, I will do that next time it happens. Also I will look at compaction
stats and tpstats; what are some things that I should be looking for in tpstats
in particular, I'm not exactly sure how to read the output from that command.

The heap size is set to 15GB on each node, and each node has 60GB of ram
available.

In regards to the "... is now DOWN" messages. I'm unable to find one in the
system.log for a time I know that a node was having problems. I've built a
system that polls nodetool status and parses the output, and if it sees a node
reporting as DN it sends a message to a slack channel. Is it possible for a
node to report as DN, but not have the message show up in the log? The system
polling nodetool status is not running on the node that was reported as DN.

I'm a bit unclear about the last point about mtime/size of files and how to
check, can you provide more information there?

Thanks for all the help, I really appreciate it.
  

On Jun 1 2017, at 10:33 am, Victor Chen  wrote:


  Hi Daniel,

In my experience when a node shows DN and then comes back up by itself that
sounds like some sort of gc pause (especially if nodetool status when run from
the "DN" node itself shows it is up-- assuming there isn't a spotty network
issue). Perhaps I missed this info due to length of thread but have you shared
info about the following?

   - cpu/memory usage of affected nodes (are all nodes affected comparably, or
   some more than others?)
   - nodetool compactionstats and tpstats output (especially as the )
   - what is your heap size set to?
   - system.log and gc.logs: for investigating node "DN" symptoms I will
   usually start by noting the timestamp of the "123.56.78.901 is now DOWN"
   entries in system.log of other nodes to tell me where to look in system.log
   of node in question. Then it's a question of answering "what was this node
   doing up to that point?"
   - mtime/size of files in data directory-- which files are growing in size?

That will help reduce how much we need to speculate. I don't think you should
need to restart cassandra every X days if things are optimally configured for
your read/write pattern-- at least I would not want to use something where that
is the normal expected behavior (and I don't believe cassandra is one of those
sorts of things).

On Thu, Jun 1, 2017 at 11:40 AM, daemeon reiydelle  wrote:

Some random thoughts; I would like to thank you for giving us an interesting
problem. Cassandra can get boring sometimes, it is too stable.

- Do you have a way to monitor the network traffic to see if it is increasing
between restarts or does it seem relatively flat?
- What activities are happening when you observe the (increasing) latencies?
Something must be writing to keyspaces, something I presume is reading. What is
the workload?
- when using SSD, there are some device optimizations for SSD's. I wonder if
those were done (they will cause some IO latency, but not like this)

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Jun 1, 2017 at 7:18 AM, Daniel Steuernol  wrote:

I am just restarting cassandra. I'm not having any disk space issues I think,
but we're having issues where operations have increased latency, and these are
fixed by a restart. It seemed like the load reported by nodetool status might
be helpful in understanding what is going wrong but I'm not sure. Another
symptom is that nodes will report as DN in nodetool status and then come back
up again just a minute later.

I'm not really sure what to track to find out what exactly is going wrong on
the cluster, so any insight or debugging techniques would be super helpful

On May 31 2017, at 5:07 pm, Anthony Grasso  wrote:


  Hi Daniel,

When you say that the nodes have to be restarted, are you just restarting the
Cassandra service or are you restarting the machine?
How are you reclaiming disk space at the moment? Does disk space free up after
the restart?

Regarding storage on nodes, keep in mind the more data stored on a node, the
longer some operations to maintain that data will take to complete. In
addition, the more data that is on each node, the longer it will take to stream
data to other nodes. Whether it is replacing a down node or inserting a new
node, having a large amount of data on each node will mean that it takes longer
for a node to join 

Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Akhil Mehra
Kevin, Stefan thanks for the positive feedback and questions.

Stefan, in the blog post I am writing generally based on Apache Cassandra
defaults. The memtable cleanup threshold is 1/(1 + memtable_flush_writers). By
default memtable_flush_writers defaults to two. This comes to 33 percent of
the allocated memory. I have updated the blog post adding in this missing
detail :)
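Spelled out, with those defaults, that is just:

memtable_cleanup_threshold = 1 / (1 + memtable_flush_writers)
                           = 1 / (1 + 2)
                           = 0.33 (approx.)

i.e. roughly a third of the configured memtable space, and -- per the
clarification below -- the check applies when either the on-heap or the
off-heap memtable pool crosses that fraction of its allocated space.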

In the email I was trying to address the OP's original question. I mentioned .5
because the OP had set the memtable_cleanup_threshold to .50. This is 50% of
the allocated memory. I was also mentioning that cleanup is triggered when
either on or off heap memory reaches the cleanup threshold. Please refer to
https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/utils/memory/MemtableCleanerThread.java#L46-L49 .

I hope that helps.

Regards,
Akhil



 
> On 2/06/2017, at 2:04 AM, Stefan Litsche  wrote:
> 
> Hello Akhil,
> 
> thanks for your great blog post.
> One thing I cannot bring together:
> In the answer mail you write:
> "Note the cleanup threshold is .50 of 1GB and not a combination of heap and 
> off heap space."
> In your blog post you write:
> "memtable_cleanup_threshold is the default value i.e. 33 percent of the total 
> memtable heap and off heap memory."
> 
> Could you clarify this?
> 
> Thanks
> Stefan
> 
> 
> 2017-05-30 2:43 GMT+02:00 Akhil Mehra :
> Hi Preetika,
> 
> After thinking about your scenario I believe your small SSTable size might be 
> due to data compression. By default, all tables enable SSTable compression. 
> 
> Let's go through your scenario. Let's say you have allocated 4GB to your
> Cassandra node. Your memtable_heap_space_in_mb and 
> memtable_offheap_space_in_mb  will roughly come to around 1GB. Since you have 
> memtable_cleanup_threshold to .50 table cleanup will be triggered when total 
> allocated memtable space exceeds 1/2GB. Note the cleanup threshold is .50 of 
> 1GB and not a combination of heap and off heap space. This memtable 
> allocation size is the total amount available for all tables on your node. 
> This includes all system related keyspaces. The cleanup process will write 
> the largest memtable to disk.
> 
> For your case, I am assuming that you are on a single node with only one 
> table with insert activity. I do not think the commit log will trigger a 
> flush in this circumstance as by default the commit log has 8192 MB of space 
> unless the commit log is placed on a very small disk. 
> 
> I am assuming your table on disk is smaller than 500MB because of 
> compression. You can disable compression on your table and see if this helps 
> get the desired size.
> 
> I have written up a blog post explaining memtable flushing 
> (http://abiasforaction.net/apache-cassandra-memtable-flush/)
> 
> Let me know if you have any other question. 
> 
> I hope this helps.
> 
> Regards,
> Akhil Mehra 
> 
> 
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi  
> wrote:
> I agree that for such a small data, Cassandra is obviously not needed. 
> However, this is purely an experimental setup by using which I'm trying to 
> understand how and exactly when memtable flush is triggered. As I mentioned 
> in my post, I read the documentation and tweaked the parameters accordingly 
> so that I never hit memtable flush but it is still doing that. As far as the
> setup is concerned, I'm just using 1 node and running Cassandra using 
> "cassandra -R" option and then running some queries to insert some dummy data.
> 
> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml 
> and add "durable_writes=false" in the keyspace_definition.
> 
> @Daemeon - The previous post lead to this post but since I was unaware of 
> memtable flush and I assumed memtable flush wasn't happening, the previous 
> post was about something else (throughput/latency etc.). This post is 
> explicitly about exactly when memtable is being dumped to the disk. Didn't 
> want to confuse two different goals that's why posted a new one.
> 
> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
> It doesn't have to fit in memory. If your key distribution has strong 
> temporal locality, then a larger memtable that can coalesce overwrites 
> greatly reduces the disk I/O load for the memtable flush and subsequent 
> compactions. Of course, I have no idea if this is what the OP had in mind.
> 
> 
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right 
>> after waking up.
>> 
>> What I'm asking is why does the OP want to keep his data in the memtable 
>> exclusively?  If the goal is to "make reads fast", then just turn on row 
>> caching.  
>> 

Re: Cassandra Server 3.10 unable to Start after crash - commitlog needs to be removed

2017-06-01 Thread Cogumelos Maravilha
You can also manually delete the corrupt log file. Just check its name in
the logs.

Of course you may lose some data that way -- or not!
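For the record, the manual route is roughly this (a sketch -- the exact segment
name comes from the startup error in system.log, and the paths assume a default
package install):

# find the segment the replayer is complaining about
grep -iE 'commit ?log' /var/log/cassandra/system.log | grep -iE 'corrupt|exception' | tail

# move (or delete) just that segment, then start Cassandra again
mv /var/lib/cassandra/commitlog/<segment-named-in-the-error> /tmp/

The cassandra.yaml option mentioned further down the thread is presumably
commit_failure_policy (setting it to ignore lets startup continue past replay
errors), but I would treat that strictly as a dev-environment workaround.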

Cheers


On 01-06-2017 20:01, Peter Reilly wrote:
> Please, how do you do this?
>
> Peter
>
>
> On Fri, May 19, 2017 at 7:13 PM, Varun Gupta  > wrote:
>
> Yes the bugs need to be fixed, but as a workaround in a dev
> environment, you can enable a cassandra.yaml option to override any
> corrupted commit log file.
>
>
> Thanks,
> Varun
>
> > On May 19, 2017, at 11:31 AM, Jeff Jirsa  > wrote:
> >
> >
> >
> >> On 2017-05-19 08:13 (-0700), Haris Altaf  > wrote:
> >> Hi All,
> >> I am using Cassandra 3.10 for my project and whenever my local
> windows
> >> system, which is my development environment, crashes then
> cassandra server
> >> is unable to start. I have to delete commitlog directory after
> every system
> >> crash. This is actually annoying and what's the purpose of
> commitlog if it
> >> itself gets crashed. I have uploaded the entire dump of
> Cassandra Server
> >> (along with logs, commitlogs, data, configs etc) at the link
> below. Kindly
> >> share its solution. I believe it needs to be fixed.
> >>
> >
> > You need to share the exact stack trace. In cassandra 3.0+, we
> became much less tolerant of surprises in commitlog state -
> perhaps a bit too aggressive, failing to start in many cases when
> only minor things were wrong. We've recently fixed a handful of
> these, but they may not be released yet for the version you're using.
> >
> >
> >
> >
> -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> 
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> 
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> 
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
>
>



Re: Cassandra Server 3.10 unable to Start after crash - commitlog needs to be removed

2017-06-01 Thread Peter Reilly
Please, how do you do this?

Peter


On Fri, May 19, 2017 at 7:13 PM, Varun Gupta  wrote:

> Yes the bugs need to be fixed, but as a workaround in a dev environment,
> you can enable a cassandra.yaml option to override any corrupted commit log
> file.
>
>
> Thanks,
> Varun
>
> > On May 19, 2017, at 11:31 AM, Jeff Jirsa  wrote:
> >
> >
> >
> >> On 2017-05-19 08:13 (-0700), Haris Altaf  wrote:
> >> Hi All,
> >> I am using Cassandra 3.10 for my project and whenever my local windows
> >> system, which is my development environment, crashes then cassandra
> server
> >> is unable to start. I have to delete commitlog directory after every
> system
> >> crash. This is actually annoying and what's the purpose of commitlog if
> it
> >> itself gets crashed. I have uploaded the entire dump of Cassandra Server
> >> (along with logs, commitlogs, data, configs etc) at the link below.
> Kindly
> >> share its solution. I believe it needs to be fixed.
> >>
> >
> > You need to share the exact stack trace. In cassandra 3.0+, we became
> much less tolerant of surprises in commitlog state - perhaps a bit too
> aggressive, failing to start in many cases when only minor things were
> wrong. We've recently fixed a handful of these, but they may not be
> released yet for the version you're using.
> >
> >
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Hi Daniel,

In my experience, when a node shows DN and then comes back up by itself, that
sounds like some sort of gc pause (especially if nodetool status, when run from
the "DN" node itself, shows it is up-- assuming there isn't a spotty network
issue). Perhaps I missed this info due to length of thread but have you
shared info about the following?

   - cpu/memory usage of affected nodes (are all nodes affected comparably,
   or some more than others?)
   - nodetool compactionstats and tpstats output (especially as the )
   - what is your heap size set to?
   - system.log and gc.logs: for investigating node "DN" symptoms I will
   usually start by noting the timestamp of the "123.56.78.901 is now DOWN"
   entries in system.log of other nodes to tell me where to look in system.log
   of the node in question. Then it's a question of answering "what was this
   node doing up to that point?"
   - mtime/size of files in data directory-- which files are growing in
   size? (a few example commands for these checks are sketched just below)
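A few example commands covering that checklist (a sketch only -- paths assume a
default install, adjust to taste):

# cpu / memory pressure -- compare an affected node with a quiet one
top -b -n 1 | head -20

# anything backing up? look for Pending counts that keep growing between runs
nodetool compactionstats
nodetool tpstats

# which tables are actually growing on disk
du -sh /var/lib/cassandra/data/*/* | sort -h | tail -20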

That will help reduce how much we need to speculate. I don't think you
should need to restart cassandra every X days if things are optimally
configured for your read/write pattern-- at least I would not want to use
something where that is the normal expected behavior (and I don't believe
cassandra is one of those sorts of things).

On Thu, Jun 1, 2017 at 11:40 AM, daemeon reiydelle 
wrote:

> Some random thoughts; I would like to thank you for giving us an
> interesting problem. Cassandra can get boring sometimes, it is too stable.
>
> - Do you have a way to monitor the network traffic to see if it is
> increasing between restarts or does it seem relatively flat?
> - What activities are happening when you observe the (increasing)
> latencies? Something must be writing to keyspaces, something I presume is
> reading. What is the workload?
> - when using SSD, there are some device optimizations for SSD's. I
> wonder if those were done (they will cause some IO latency, but not like
> this)
>
>
>
>
>
>
>
> *Daemeon C.M. Reiydelle*
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
>
>
> On Thu, Jun 1, 2017 at 7:18 AM, Daniel Steuernol 
> wrote:
>
>> I am just restarting cassandra. I'm not having any disk space issues I
>> think, but we're having issues where operations have increased latency, and
>> these are fixed by a restart. It seemed like the load reported by nodetool
>> status might be helpful in understanding what is going wrong but I'm not
>> sure. Another symptom is that nodes will report as DN in nodetool status
>> and then come back up again just a minute later.
>>
>> I'm not really sure what to track to find out what exactly is going wrong
>> on the cluster, so any insight or debugging techniques would be super
>> helpful
>>
>>
>> On May 31 2017, at 5:07 pm, Anthony Grasso 
>> wrote:
>>
>>> Hi Daniel,
>>>
>>> When you say that the nodes have to be restarted, are you just
>>> restarting the Cassandra service or are you restarting the machine?
>>> How are you reclaiming disk space at the moment? Does disk space free up
>>> after the restart?
>>>
>>> Regarding storage on nodes, keep in mind the more data stored on a node,
>>> the longer some operations to maintain that data will take to complete. In
>>> addition, the more data that is on each node, the longer it will take to
>>> stream data to other nodes. Whether it is replacing a down node or
>>> inserting a new node, having a large amount of data on each node will mean
>>> that it takes longer for a node to join the cluster if it is streaming the
>>> data.
>>>
>>> Kind regards,
>>> Anthony
>>>
>>> On 30 May 2017 at 02:43, Daniel Steuernol  wrote:
>>>
>>> The cluster is running with RF=3, right now each node is storing about
>>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>>> volumes with 10k iops. I guess this brings up the question of what's a good
>>> marker to decide on whether to increase disk space vs provisioning a new
>>> node?
>>>
>>>
>>>
>>> On May 29 2017, at 9:35 am, tommaso barbugli 
>>> wrote:
>>>
>>> Hi Daniel,
>>>
>>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>>> data do you store per node and what kind of servers do you use (core count,
>>> RAM, disk, ...)?
>>>
>>> Cheers,
>>> Tommaso
>>>
>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol >> > wrote:
>>>
>>>
>>> I am running a 6 node cluster, and I have noticed that the reported load
>>> on each node rises throughout the week and grows way past the actual disk
>>> space used and available on each node. Also eventually latency for
>>> operations suffers and the nodes have to be restarted. A couple questions
>>> on this, is this normal? Also does cassandra need to be restarted every few
>>> days for best performance? 

MemtablePostFlush pending

2017-06-01 Thread ZAIDI, ASAD A
Hello Folks,

I'm adding another node to my 14 node open source apache cassandra 2.2.8
cluster.  The new node is taking a long time to join the cluster.
I see there are a bunch of pending MemtablePostFlush threads. I did increase
memtable_flush_writers from 8 to 24, though it is not helping with the situation.

Can you guys please share your experience or advise me what else I can look at
to remediate the situation.


nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        8543301         0                 0
ReadStage                         0         0              0         0                 0
RequestResponseStage              0         0              0         0                 0
ReadRepairStage                   0         0              0         0                 0
CounterMutationStage              0         0              0         0                 0
GossipStage                       0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 1       388             15         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MiscStage                         0         0              0         0                 0
MemtableFlushWriter               3         3            133         0                 0
InternalResponseStage             0         0              0         0                 0
CompactionExecutor                0         0             18         0                 0
MemtableReclaimMemory             0         0            133         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0
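(Not specific to 2.2.8, but a couple of generic things worth watching while the
new node joins -- a sketch:)

# on the joining node: streaming progress and current mode
nodetool netstats

# take these a few minutes apart and see whether the backlog is moving
nodetool tpstats | grep -E 'MemtablePostFlush|MemtableFlushWriter|CompactionExecutor'
nodetool compactionstats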


Information on Cassandra

2017-06-01 Thread Harper, Paul
Hello All,

I'm about 3 months into supporting several clusters of Cassandra databases. I
recently subscribed to this email list and I receive lots of interesting emails,
most of which I don't understand. I feel like I have a pretty good grasp on
Cassandra; I would like to know what types of things I should be checking on a
daily, weekly or monthly basis. Many of the emails I see on this list are on
subjects I've never had to look at so far. So I'm wondering what it is that I
should be monitoring or doing or should know. I would appreciate any
advice or guidance you can provide. Please reply to my email and not the group
list unless it's something that may be helpful to others.

Thanks in advance

Paul E. Harper III
Principal Cloud Services Engineer
Network Ops Run Team - Database
+1 770-239-4465 work
+1 770-239-4205 fax
4450 River Green Pkwy, Suite 100
Duluth, GA 30096
United States
paul.har...@aspect.com
aspect.com


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


Re: Restarting nodes and reported load

2017-06-01 Thread daemeon reiydelle
Some random thoughts; I would like to thank you for giving us an
interesting problem. Cassandra can get boring sometimes, it is too stable.

- Do you have a way to monitor the network traffic to see if it is
increasing between restarts or does it seem relatively flat? (a couple of
example commands below)
- What activities are happening when you observe the (increasing)
latencies? Something must be writing to keyspaces, something I presume is
reading. What is the workload?
- when using SSD, there are some device optimizations for SSD's. I wonder
if those were done (they will cause some IO latency, but not like this)
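On the network question, something like the following gives a quick picture (a
sketch -- assumes the sysstat package is installed and that you sample around
the time the latencies appear):

# per-interface throughput, sampled every 10 seconds
sar -n DEV 10 6

# cassandra's own view of inter-node streaming and messaging
nodetool netstats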







*Daemeon C.M. Reiydelle*
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



On Thu, Jun 1, 2017 at 7:18 AM, Daniel Steuernol 
wrote:

> I am just restarting cassandra. I'm not having any disk space issues I
> think, but we're having issues where operations have increased latency, and
> these are fixed by a restart. It seemed like the load reported by nodetool
> status might be helpful in understanding what is going wrong but I'm not
> sure. Another symptom is that nodes will report as DN in nodetool status
> and then come back up again just a minute later.
>
> I'm not really sure what to track to find out what exactly is going wrong
> on the cluster, so any insight or debugging techniques would be super
> helpful
>
>
> On May 31 2017, at 5:07 pm, Anthony Grasso 
> wrote:
>
>> Hi Daniel,
>>
>> When you say that the nodes have to be restarted, are you just restarting
>> the Cassandra service or are you restarting the machine?
>> How are you reclaiming disk space at the moment? Does disk space free up
>> after the restart?
>>
>> Regarding storage on nodes, keep in mind the more data stored on a node,
>> the longer some operations to maintain that data will take to complete. In
>> addition, the more data that is on each node, the longer it will take to
>> stream data to other nodes. Whether it is replacing a down node or
>> inserting a new node, having a large amount of data on each node will mean
>> that it takes longer for a node to join the cluster if it is streaming the
>> data.
>>
>> Kind regards,
>> Anthony
>>
>> On 30 May 2017 at 02:43, Daniel Steuernol  wrote:
>>
>> The cluster is running with RF=3, right now each node is storing about
>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>> volumes with 10k iops. I guess this brings up the question of what's a good
>> marker to decide on whether to increase disk space vs provisioning a new
>> node?
>>
>>
>>
>> On May 29 2017, at 9:35 am, tommaso barbugli 
>> wrote:
>>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>> data do you store per node and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel


Re: Restoring a table cassandra - compactions

2017-06-01 Thread Marcus Eriksson
This is done to avoid overlap in levels > 0

There is this though: https://issues.apache.org/jira/browse/CASSANDRA-13425

If you are restoring an entire node, starting with an empty data directory,
you should probably stop cassandra, copy the snapshot in, and restart, that
will keep the levels
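In other words, roughly this sequence for a full-node restore (a sketch --
keyspace, table and snapshot names are placeholders, the data directory and
service commands depend on your install, and the table directory may carry an
ID suffix depending on version):

# on the node being restored
sudo service cassandra stop

# copy the snapshot's sstables back into the live table directory
cp /path/to/backup/snapshots/<snapshot-name>/* \
   /var/lib/cassandra/data/<keyspace>/<table-dir>/

sudo service cassandra start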

On Thu, Jun 1, 2017 at 4:25 PM, Jean Carlo 
wrote:

> Hello.
>
> During the restore of a table using its snapshot and nodetool refresh, I
> could see that cassandra starts to make a lot of compactions (depending on
> the size of the data).
>
> I wanted to know why and I found this in the code of cassandra 2.1.14.
>
> for CASSANDRA-4872
>
> +// force foreign sstables to level 0
> +try
> +{
> +    if (new File(descriptor.filenameFor(Component.STATS)).exists())
> +    {
> +        SSTableMetadata oldMetadata = SSTableMetadata.serializer.deserialize(descriptor);
> +        LeveledManifest.mutateLevel(oldMetadata, descriptor, descriptor.filenameFor(Component.STATS), 0);
> +    }
> +}
> +catch (IOException e)
>
>
> This is very interesting and I wanted to know if this was coded taking
> into account only the case of a migration from STCS to LCS or if for the
> case LCS to LCS this is not pertinent
>
> In my case, I use nodetool refresh not only to restore a table but also to
> make an exact copy of any table LCS. So I think the levels do not need to
> change.
>
> @Marcus Can you be so kind to clarify this for me please ?
>
> Thank you very much in advance
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Restoring a table cassandra - compactions

2017-06-01 Thread Jean Carlo
Hello.

During the restore of a table using its snapshot and nodetool refresh, I
could see that cassandra starts to make a lot of compactions (depending on
the size of the data).

I wanted to know why and I found this in the code of cassandra 2.1.14.

for CASSANDRA-4872

+// force foreign sstables to level 0
+try
+{
+    if (new File(descriptor.filenameFor(Component.STATS)).exists())
+    {
+        SSTableMetadata oldMetadata = SSTableMetadata.serializer.deserialize(descriptor);
+        LeveledManifest.mutateLevel(oldMetadata, descriptor, descriptor.filenameFor(Component.STATS), 0);
+    }
+}
+catch (IOException e)


This is very interesting and I wanted to know if this was coded taking into
account only the case of a migration from STCS to LCS or if for the case
LCS to LCS this is not pertinent

In my case, I use nodetool refresh not only to restore a table but also to
make an exact copy of any table LCS. So I think the levels do not need to
change.

@Marcus Can you be so kind to clarify this for me please ?

Thank you very much in advance

Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I am just restarting cassandra. I'm not having any disk space issues I think,
but we're having issues where operations have increased latency, and these are
fixed by a restart. It seemed like the load reported by nodetool status might
be helpful in understanding what is going wrong but I'm not sure. Another
symptom is that nodes will report as DN in nodetool status and then come back
up again just a minute later.

I'm not really sure what to track to find out what exactly is going wrong on
the cluster, so any insight or debugging techniques would be super helpful
  

On May 31 2017, at 5:07 pm, Anthony Grasso  wrote:


  Hi Daniel,

When you say that the nodes have to be restarted, are you just restarting the
Cassandra service or are you restarting the machine?
How are you reclaiming disk space at the moment? Does disk space free up after
the restart?

Regarding storage on nodes, keep in mind the more data stored on a node, the
longer some operations to maintain that data will take to complete. In
addition, the more data that is on each node, the longer it will take to stream
data to other nodes. Whether it is replacing a down node or inserting a new
node, having a large amount of data on each node will mean that it takes longer
for a node to join the cluster if it is streaming the data.

Kind regards,
Anthony

On 30 May 2017 at 02:43, Daniel Steuernol  wrote:

The cluster is running with RF=3, right now each node is storing about 3-4 TB
of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61 GB of RAM,
and the disks attached for the data drive are gp2 ssd ebs volumes with 10k
iops. I guess this brings up the question of what's a good marker to decide on
whether to increase disk space vs provisioning a new node?
  

On May 29 2017, at 9:35 am, tommaso barbugli  wrote:


  Hi Daniel,

This is not normal. Possibly a capacity problem. What's the RF, how much data
do you store per node and what kind of servers do you use (core count, RAM,
disk, ...)?

Cheers,
Tommaso

On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol  wrote:

I am running a 6 node cluster, and I have noticed that the reported load on
each node rises throughout the week and grows way past the actual disk space
used and available on each node. Also eventually latency for operations suffers
and the nodes have to be restarted. A couple questions on this, is this normal?
Also does cassandra need to be restarted every few days for best performance?
Any insight on this behaviour would be helpful.

Cheers,
Daniel


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Stefan Litsche
Hello Akhil,

thanks for your great blog post.
One thing I cannot bring together:
In the answer mail you write:
"Note the cleanup threshold is .50 of 1GB and not a combination of heap and
off heap space."
In your blog post you write:
"memtable_cleanup_threshold is the default value i.e. 33 percent of the
total memtable heap and off heap memory."

Could you clarify this?

Thanks
Stefan


2017-05-30 2:43 GMT+02:00 Akhil Mehra :

> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might
> be due to data compression. By default, all tables enable SSTable
> compression.
>
> Let's go through your scenario. Let's say you have allocated 4GB to your
> Cassandra node. Your *memtable_heap_space_in_mb* and
>
> *memtable_offheap_space_in_mb  *will roughly come to around 1GB. Since
> you have memtable_cleanup_threshold to .50 table cleanup will be
> triggered when total allocated memtable space exceeds 1/2GB. Note the
> cleanup threshold is .50 of 1GB and not a combination of heap and off heap
> space. This memtable allocation size is the total amount available for all
> tables on your node. This includes all system related keyspaces. The
> cleanup process will write the largest memtable to disk.
>
> For your case, I am assuming that you are on a *single node with only one
> table with insert activity*. I do not think the commit log will trigger a
> flush in this circumstance as by default the commit log has 8192 MB of
> space unless the commit log is placed on a very small disk.
>
> I am assuming your table on disk is smaller than 500MB because of
> compression. You can disable compression on your table and see if this
> helps get the desired size.
>
> I have written up a blog post explaining memtable flushing (
> http://abiasforaction.net/apache-cassandra-memtable-flush/)
>
> Let me know if you have any other question.
>
> I hope this helps.
>
> Regards,
> Akhil Mehra
>
>
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi 
> wrote:
>
>> I agree that for such a small data, Cassandra is obviously not needed.
>> However, this is purely an experimental setup by using which I'm trying to
>> understand how and exactly when memtable flush is triggered. As I mentioned
>> in my post, I read the documentation and tweaked the parameters accordingly
>> so that I never hit memtable flush but it is still doing that. As far as
>> the setup is concerned, I'm just using 1 node and running Cassandra using
>> "cassandra -R" option and then running some queries to insert some dummy
>> data.
>>
>> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
>> and add "durable_writes=false" in the keyspace_definition.
>>
>> @Daemeon - The previous post lead to this post but since I was unaware of
>> memtable flush and I assumed memtable flush wasn't happening, the previous
>> post was about something else (throughput/latency etc.). This post is
>> explicitly about exactly when memtable is being dumped to the disk. Didn't
>> want to confuse two different goals that's why posted a new one.
>>
>> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
>>
>>> It doesn't have to fit in memory. If your key distribution has strong
>>> temporal locality, then a larger memtable that can coalesce overwrites
>>> greatly reduces the disk I/O load for the memtable flush and subsequent
>>> compactions. Of course, I have no idea if this is what the OP had in mind.
>>>
>>>
>>> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>>>
>>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
>>> after waking up.
>>>
>>> What I'm asking is why does the OP want to keep his data in the memtable
>>> exclusively?  If the goal is to "make reads fast", then just turn on row
>>> caching.
>>>
>>> If there's so little data that it fits in memory (300MB), and there
>>> aren't going to be any writes past the initial small dataset, why use
>>> Cassandra?  It sounds like the wrong tool for this job.  Sounds like
>>> something that could easily be stored in S3 and loaded in memory when the
>>> app is fired up.
>>>
>>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>>>
 Not sure whether you're asking me or the original poster, but the more
 times data gets overwritten in a memtable, the less it has to be compacted
 later on (and even without overwrites, larger memtables result in less
 compaction).

 On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

 Why do you think keeping your data in the memtable is what you need
 to do?
 On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the 

Re: Convert single node C* to cluster (rebalancing problem)

2017-06-01 Thread Vladimir Yudovin
Did you run "nodetool cleanup" on first node after second was bootstrapped? It 
should clean rows not belonging to node after tokens changed.
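That is, something like this on the original node once the new one has finished
joining (a sketch; note that cleanup rewrites SSTables, so expect some extra
disk and compaction activity while it runs):

# on the first node (10.128.0.7)
nodetool cleanup

# then compare the reported load again
nodetool status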



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 31 May 2017 03:55:54 -0400 Junaid Nasir jna...@an10.io 
wrote 




Cassandra ensures that adding or removing nodes is very easy and that load is
balanced between nodes when a change is made, but it's not working in my case.

I have a single node C* deployment (with 270 GB of data) and want to load 
balance the data on multiple nodes, I followed this guide 

`nodetool status` shows 2 nodes but load is not balanced between them

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1

I also ran 'nodetool repair' on the new node but the result is the same. Any
pointers would be appreciated :)



conf file of new node

cluster_name: 'cluster1'
  - seeds: "10.128.0.7"
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch
Thanks,

Junaid









Re: Stable version apache cassandra 3.X /3.0.X

2017-06-01 Thread John Hughes
We are running 3.9x in prod, it has been pretty stable, but admittedly our
footprint is pretty small (half a dozen nodes with less than a hundred gigs of
actual data) and not extremely tricky (no materialized views, etc). We have
been running tick-tock since 3.3x and upgrading a couple of weeks after release
to keep up. It seems to be stable within that context. What few hiccups we
have encountered we have been able to work our way out of without an
outage. We have load tested up to 60 nodes across 4 geographic regions (all
in aws) and it has behaved very well and, better yet, very predictably.

That said, YMMV.

Cheers,
JTH

On Wed, May 31, 2017 at 5:50 PM Mark Furlong  wrote:

> I need to reduce my disk footprint, how is 3.0.14 for stability? Also,
> where do I find upgrade instructions and version requirements?
>
>
>
> *Thanks*
>
> *Mark*
>
> *801-705-7115 office*
>
>
>
> *From:* Carlos Rolo [mailto:r...@pythian.com]
> *Sent:* Wednesday, May 31, 2017 11:17 AM
> *To:* Jonathan Haddad 
> *Cc:* Junaid Nasir ; pabbireddy avinash <
> pabbireddyavin...@gmail.com>; user@cassandra.apache.org
> *Subject:* Re: Stable version apache cassandra 3.X /3.0.X
>
>
>
> On sync in Jon.
>
>
>
> Only go 3.0.x if you REALLY need something from there (ex: MV); even then,
> be careful.
>
>
>
> 3.x wait for 3.11.x. 3.10 if you REALLY need something from there right
> now.
>
>
>
> Latest 2.2.x or 2.1.x if you are just doing baseline Cassandra and need
> the stability.
>
>
> Regards,
>
>
>
> Carlos Juzarte Rolo
>
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
>
>
> Pythian - Love your data
>
>
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo
>
> Mobile: +351 918 918 100
>
> www.pythian.com
>
>
>
> On Wed, May 31, 2017 at 5:48 PM, Jonathan Haddad 
> wrote:
>
> I really wouldn't go by the tick tock blog post, considering tick tock is
> dead.
>
>
>
> I'm still not wild about putting any 3.0 or 3.x into production.  3.0
> removed off heap memtables and there have been enough bugs in the storage
> engine that I'm still wary.  My hope is to see 3.11.x get enough bug fixes
> to where most people just skip 3.0 altogether.  I'm not sure if we're there
> yet though.
>
>
>
>
>
> On Wed, May 31, 2017 at 9:43 AM Junaid Nasir  wrote:
>
> as mentioned here
> http://www.datastax.com/dev/blog/cassandra-2-2-3-0-and-beyond
>
> Under normal conditions, we will NOT release 3.x.y stability releases for
> x > 0.  That is, we will have a traditional 3.0.y stability series, but the
> odd-numbered bugfix-only releases will fill that role for the tick-tock
> series — recognizing that occasionally we will need to be flexible enough
> to release an emergency fix in the case of a critical bug or security
> vulnerability.
> We do recognize that it will take some time for tick-tock releases to
> deliver production-level stability, which is why we will continue to
> deliver 2.2.y and 3.0.y bugfix releases.  (But if we do demonstrate that
> tick-tock can deliver the stability we want, there will be no need for a
> 4.0.y bugfix series, only 4.x tick-tock.)
>
>
>
> On Wed, May 31, 2017 at 9:02 PM, pabbireddy avinash <
> pabbireddyavin...@gmail.com> wrote:
>
> Hi,
>
>
>
> We are planning to deploy a cassandra production cluster on 3.X /3.0.X .
> Please let us know if there is any stable version  in 3.X/3.0.X that we
> could deploy in production .
>
>
>
> Regards,
> Avinash.
>
>
>
>
>
>
>
> --
>
>
>


Re: Convert single node C* to cluster (rebalancing problem)

2017-06-01 Thread Akhil Mehra
When you bootstrapped the node for the first time did you see log similar to 
the following:
INFO  [main] 2017-06-01 07:19:45,199 StorageService.java:1435 - JOINING: 
waiting for schema information to complete
INFO  [main] 2017-06-01 07:19:45,250 StorageService.java:1435 - JOINING: schema 
complete, ready to bootstrap
INFO  [main] 2017-06-01 07:19:45,251 StorageService.java:1435 - JOINING: 
waiting for pending range calculation
INFO  [main] 2017-06-01 07:19:45,251 StorageService.java:1435 - JOINING: 
calculation complete, ready to bootstrap
INFO  [main] 2017-06-01 07:19:45,251 StorageService.java:1435 - JOINING: 
getting bootstrap token
INFO  [main] 2017-06-01 07:19:45,341 StorageService.java:1435 - JOINING: 
sleeping 3 ms for pending range setup
INFO  [main] 2017-06-01 07:20:15,342 StorageService.java:1435 - JOINING: 
Starting to bootstrap...
INFO  [main] 2017-06-01 07:20:15,562 StreamResultFuture.java:90 - [Stream 
#c219d430-469a-11e7-8af3-81773e2a69ae] Executing streaming plan for Bootstrap
INFO  [StreamConnectionEstablisher:1] 2017-06-01 07:20:15,568 
StreamSession.java:266 - [Stream #c219d430-469a-11e7-8af3-81773e2a69ae] 
Starting streaming to /172.20.0.3
INFO  [StreamConnectionEstablisher:1] 2017-06-01 07:20:15,591 
StreamCoordinator.java:264 - [Stream #c219d430-469a-11e7-8af3-81773e2a69ae, 
ID#0] Beginning stream session with /172.20.0.3
INFO  [STREAM-IN-/172.20.0.3:7000] 2017-06-01 07:20:16,369 
StreamResultFuture.java:173 - [Stream #c219d430-469a-11e7-8af3-81773e2a69ae 
ID#0] Prepare completed. Receiving 4 files(4.046MiB), sending 0 files(0.000KiB)
INFO  [StreamReceiveTask:1] 2017-06-01 07:20:32,489 StreamResultFuture.java:187 
- [Stream #c219d430-469a-11e7-8af3-81773e2a69ae] Session with /172.20.0.3 is 
complete
INFO  [StreamReceiveTask:1] 2017-06-01 07:20:32,529 StreamResultFuture.java:219 
- [Stream #c219d430-469a-11e7-8af3-81773e2a69ae] All sessions completed
INFO  [StreamReceiveTask:1] 2017-06-01 07:20:32,535 StorageService.java:1491 - 
Bootstrap completed! for the tokens [2170857698622202367, 8517072504425343717, 
-3254771037524187900, -1597835042001935502, 6878847904605480741, 
6816916215341820068, 6291189887494617640, -6855333019196580358, 
-6353317035112065873, 8838974905234547016, 853981057438397, 
-2357950949959511387, 1242077960532340887, 2914039668080386735, 
3548015300105653368, 8973388453035242795, -2325235809399362967, 
-7078812010537656277, 768585495224455336, 7153512700965912517, 
8625819392009074153, -6138302849441936958, -2594051958993427953, 
-735827743339795655, -8202727571538912843, 2180751358288507888, 
-7872842094207074012, -2926504780761300623, -3197260822146229664, 
3411052656191450941, -9049284186987733291, -157882351668930258, 
454637839762305232, -2305675997627138050, 5785282040753174988, 
8604531769609599767, 4363117061247143957, -7255854383313210529, 
-3497611663121502480, -6788457421774336480, -7809767930173770420, 
6591540654522244365, 1773283733607350132, 134776973669066, 
-7242556233623424655, -1552552727731631642, -1226243976028310059, 
-8221762326275074149, -7963893314043006091, -850542197910474448, 
4219437099703910566, -8039365343972054221, 7756456412568178996, 
4057327843751741693, 7155628666873897485, -483058846775660782, 
6968839681845709305, 6396337738827005745, -5285173481531605912, 
7254663657455123842, 871654822989271789, -604574593420741277, 
-244646170484127, -3707613591745746278, -26727542030118959, 
-7190990795521107837, 5388348291571480415, 4249499356533972018, 
82469082189512791, -6389351372873749061, 5138413916027470955, 
2542233707258091740, -4057927973990056143, 552933169018893618, 
-8237860380097407047, 6917383508758068288, 543382311932406672, 
-5671560690999322491, -1240369858424929757, 7394536427227616773, 
4716882285905136652, 8260705434779371419, 3259812719139852593, 
-73864539388331289, -3573980475038135246, -1047139059901238511, 
-1734886021153324482, 8674873751672827600, 3564384074427511950, 
2754071903665103098, -1230493021099846761, -2731315467436512731, 
-7845984767828231726, -8082165594257396645, -2298177264815779081, 
-364542048544165, 9142633389925493379, 7206663288804675578, 
2305939212045070856, -5101738026249032246, 6268847697773786891, 
5903922100677671597, -2001787466557152206, 1318502870562311928, 
5784020265166141829, 5385229217299505171, 6010414616247875068, 
-8080602674779008196, -9189764569651551963, -8969124116887255329, 
-9040482343274988119, -8575947267671214955, -1786409930636352174, 
-757203989676123224, -6640569567328853730, 8431839804447545665, 
6781635966829972979, -8328382509754233304, -3181089993114819214, 
3243262023331941781, 4213737472390389773, -4046361821170607634, 
8877904009116429296, -6931048276693039052, 4838006612846181604, 
-5561480934050473057, -470112649587309682, 3175935810873308999, 
-1693695808908080717, -3753035103371291265, -260741269584337, 
-8454963020263227780, 2037428931895594762, 1158209127301347406, 
-8092787384269386871, -7741092217712244823,