[
https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105572#comment-15105572
]
Peter Kovgan edited comment on CASSANDRA-10937 at 1/18/16 5:56 PM:
-------------------------------------------------------------------
I will go through all the recommendations and provide a more detailed picture.
Trust me, I am looking for the most sensible way to keep using Cassandra, and I
will reduce the load until I find an acceptable, optimal level. I am doing my
best! If reducing the load helps, we will use Cassandra.
But the short answer is this:
I have no problem with the OOM itself, assuming the OOM is just a failure
indicator for an overly intensive load.
My problem is that this indication arrives far too late (48 or 89 hours,
depending on heap size).
There are clear signs that the "I/O wait" percentage grows gradually throughout
the whole test.
(The sar metrics show %iowait progressively increasing - the share of time the
CPU spends waiting on I/O grows day by day, gradually and slowly.)
It looks like problems accumulate and grow over time.
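For reference, the figure in question can also be sampled without sar. Below is
a minimal sketch, assuming the standard Linux /proc/stat layout (aggregate "cpu"
line: user nice system idle iowait ...); it only illustrates what the quoted
%iowait number measures and is not part of the attached test client.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Minimal %iowait sampler (illustration only, assumes Linux /proc/stat).
 * Fields on the aggregate "cpu" line: user nice system idle iowait irq softirq steal ...
 */
public class IoWaitSampler {

    private static long[] readCpuCounters() throws IOException {
        // First line of /proc/stat looks like: "cpu  123 4 567 8901 234 ..."
        String cpuLine = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        String[] parts = cpuLine.trim().split("\\s+");
        long[] counters = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            counters[i - 1] = Long.parseLong(parts[i]);
        }
        return counters;
    }

    public static void main(String[] args) throws Exception {
        long[] prev = readCpuCounters();
        while (true) {
            Thread.sleep(60_000); // sample once a minute
            long[] curr = readCpuCounters();
            long totalDelta = 0;
            for (int i = 0; i < curr.length; i++) {
                totalDelta += curr[i] - prev[i];
            }
            long iowaitDelta = curr[4] - prev[4]; // 5th field is iowait
            double iowaitPct = totalDelta == 0 ? 0.0 : 100.0 * iowaitDelta / totalDelta;
            System.out.printf("%%iowait over last interval: %.2f%%%n", iowaitPct);
            prev = curr;
        }
    }
}

Sampling like this for the whole run makes the gradual growth easy to plot next
to the GC and heap statistics already attached.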
I'm not claiming this is not a workable system.
I'm claiming it looks like a memory leak: with unrestricted publishers and a
progressively growing I/O demand, the "exit" becomes narrower and narrower
while the "entrance" stays the same.
For me the best solution would be to find the reason for the progressively
growing I/O and prevent the issue.
How to prevent it (by restricting publishers, by optimizing I/O, etc.) I do not
know.
But I would be happier to fail within the first 1-5 hours, when most of the
Cassandra processes have already run many times and the system has had a chance
to estimate its load and draw conclusions.
(Even a friendly warning like "you need a rescue node, otherwise you will
become unstable" would be a solution - a rough sketch of such a check follows.)
When it fails after 4 days of testing, after everybody has already shaken hands
and congratulated each other with "we like it", it is another matter.
You do not need to convince me - I am a positive opportunist and I will find a
way.
But some people will be frustrated. That's why I would take this seriously.
> OOM on multiple nodes on write load (v. 3.0.0), problem also present on
> DSE-4.8.3, but there it survives longer
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10937
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra : 3.0.0
> Installed from a plain archive (tarball), not tied to any OS-specific installer.
> Java:
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> OS :
> Linux version 2.6.32-431.el6.x86_64
> ([email protected]) (gcc version 4.4.7 20120313 (Red
> Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013
> We have:
> 8 guests (Linux OS as above) on 2 (VMware-managed) physical hosts. Each
> physical host keeps 4 guests.
> Physical host parameters(shared by all 4 guests):
> Model: HP ProLiant DL380 Gen9
> Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> 46 logical processors.
> Hyperthreading - enabled
> Each guest is assigned:
> 1 disk of 300 GB for the seq. log (NOT SSD)
> 1 disk of 4 TB for data (NOT SSD)
> 11 CPU cores
> Disks are local, not shared.
> Memory on each host - 24 GB total.
> 8 (or 6, tested both) GB - Cassandra heap
> (lshw and cpuinfo attached in file test2.rar)
> Reporter: Peter Kovgan
> Priority: Critical
> Attachments: gc-stat.txt, more-logs.rar, some-heap-stats.rar,
> test2.rar, test3.rar, test4.rar, test5.rar, test_2.1.rar,
> test_2.1_logs_older.rar, test_2.1_restart_attempt_log.rar
>
>
> 8 Cassandra nodes.
> Load test started with 4 clients (different, non-identical machines), each
> running 1000 threads.
> Each thread is assigned, round-robin, one of the 4 different inserts (see the
> illustrative sketch after the description). Consistency -> ONE.
> I attach the full CQL schema of the tables and the insert queries.
> Replication factor - 2:
> create keyspace OBLREPOSITORY_NY with replication =
> {'class':'NetworkTopologyStrategy','NY':2};
> Initial throughput is:
> 215,000 inserts/sec
> or
> 54 MB/sec, considering that a single insert is a bit larger than 256 bytes.
> Data:
> all fields (5-6) are short strings, except one, which is a BLOB of 256 bytes.
> After about 2-3 hours of work, I was forced to increase the timeout from 2000 ms
> to 5000 ms, because some requests were failing on the short timeout.
> Later on (after approx. 12 hours of work) OOM happened on multiple nodes.
> (All failed nodes' logs are attached.)
> I also attach the Java load client and instructions on how to set it up and
> use it (test2.rar).
> Update:
> Later the test was repeated with a lower load (100,000 messages/sec) and a more
> relaxed CPU (25% idle), with only 2 test clients, but the test still failed.
> Update:
> DSE-4.8.3 also failed with OOM (3 nodes out of 8), but there it survived 48
> hours, not 10-12.
> Attachments:
> test2.rar -contains most of material
> more-logs.rar - contains additional nodes logs
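For readers without the attachments, the following is a minimal sketch of the
load pattern described above, written against the DataStax Java driver (3.x API
assumed). The contact point, table names and columns are hypothetical
placeholders rather than the attached schema; only the 1000 threads per client,
the round-robin assignment to 4 prepared inserts, the ~256-byte payload and
consistency ONE follow the description.

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative load-generator skeleton; table and column names are hypothetical. */
public class InsertLoadSketch {

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("OBLREPOSITORY_NY");

        // Four different prepared inserts; the real schema is in the attached test2.rar.
        PreparedStatement[] inserts = new PreparedStatement[] {
            session.prepare("INSERT INTO table1 (id, f1, payload) VALUES (?, ?, ?)"),
            session.prepare("INSERT INTO table2 (id, f1, payload) VALUES (?, ?, ?)"),
            session.prepare("INSERT INTO table3 (id, f1, payload) VALUES (?, ?, ?)"),
            session.prepare("INSERT INTO table4 (id, f1, payload) VALUES (?, ?, ?)")
        };

        // 1000 threads per client, as in the description.
        ExecutorService pool = Executors.newFixedThreadPool(1000);

        for (int t = 0; t < 1000; t++) {
            // Round-robin: each thread gets one of the 4 insert statements.
            final PreparedStatement ps = inserts[t % inserts.length];
            pool.submit(() -> {
                byte[] blob = new byte[256]; // ~256-byte payload, as described
                while (!Thread.currentThread().isInterrupted()) {
                    ThreadLocalRandom.current().nextBytes(blob);
                    BoundStatement bound = ps.bind(UUID.randomUUID(), "short-string", ByteBuffer.wrap(blob));
                    bound.setConsistencyLevel(ConsistencyLevel.ONE);
                    session.execute(bound);
                }
            });
        }
        // The pool is intentionally never shut down: it runs until the test is killed.
    }
}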