I like Bryan’s terminology of an “antagonistic use case.” If I am reading this 
correctly, you are putting 5 (or 10) million records in a partition and then 
trying to delete them in the same order they are stored. This is not a good 
data model for Cassandra; in fact, it is a dangerous one. That partition will 
reside completely on one node (and a number of replicas). Then, you are forcing 
the reads to wade through all the tombstones to get to the undeleted records – 
all on the same nodes. This cannot scale to the scope you want.
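To see why, here is a toy model (pure Python, my own illustration, not Cassandra's actual read path) of a reader that must step over every tombstone before it reaches live rows:

```python
# Illustrative toy model (NOT Cassandra's real read path): a partition is
# an ordered list of (clustering key, value) cells, where a deleted row
# leaves a tombstone (value None) that a reader still has to step over.

def read_live_rows(partition, limit):
    """Scan in clustering order, skipping tombstones, until `limit`
    live rows are found. Returns (rows, cells_scanned)."""
    out, scanned = [], 0
    for key, value in partition:
        scanned += 1
        if value is None:          # tombstone: skip, but we paid the scan
            continue
        out.append((key, value))
        if len(out) == limit:
            break
    return out, scanned

# 1,000 rows with the first 900 deleted: fetching 50 live rows
# forces the reader over 900 tombstones first.
partition = [(i, None if i < 900 else f"row-{i}") for i in range(1000)]
rows, scanned = read_live_rows(partition, 50)
```

Fetching just 50 live rows touches 950 cells here; as the deleted prefix grows into the millions, every read pays that scan cost, all on the same replicas.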

For a distributed data store, you want the data distributed across your whole 
cluster. And you want to delete whole partitions if at all possible (or at 
least keep the number of deletes within a partition reasonable).
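One common mitigation is to add a bucket to the partition key, so the data spreads across the cluster and a whole bucket can later be dropped with a single partition-level delete. A sketch (the bucket count and key format here are my own assumptions, for illustration only):

```python
# Sketch of a bucketed partition key (bucket count and key format are
# illustrative assumptions): instead of one huge partition per
# (branch_id, department_id), rows are spread over NUM_BUCKETS smaller
# partitions, each of which can be deleted in one partition delete.

NUM_BUCKETS = 128

def partition_key(branch_id: str, department_id: str, emp_id: int) -> str:
    bucket = emp_id % NUM_BUCKETS        # deterministic bucket per row
    return f"{branch_id}:{department_id}:{bucket}"

# 10,000 rows now land in 128 partitions instead of one:
keys = {partition_key("xxx", "yyy", e) for e in range(10_000)}
```

With a scheme like this, "delete the oldest 5 million rows" becomes "drop the oldest buckets", which avoids row-level tombstones entirely.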


Sean Durity
From: Karthick V [mailto:karthick...@zohocorp.com]
Sent: Monday, July 03, 2017 12:47 PM
To: user <user@cassandra.apache.org>
Subject: Re: Node failure Due To Very high GC pause time

Hi Bryan,

            Thanks for your quick response. We have already tuned our memory 
and GC based on our hardware specification, and it was working fine until 
yesterday, i.e. before running the delete requests described below. As you 
suggested, we will look into our GC and memory configuration once again.

FYKI: We are using memtable_allocation_type as offheap_objects.

Consider the following table

CREATE TABLE EmployeeDetails (
    branch_id text,
    department_id text,
    emp_id bigint,
    emp_details text,
    PRIMARY KEY (branch_id, department_id, emp_id)
) WITH CLUSTERING ORDER BY (department_id ASC, emp_id ASC);


    In this table I have 10 million records for a particular branch_id and 
department_id. The following are the operations I perform in C*, in 
chronological order:

  1.  Delete 5 million records, from the start, in batches of 500 records per 
request, for a particular branch_id (say 'xxx') and department_id (say 'yyy').
  2.  Read the next 500 records as soon as the above delete operations are 
completed ( Select * from EmployeeDetails where branch_id='xxx' and 
department_id = 'yyy' and emp_id > 50000000 limit 500 )
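The batching in step 1 can be sketched as follows (pure Python; the actual CQL DELETE statements and driver calls are omitted, and the helper name is hypothetical). Note that batching only bounds the size of each request: every deleted row still leaves a tombstone behind.

```python
# Hypothetical sketch of the delete loop in step 1 (driver calls omitted):
# split 5 million emp_ids into groups of 500, one group per delete request.

def batches(ids, size=500):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

emp_ids = list(range(5_000_000))
first = next(batches(emp_ids))   # ids for the first delete request
```

Each of the resulting 10,000 requests is small, but the 5 million tombstones they create accumulate in the same partition regardless of batch size.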

It was only after executing the above read request that there was a spike in 
memory, and within a few minutes the node was marked down.

So my question here is: will the above read request load all 5 million deleted 
records into memory before it starts fetching, or will it jump directly to the 
offset of record 50000001 (since we have specified the greater-than 
condition)? If it is the former case, then the read request will surely keep 
the data in main memory and perform a merge operation before it delivers the 
data, as per this wiki 
(https://wiki.apache.org/cassandra/ReadPathForUsers). If not, let me know how 
the above read request will provide me the data.


Note: While analysing my heap dump, it is clear that the majority of the 
memory is being held by Tombstone objects.


Thanks in advance
-- karthick



---- On Mon, 03 Jul 2017 20:40:10 +0530 Bryan Cheng 
<br...@blockcypher.com> wrote ----

This is a very antagonistic use case for Cassandra :P I assume you're familiar 
with Cassandra and deletes? (e.g. 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html, 
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html)

That being said, are you giving enough time for your tables to flush to disk? 
Deletes generate markers which can and will consume memory until they have a 
chance to be flushed, after which they will impact query time and performance 
(but should relieve memory pressure). If you're saturating the capability of 
your nodes, your tables will have difficulty flushing. See 
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html.

This could also be a heap/memory configuration issue or a GC tuning issue 
(although that is unlikely if you've left those at the defaults).

--Bryan


On Mon, Jul 3, 2017 at 7:51 AM, Karthick V 
<karthick...@zohocorp.com> wrote:


Hi,

      Recently, in my test cluster, I faced outrageous GC activity which made 
the node unreachable inside the cluster itself.

Scenario:
      In a partition of 5 million rows, we read the first 500 (by giving the 
starting range) and then delete those same 500. The same is done recursively, 
changing only the start range. Initially I didn't see any difference in query 
performance (up to about 50,000 rows), but later I observed a significant 
degradation; when we reached about 3.3 million, the read request failed and 
the node went unreachable. After analysing my GC logs, it is clear that 99% of 
my old-generation space was occupied and there was no more room for 
allocation, which caused the machine to stall.
       My doubt here is: will all 3.3 million deleted rows be loaded into my 
on-heap memory? If not, what objects are occupying that memory?

PS: I am using C* 2.1.13 in the cluster.






