Re: One time major deletion/purge vs periodic deletion

2018-03-20 Thread Carl Mueller
It's possible you'll run into compaction headaches. Likely, actually.

If you have time-bucketed purges/archives, I'd implement a time-bucketing
strategy using rotating tables, each dedicated to one time period, so that
when an entire table is ready for archiving you just snapshot its SSTables
and then TRUNCATE/nuke that time-bucket table.

Queries that span buckets and calculating which table to target on inserts
are a major pain in the ass, but at scale you'll probably want to consider
doing something like this.
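A minimal Python sketch of the bucket arithmetic, assuming one table per calendar month and a hypothetical `<base>_YYYY_MM` naming scheme (neither detail comes from the thread). An insert computes a single target table, while a range query has to fan out to every bucket it overlaps, which is where the pain mentioned above comes from.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def bucket_table(base: str, ts: datetime) -> str:
    """Return the monthly bucket table for a timezone-aware timestamp.

    The '<base>_YYYY_MM' naming scheme is a hypothetical convention.
    """
    ts = ts.astimezone(timezone.utc)
    return f"{base}_{ts.year:04d}_{ts.month:02d}"


def tables_for_range(base: str, start: datetime, end: datetime) -> List[str]:
    """Return every monthly bucket a query over [start, end) must read."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    if end <= start:
        return []
    # The last instant covered by a half-open range is just before `end`.
    last = end - timedelta(microseconds=1)
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (last.year, last.month):
        tables.append(f"{base}_{year:04d}_{month:02d}")
        # Advance one calendar month, rolling over the year boundary.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tables


if __name__ == "__main__":
    utc = timezone.utc
    print(bucket_table("events", datetime(2018, 3, 20, tzinfo=utc)))
    print(tables_for_range("events",
                           datetime(2017, 12, 15, tzinfo=utc),
                           datetime(2018, 2, 1, tzinfo=utc)))
```

Once every row in a bucket has aged out, that one table can be snapshotted (nodetool snapshot) and truncated or dropped, instead of deleting its rows one by one.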



Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread kurt greaves
The important point to consider is whether you are deleting old data or
recently written data. How old/recent depends on your write rate to the
cluster and there's no real formula. Basically you want to avoid deleting a
lot of old data all at once because the tombstones will end up in new
SSTables and the data to be deleted will live in higher levels (LCS) or
large SSTables (STCS), which won't get compacted together for a long time.
In this case it makes no difference if you do a big purge or if you break
it up, because at the end of the day if your big purge is just old data,
all the tombstones will have to stick around for a while until they make it
to the higher levels/bigger SSTables.
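To make the "tombstones have to wait for the old data" point concrete, here is a deliberately simplified Python model. It is not Cassandra's implementation (the real check works with SSTable overlap, bloom filters and timestamps rather than exact key sets); it only captures the two conditions that matter here: gc_grace_seconds must have passed, and no SSTable outside the current compaction may still hold data for the deleted partition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


@dataclass(frozen=True)
class SSTable:
    name: str
    partitions: FrozenSet[str]  # partition keys this SSTable may contain (toy model)


def tombstone_purgeable(partition: str, deleted_at: int, now: int,
                        compacting: Iterable[SSTable],
                        all_sstables: Iterable[SSTable],
                        gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """Toy model of when a compaction may drop a partition tombstone."""
    # Condition 1: the tombstone must be older than gc_grace_seconds.
    if now - deleted_at <= gc_grace_seconds:
        return False
    # Condition 2: no SSTable outside this compaction may still hold data
    # for the partition, otherwise dropping the tombstone would resurrect it.
    in_compaction = {s.name for s in compacting}
    return not any(partition in s.partitions
                   for s in all_sstables if s.name not in in_compaction)


if __name__ == "__main__":
    day = 86_400
    old = SSTable("big-old", frozenset({"k1", "k2"}))  # high level / large SSTable
    new = SSTable("small-new", frozenset({"k1"}))      # fresh SSTable with the tombstone
    # Only recent SSTables are compacted together: the tombstone must stay.
    print(tombstone_purgeable("k1", 0, 30 * day, [new], [old, new]))
    # Once the old SSTable joins the compaction, it can finally be dropped.
    print(tombstone_purgeable("k1", 0, 30 * day, [new, old], [old, new]))
```

In this model, splitting a purge of old data into smaller pieces does not change anything: each tombstone still waits until its SSTable is compacted together with the large, old one.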

If you have to purge large amounts of old data, the easiest options are:
1. Make sure you have at least 50% disk free (for large/major compactions),
   and/or
2. Use garbagecollect compactions (nodetool garbagecollect, 3.10+).
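A small Python pre-flight check for the first option, as a sketch only: it tests whether the filesystem holding a given path has at least half of its space free. The path is an assumption; point it at whatever your data_file_directories setting uses.

```python
import shutil


def has_compaction_headroom(free_bytes: int, total_bytes: int,
                            min_free_fraction: float = 0.5) -> bool:
    """True if at least `min_free_fraction` of the disk is free.

    0.5 mirrors the "at least 50% disk free" rule of thumb: in the worst
    case a major compaction temporarily needs roughly as much free space
    as the data it rewrites.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes >= min_free_fraction


def check_data_dir(path: str) -> bool:
    """Check the filesystem that holds `path` (e.g. a Cassandra data directory)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return has_compaction_headroom(usage.free, usage.total)


if __name__ == "__main__":
    # Assumption: replace "." with your node's actual data directory.
    print(check_data_dir("."))
```

The check is per node and per filesystem; with several data directories each one would need checking separately.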


Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread Rahul Singh
Charu,

I am aware of what type of things you are trying to do and why. Not sure if DCS
(the deleting compaction strategy) will solve your problem. Consider a process
that identifies the data that needs to be deleted and sets a TTL on that row or
cell so that it expires some time in the future, such as 10 days out.

The process could be run daily, hourly, etc. depending on the volume, but it
would spread out the actual deletes.
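A minimal Python sketch of the statement such a process might issue for each flagged row. The table and column names are hypothetical. As I understand CQL semantics, UPDATE ... USING TTL only puts the TTL on the cells it writes, and a row created by INSERT keeps its primary-key marker, so the sketch re-inserts the whole row; `columns` should therefore list every column of the row.

```python
from typing import Dict, List, Sequence, Tuple

DAY_SECONDS = 86_400


def ttl_seconds(days: int = 10) -> int:
    """TTL for rows flagged for purge ("such as 10 days")."""
    return days * DAY_SECONDS


def reinsert_with_ttl(table: str, row: Dict[str, object],
                      columns: Sequence[str], ttl: int) -> Tuple[str, List[object]]:
    """Build a parameterized CQL INSERT that rewrites a whole row with a TTL.

    Returns the CQL text (with ? bind markers) and the values to bind,
    in the same order: one value per column, then the TTL.
    """
    col_list = ", ".join(columns)
    markers = ", ".join("?" for _ in columns)
    cql = f"INSERT INTO {table} ({col_list}) VALUES ({markers}) USING TTL ?"
    values = [row[c] for c in columns] + [ttl]
    return cql, values


if __name__ == "__main__":
    # "archive_ks.orders" and its columns are hypothetical examples.
    cql, values = reinsert_with_ttl("archive_ks.orders",
                                    {"order_id": 42, "status": "closed"},
                                    ["order_id", "status"], ttl_seconds())
    print(cql)
    print(values)
```

Because each daily or hourly run only flags that run's rows, the resulting expirations (and the tombstones they turn into) are spread across time rather than landing all at once.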

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation



Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread Ben Slater
I would say you are better off spreading out the deletes so compactions
have the best chance of actually removing them from disk before they become
a problem. You will likely need to pay close attention to compaction
strategy tuning.
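Spreading deletes out can also apply within a single purge run. A minimal Python sketch of pacing a purge, assuming the keys to delete are already known and that `delete_one` (a hypothetical caller-supplied function) issues one single-partition DELETE; the batch size and pause are placeholders to tune, not recommendations.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

K = TypeVar("K")


def batched(items: Iterable[K], size: int) -> Iterator[List[K]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[K] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_purge(keys: Iterable[K], delete_one: Callable[[K], None],
                    batch_size: int = 500, pause_seconds: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Delete keys in fixed-size batches, pausing between batches.

    Returns the number of keys passed to `delete_one`.
    """
    deleted = 0
    for i, batch in enumerate(batched(keys, batch_size)):
        if i:  # no pause before the first batch
            sleep(pause_seconds)
        for key in batch:
            delete_one(key)
            deleted += 1
    return deleted
```

Injecting `sleep` keeps the pacing testable; a real job would pass its own delete function and keep the default `time.sleep`.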

I don’t have any personal experience with it, but you may also want to check
out the deleting compaction strategy to see if it works for your use case:
https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy

Cheers
Ben

-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: One time major deletion/purge vs periodic deletion

2018-03-06 Thread Charulata Sharma (charshar)
Well, it’s not like that. We don’t just purge. There are business rules which
will decide the records to be purged, or archived and then purged, so we cannot
rely on TTL.

Thanks,
Charu



Re: One time major deletion/purge vs periodic deletion

2018-03-06 Thread Jens Rantil
Sounds like you are using Cassandra as a queue. It's an anti-pattern.
What I would do would be to rely on TTL for removal of data and use the
TWCS compaction strategy to handle removal, and you just focus on insertion.
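A minimal Python sketch that builds the CQL for this setup: TWCS plus a table-level default TTL. The table name is hypothetical, and the default of 24 windows per TTL period is an assumption to tune, not a figure from the thread. Note that default_time_to_live only applies to writes that do not set their own TTL.

```python
import math

DAY_SECONDS = 86_400


def twcs_alter_table(table: str, ttl_days: int, target_windows: int = 24) -> str:
    """Build an ALTER TABLE statement enabling TWCS with a default TTL.

    The window size (in days) is chosen so that roughly `target_windows`
    windows cover the TTL period.
    """
    if ttl_days < 1 or target_windows < 1:
        raise ValueError("ttl_days and target_windows must be >= 1")
    window_days = math.ceil(ttl_days / target_windows)
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        "'class': 'TimeWindowCompactionStrategy', "
        "'compaction_window_unit': 'DAYS', "
        f"'compaction_window_size': {window_days}}} "
        f"AND default_time_to_live = {ttl_days * DAY_SECONDS}"
    )


if __name__ == "__main__":
    # "ks.events" is a hypothetical table; 90 days is an example retention.
    print(twcs_alter_table("ks.events", 90))
```

The idea behind the pairing: when every row in a time window has expired, TWCS can drop that window's SSTables whole instead of compacting tombstones away row by row.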

On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) 
wrote:

> Hi,
>
> Wanted the community’s feedback on deciding the schedule of the Archive
> and Purge job.
>
> Is it better to purge a large volume of data at regular intervals (like
> running the jobs once every 3 months) or purge smaller amounts more
> frequently (running the job weekly)?
>
> Some estimates on the number of deletes performed would be up to 80-90K
> rows purged in 3 months vs 10K deletes every week.
>
> Thanks,
> Charu
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.