Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Benjamin Roth
This is not a queue pattern, and I'd recommend LCS for better read
performance.
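
For reference, switching the table from the original post to LCS is a single
schema change; a sketch (160 MB is LCS's default sstable_size_in_mb, shown
here only to make the knob visible):

    ALTER TABLE myset WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'
    };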

2017-02-27 16:06 GMT+01:00 Rakesh Kumar <rakeshkumar...@outlook.com>:

> Do you update this table when an event is processed? If yes, is it
> considered a good practice for Cassandra? I read somewhere that using
> Cassandra as a queuing table is an anti-pattern.
> 
> From: Vincent Rischmann <m...@vrischmann.me>
> Sent: Friday, February 24, 2017 06:24
> To: user@cassandra.apache.org
> Subject: Which compaction strategy when modeling a dumb set
>
> [...]



-- 
Benjamin Roth
Prokurist (authorized signatory)

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Rakesh Kumar
Do you update this table when an event is processed? If yes, is it considered 
a good practice for Cassandra? I read somewhere that using Cassandra as a 
queuing table is an anti-pattern.

From: Vincent Rischmann <m...@vrischmann.me>
Sent: Friday, February 24, 2017 06:24
To: user@cassandra.apache.org
Subject: Which compaction strategy when modeling a dumb set

[...]


Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Vincent Rischmann
No, I don't store events in Cassandra.



What I'm really doing is counting things: each event has a type, a user
associated with it, and some other metadata. When I process an event I need
to increment those counters, but only if the event hasn't already been
processed. Our input event stream is Kafka, and it's not uncommon that we
get the same event twice because our client apps aren't reliable.


Right now I haven't found a good solution to this that doesn't involve a
read before write, but I'd love to hear your suggestions.
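
To make that concrete, here is a minimal sketch of the flow; the
event_counters table is hypothetical, invented only for illustration
(counter columns can't be guarded by lightweight transactions, so the dedup
check really is a separate read):

    -- hypothetical counter table, for illustration only
    CREATE TABLE event_counters (
        event_type text,
        user_id uuid,
        total counter,
        PRIMARY KEY (event_type, user_id)
    );

    -- 1. read before write: was this event already processed?
    SELECT id FROM myset WHERE id = ?;

    -- 2. only if step 1 returned no row, count it ...
    UPDATE event_counters SET total = total + 1
    WHERE event_type = ? AND user_id = ?;

    -- 3. ... and record the id so a replayed Kafka message is skipped
    INSERT INTO myset (id) VALUES (?);

Note that steps 1-3 are not atomic, so two consumers racing on the same
event can still double count.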




On Mon, Feb 27, 2017, at 12:01 PM, Vladimir Yudovin wrote:

> Do you also store events in Cassandra? If yes, why not add a "processed"
> flag to the existing table(s) and fetch non-processed events with a
> single SELECT?
>
> Best regards, Vladimir Yudovin,
> Winguzone (https://winguzone.com?from=list) - Cloud Cassandra Hosting
>
> On Fri, 24 Feb 2017 06:24:09 -0500 Vincent Rischmann
> <m...@vrischmann.me> wrote:
>
>> [...]


Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Vladimir Yudovin
Do you also store events in Cassandra? If yes, why not add a "processed" flag 
to the existing table(s) and fetch non-processed events with a single SELECT?
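
A sketch of what that could look like, on a hypothetical events table
invented here for illustration; note that filtering on a regular column
needs a secondary index (or ALLOW FILTERING), and an index on a
low-cardinality boolean is itself a questionable fit, so this is only to
make the idea concrete:

    -- hypothetical table, invented for illustration
    CREATE TABLE events (
        id uuid PRIMARY KEY,
        event_type text,
        user_id uuid,
        processed boolean
    );
    CREATE INDEX events_by_processed ON events (processed);

    -- fetch the backlog in a single query
    SELECT id, event_type, user_id FROM events WHERE processed = false;

    -- flag an event once it has been handled
    UPDATE events SET processed = true WHERE id = ?;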



Best regards, Vladimir Yudovin,
Winguzone - Cloud Cassandra Hosting

On Fri, 24 Feb 2017 06:24:09 -0500 Vincent Rischmann <m...@vrischmann.me> wrote:
[...]

Re: Which compaction strategy when modeling a dumb set

2017-02-24 Thread kurt greaves
Probably LCS, although what you're implying (read-before-write) is an
anti-pattern in Cassandra. Something like this is a good indicator that you
should review your model.
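
For completeness, the usual way to collapse the check and the insert into a
single round trip is a lightweight transaction; a sketch (whether the Paxos
cost is acceptable is a separate question, and it does not help with the
counter updates mentioned elsewhere in this thread):

    INSERT INTO myset (id) VALUES (?) IF NOT EXISTS;

The result row carries an [applied] boolean: true means the id was new and
the event should be processed, false means it is a duplicate.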


Which compaction strategy when modeling a dumb set

2017-02-24 Thread Vincent Rischmann
Hello,



I'm using a table like this:



   CREATE TABLE myset (id uuid PRIMARY KEY)



which is basically a set I use for deduplication: id is a unique id for an
event. When I process an event I insert its id, and before processing I
check whether the id is already present.
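
In CQL the pattern is just two statements; a sketch, where ? is a driver
bind marker:

    -- before processing: has this id been seen already?
    SELECT id FROM myset WHERE id = ?;

    -- after successful processing: remember the id
    INSERT INTO myset (id) VALUES (?);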


It works well enough, but I'm wondering which compaction strategy I should
use. I expect maybe 1% or less of events will end up duplicated (and thus
not generate an insert), so the workload will probably be about 50% writes
and 50% reads.


Is LCS a good strategy here, or should I stick with STCS?