Re: Which compaction strategy when modeling a dumb set
This is not a queue pattern, and I'd recommend LCS for better read performance.

2017-02-27 16:06 GMT+01:00 Rakesh Kumar <rakeshkumar...@outlook.com>:
> Do you update this table when an event is processed? If yes, it is
> considered a good practice for Cassandra. I read somewhere that using
> Cassandra as a queuing table is anti pattern.

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
Re: Which compaction strategy when modeling a dumb set
typo: "If yes, it is considered a good practice for Cassandra" should read as "If yes, is it considered a good practice for Cassandra?"

From: Rakesh Kumar <rakeshkumar...@outlook.com>
Sent: Monday, February 27, 2017 10:06
To: user@cassandra.apache.org
Subject: Re: Which compaction strategy when modeling a dumb set
Re: Which compaction strategy when modeling a dumb set
Do you update this table when an event is processed? If yes, it is considered a good practice for Cassandra. I read somewhere that using Cassandra as a queuing table is anti pattern.

From: Vincent Rischmann <m...@vrischmann.me>
Sent: Friday, February 24, 2017 06:24
To: user@cassandra.apache.org
Subject: Which compaction strategy when modeling a dumb set
Re: Which compaction strategy when modeling a dumb set
No, I don't store events in Cassandra. The real thing I'm doing is counting stuff: each event has a type, a user associated with it, and some other metadata. When I process an event I need to increment those counters only if the event hasn't already been processed.

Our input event stream is Kafka, and it's not uncommon that we get the same event twice, due to our client apps not being reliable. Right now I haven't found a good solution to this that doesn't involve a read before write, but I'd love to hear your suggestions.

On Mon, Feb 27, 2017, at 12:01 PM, Vladimir Yudovin wrote:
> Do you also store events in Cassandra? If yes, why not to add
> "processed" flag to existing table(s), and fetch non-processed events
> with single SELECT?
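The dedup-then-count flow described above can be sketched as a small simulation. This is not Cassandra client code: an in-memory set stands in for the myset table and a Counter stands in for the per-type counter rows; the function name and event fields are hypothetical.

```python
# Simulation of idempotent counting: increment counters only for event
# ids that have not been seen before. `processed` stands in for the
# myset table; `counters` stands in for the counter rows.
from collections import Counter

processed = set()      # stand-in for: CREATE TABLE myset (id uuid PRIMARY KEY)
counters = Counter()   # stand-in for per-event-type counters

def handle_event(event_id, event_type):
    """Increment the counter for event_type unless event_id was already processed."""
    if event_id in processed:   # the read-before-write dedup check
        return False            # duplicate delivery, skip
    processed.add(event_id)     # record the id so replays are ignored
    counters[event_type] += 1
    return True

# Kafka may deliver the same event twice; only the first delivery counts.
handle_event("e1", "click")
handle_event("e1", "click")   # duplicate, ignored
handle_event("e2", "view")
```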
Re: Which compaction strategy when modeling a dumb set
Do you also store events in Cassandra? If yes, why not add a "processed" flag to the existing table(s), and fetch non-processed events with a single SELECT?

Best regards,
Vladimir Yudovin,
Winguzone - Cloud Cassandra Hosting
Re: Which compaction strategy when modeling a dumb set
Probably LCS, although what you're implying (a read before write) is an anti-pattern in Cassandra. Something like this is a good indicator that you should review your model.
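For context, the read-before-write being flagged here is two separate operations (a SELECT, then an INSERT), which also leaves a window where two workers can both miss the id and double-process. Cassandra does offer a single-operation alternative, `INSERT ... IF NOT EXISTS` (a lightweight transaction, with its own latency cost). A sketch contrasting the two shapes, with an in-memory dict standing in for the table and a lock standing in for the coordination an LWT provides:

```python
import threading

table = {}                # stand-in for the myset table
lock = threading.Lock()   # stands in for the coordination an LWT provides

def read_then_write(event_id):
    # Two operations: SELECT, then INSERT. Between them another worker
    # can insert the same id, so both may decide to process the event.
    if event_id in table:     # "SELECT id FROM myset WHERE id = ?"
        return False
    table[event_id] = True    # "INSERT INTO myset (id) VALUES (?)"
    return True

def insert_if_not_exists(event_id):
    # One operation: check and write happen together, so exactly one
    # caller wins. Mirrors "INSERT INTO myset (id) VALUES (?) IF NOT
    # EXISTS", whose result row says whether the insert was applied.
    with lock:
        if event_id in table:
            return False
        table[event_id] = True
        return True
```

Whether the LWT cost is acceptable depends on throughput; it trades the extra SELECT for extra coordination round trips on the write path.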
Which compaction strategy when modeling a dumb set
Hello,

I'm using a table like this:

   CREATE TABLE myset (id uuid PRIMARY KEY)

which is basically a set I use for deduplication: id is a unique id for an event. When I process an event I insert its id, and before processing I check whether it has already been processed.

It works well enough, but I'm wondering which compaction strategy I should use. I expect 1% or less of events to end up duplicated (thus not generating an insert), so the workload will probably be 50% writes, 50% reads.

Is LCS a good strategy here, or should I stick with STCS?
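The check-then-process-then-insert workflow above can be sketched as a simulation. An in-memory set stands in for the myset table, and the `process` callback and function name are hypothetical:

```python
def process_once(seen, event_id, process):
    """Process an event only if its id is not already in the dedup set."""
    if event_id in seen:      # "SELECT id FROM myset WHERE id = ?"
        return False          # already processed, skip
    process(event_id)         # do the actual work
    seen.add(event_id)        # "INSERT INTO myset (id) VALUES (?)"
    return True

seen = set()
handled = []
process_once(seen, "evt-1", handled.append)
process_once(seen, "evt-1", handled.append)   # duplicate, skipped
```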