Listen to Colin's advice and avoid the temptation of anti-patterns.

On Sat, Jan 3, 2015 at 6:10 PM, Colin <colpcl...@gmail.com> wrote:
> Use a message bus with a transactional get: get the message, send it to
> Cassandra, and upon write success, submit it to the ESP and commit the get
> on the bus. Messaging systems like RabbitMQ support this semantic.
>
> Using Cassandra as a queuing mechanism is an anti-pattern.
>
> --
> *Colin Clark*
> +1-320-221-9531
>
> On Jan 3, 2015, at 6:07 PM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>
> Thank you all for your answers.
>
> It seems I'll have to go with some event-driven processing before/during
> the Cassandra write path.
>
> My concern is that I'd love to first guarantee the disk write of the
> Cassandra persistence and only then do the event processing (which is
> mostly CRUD intercepts at this point), even if slightly delayed; doing so
> via triggers would probably bog down the whole processing pipeline.
>
> What I'd probably do is write, from a trigger, to a separate key table
> with all the CRUDed elements and have the ESP process that table.
>
> Thank you for your contribution. Should anyone else have any experience
> in these scenarios, I'm obviously all ears as well.
>
> Best,
>
> Hugo
>
> Sent from my iPhone
>
> On 03/01/2015, at 11:09, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Hello Hugo
>
> I was facing the same kind of requirement from some users. Long story
> short, below are the possible strategies, with the advantages and
> drawbacks of each:
>
> 1) Put Spark in front of the back end: every incoming
> modification/update/insert goes to Spark first, then Spark forwards it to
> Cassandra for persistence. With Spark, you can perform pre- or
> post-processing and notify external clients of mutations.
>
> The drawback of this solution is that all incoming mutations must go
> through Spark.
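The transactional get → write → ack flow Colin describes upthread can be sketched as follows. This is only a toy simulation: a plain in-memory `deque` stands in for RabbitMQ, and stub functions stand in for the Cassandra write and the ESP submission. With a real broker (e.g. pika against RabbitMQ), "commit the get" would be acknowledging the delivery only after the downstream write succeeded.

```python
from collections import deque

# Stand-ins for the real systems (illustrative only, not real client code).
cassandra, esp = [], []

def write_to_cassandra(msg):
    cassandra.append(msg)      # pretend durable write; would raise on failure

def submit_to_esp(msg):
    esp.append(msg)            # hand the event to the stream processor

queue = deque(["pos-update-1", "pos-update-2"])

def consume(queue):
    """Transactional get: the message is only removed (acked) after the
    Cassandra write succeeds and the event is handed to the ESP."""
    while queue:
        msg = queue[0]         # uncommitted get, like an unacked delivery
        try:
            write_to_cassandra(msg)
            submit_to_esp(msg)
            queue.popleft()    # commit the get -- message is now gone
        except Exception:
            break              # leave the message on the bus for redelivery

consume(queue)
```

The point of this ordering is that a crash between the Cassandra write and the commit only causes a redelivery (at-least-once), never a lost mutation.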
> You may set up a Kafka queue as temporary storage to distribute the load
> and consume the mutations with Spark, but that adds to the architecture's
> complexity with additional components & technologies.
>
> 2) For high availability and resilience, you probably want all mutations
> saved into Cassandra first, and then to process notifications with Spark.
> In this case the only way to get notifications out of Cassandra, as of
> version 2.1, is to rely on manually coded triggers (still an experimental
> feature).
>
> With triggers you can notify whatever clients you want, not only Spark.
>
> The big drawback of this solution is that playing with triggers is
> dangerous if you are not familiar with Cassandra internals. Indeed, the
> trigger sits on the write path and may hurt performance if you are doing
> complex, blocking tasks.
>
> Those are the two solutions I can see; maybe the ML members will propose
> other innovative choices.
>
> Regards
>
> On Sat, Jan 3, 2015 at 11:46 AM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>
>> Hello.
>>
>> We're currently using Hazelcast (http://hazelcast.org/) as a distributed
>> in-memory data grid. That's been working sort-of-well for us, but going
>> solely in-memory has exhausted its path in our use case, and we're
>> considering porting our application to a NoSQL persistent store. After
>> the usual comparisons and evaluations, we're borderline close to picking
>> Cassandra, plus eventually Spark for analytics.
>>
>> Nonetheless, there is a gap in our architectural needs that we're still
>> not grasping how to solve in Cassandra (with or without Spark): Hazelcast
>> allows us to create a Continuous Query such that, whenever a row is
>> added/removed/modified in the clause's result set, Hazelcast calls us
>> back with the corresponding notification. We use this to continuously
>> update the clients via AJAX streaming with the new/changed rows.
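The "separate key table" idea Hugo floats upthread (a trigger appending a changelog row per mutation, consumed by the ESP) might look something like this in CQL. All keyspace, table, and column names here are invented for illustration; only the shape matters: a time-bucketed partition key so the log stays scannable, a `timeuuid` clustering column for ordering, and a TTL so entries expire on their own.

```sql
-- Hypothetical changelog table, populated by the trigger on the write path.
-- The ESP scans recent partitions instead of the base tables.
CREATE TABLE ship_tracking.mutation_log (
    day        text,        -- coarse time bucket, e.g. '2015-01-03'
    mutated_at timeuuid,    -- orders mutations within the bucket
    table_name text,
    row_key    text,
    op         text,        -- 'INSERT' / 'UPDATE' / 'DELETE'
    PRIMARY KEY (day, mutated_at)
) WITH CLUSTERING ORDER BY (mutated_at DESC)
  AND default_time_to_live = 86400;  -- entries expire after a day

-- ESP consumer: fetch the latest mutations for the current bucket.
SELECT mutated_at, table_name, row_key, op
FROM ship_tracking.mutation_log
WHERE day = '2015-01-03'
LIMIT 500;
```

Note the caveat DuyHai raises still applies: the trigger doing this extra write sits on the hot write path, so it must stay cheap and non-blocking.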
>> This is probably a conceptual mismatch we're making, so: how do we best
>> address this use case in Cassandra (with or without Spark's help)? Is
>> there something in the API that allows for Continuous Queries on
>> key/clause changes (we haven't found it)? Is there some other way to get
>> a stream of key/clause updates? Events of some sort?
>>
>> I'm aware that we could periodically poll Cassandra, but in our use case
>> the client is potentially interested in a large number of table-clause
>> notifications (think "all changes to ship positions on California's
>> coastline"), and iterating out of the store would kill the streamer's
>> scalability.
>>
>> Hence, the magic question: what are we missing? Is Cassandra the wrong
>> tool for the job? Are we not aware of a particular part of the API or an
>> external library in/outside the Apache realm that would allow for this?
>>
>> Many thanks for any assistance!
>>
>> Hugo
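For reference, the continuous-query semantics Hugo is asking about amount to the following: register a predicate once, then get a push callback for every matching change. This is a toy in-memory illustration of that contract, not Hazelcast's or Cassandra's actual API; all names are invented.

```python
# Toy illustration of continuous-query semantics: subscribe a predicate,
# receive a callback on every matching change. Not a real Hazelcast/Cassandra API.

class ContinuousQueryStore:
    def __init__(self):
        self.rows = {}
        self.subscriptions = []   # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def put(self, key, row):
        self.rows[key] = row
        for predicate, callback in self.subscriptions:
            if predicate(row):
                callback(key, row)   # push the change, e.g. AJAX-stream it

store = ContinuousQueryStore()
california = []

# "All changes to ship positions on California's coastline"
store.subscribe(lambda row: row["region"] == "CA",
                lambda key, row: california.append((key, row["lat"])))

store.put("ship-1", {"region": "CA", "lat": 36.6})  # matches -> callback fires
store.put("ship-2", {"region": "NY", "lat": 40.7})  # no match -> ignored
```

The thread's suggestions (message bus, Spark front end, trigger-fed changelog table) are different ways of bolting this push model onto Cassandra, which does not provide it natively as of 2.1.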