Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Asit KAUSHIK
Hi all,

We are trying to integrate Elasticsearch with Cassandra. Because the river
plugin runs select * against every table, it seems like a poor choice for
performance, so I was thinking of inserting into Elasticsearch from a
Cassandra trigger instead. I wanted your view: does a Cassandra trigger
impact Cassandra's read/write performance?

If you have achieved this some other way, please guide me. I am stuck on
this.

Regards
Asit


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Ken Hancock
When I last looked at DataStax Enterprise (DSE 3.0ish), it exhibited the
same problem that you highlight, no different from your good idea of
asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the
replication group. If a node timed out or a mutation was dropped, that
Solr node would have an out-of-sync index. A Solr query such as a count(*)
of users could return inconsistent results depending on which node you
hit, since Solr didn't support Cassandra consistency levels.
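
To make the failure mode above concrete, here is a minimal toy model in
Python. It is not DSE or Solr code: ReplicaIndex, write and the node names
are hypothetical, and the "index" is just a dict. It only illustrates how a
per-node index diverges when one replica drops a mutation.

```python
class ReplicaIndex:
    """One node's local search index; it only sees writes that reached this node."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def apply(self, key, doc):
        self.docs[key] = doc

    def count(self):
        return len(self.docs)


def write(replicas, key, doc, dropped_on=()):
    """Deliver a write to every replica except those that drop the mutation."""
    for replica in replicas:
        if replica.name not in dropped_on:
            replica.apply(key, doc)


replicas = [ReplicaIndex(n) for n in ("node1", "node2", "node3")]
write(replicas, "alice", {"age": 30})
write(replicas, "bob", {"age": 41}, dropped_on={"node3"})  # node3 drops this one

# The answer to "how many users?" now depends on which node is asked.
counts = {r.name: r.count() for r in replicas}
print(counts)  # {'node1': 2, 'node2': 2, 'node3': 1}
```

A count served by node3 alone is off by one until something brings that
node's data, and therefore its index, back in line with the others.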

I haven't seen any blog posts or docs on whether this intrinsic mismatch
between Cassandra's eventual consistency and Solr has ever been resolved.

Ken


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread DuyHai Doan
Be very careful not to make blocking calls to Elasticsearch in your
trigger; otherwise you will kill C* performance. The biggest danger of
triggers in their current state is that they are on the write path.

In your trigger, you can try to push the mutation to ES asynchronously, but
that means managing a thread pool and all the related issues.

Not to mention atomicity issues: what happens if the update to ES fails or
the connection times out?
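
The asynchronous hand-off described above can be sketched as follows. This
is a language-neutral illustration in Python, not trigger code (a real
Cassandra trigger is written in Java). AsyncIndexer, on_write and the send
callable are hypothetical names, and send stands in for whatever client
call would reach ES.

```python
import queue
import threading


class AsyncIndexer:
    """Hands mutations to a background worker so the write path never waits on ES."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # hypothetical callable that talks to the search cluster
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = []  # mutations rejected because the queue was full
        self.failed = []   # mutations whose send raised (candidates for retry)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_write(self, mutation):
        """Called on the write path: returns immediately, never blocks."""
        try:
            self._pending.put_nowait(mutation)
            return True
        except queue.Full:
            self.dropped.append(mutation)
            return False

    def _drain(self):
        while True:
            mutation = self._pending.get()
            if mutation is None:  # sentinel queued by close()
                return
            try:
                self._send(mutation)
            except Exception:
                # The Cassandra write has already succeeded at this point,
                # so the two stores disagree until this mutation is retried.
                self.failed.append(mutation)

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._pending.put(None)
        self._worker.join()
```

The dropped and failed lists are where the atomicity problem shows up:
every entry is a write that Cassandra accepted but ES never saw, and
something outside this class has to retry or reconcile it.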

As an alternative to implementing the ES integration yourself, you can look
at DataStax Enterprise's integration of Cassandra with Apache Solr (not
free), or at open-source alternatives such as the Stratio or TupleJump
forks of Cassandra with Lucene integration.


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Jack Krupansky
DSE now has a queue that decouples Cassandra inserts from Solr indexing.
It blocks only if the queue fills up, and you can configure the size of
the queue. So, to be clear, DSE no longer has the problem highlighted for
ES.
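
As a rough illustration of "blocks only when the queue is full", here is a
small Python sketch of a bounded queue between a writer and an indexer. It
is not DSE's implementation: index_queue, enqueue_for_indexing and the size
of 2 are made up for the example, and a timeout is used only so the demo
terminates instead of blocking forever.

```python
import queue

index_queue = queue.Queue(maxsize=2)  # the queue size is the tunable knob


def enqueue_for_indexing(doc, timeout):
    """Return True if the document was queued, False if the queue stayed full."""
    try:
        # put() returns at once while there is room and waits only when full.
        index_queue.put(doc, timeout=timeout)
        return True
    except queue.Full:
        return False


# No indexer is consuming yet: two documents fit, the third has to wait.
results = [enqueue_for_indexing(d, timeout=0.05) for d in ("doc1", "doc2", "doc3")]

index_queue.get()  # the indexer catches up by one document
results.append(enqueue_for_indexing("doc3", timeout=0.05))

print(results)  # [True, True, False, True]
```

The writer pays nothing while the indexer keeps up. It is slowed only once
the backlog reaches the configured size, which is the trade-off between
this approach and dropping work when the queue is full.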

-- Jack Krupansky


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Ryan Svihla
@Ken I support a lot of the DSE Search users and teach classes on it. As
long as you're not dropping mutations, you're in sync. If you are dropping
mutations, you're probably sized way too small anyway, and once you run
repair (which you should be doing anyway when dropping mutations) you're
back in sync. Because of that, I actually think the models work well
together.

FWIW the improvement since 3.0 is MASSIVE (it's been what I'd call stable
since 3.2.x and we're on 4.6 now).

@Asit To answer the ES question: it's not really for me to say what the lag
will be or to advise on sizing ES, so that's probably more of a question
for them.


-- 

Thanks,
Ryan Svihla


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Asit KAUSHIK
Hi all,

What I intend to do is push every write to Elasticsearch using the
trigger. I know it would impact Cassandra writes, but given that writes are
pretty performant on Cassandra, would that lag be a big one?

Also, as far as I know, Solr has limitations with nested JSON documents,
which Elasticsearch handles seamlessly; hence it was our preference.

Please let me know your thoughts on this, as we are stuck. I am also
looking into the streaming part of Cassandra in the hope that I can find
something.

Regards
Asit



Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Robert Coli
I would not use triggers in production in their current form.

=Rob


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

2015-01-07 Thread Jonathan Haddad
+1.  Don't use triggers.


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade