[jira] [Issue Comment Edited] (CASSANDRA-1311) Triggers

Brian ONeill (Issue Comment Edited) (JIRA) Sun, 08 Apr 2012 18:57:43 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249647#comment-13249647
 ]


Brian ONeill edited comment on CASSANDRA-1311 at 4/9/12 1:56 AM:
-----------------------------------------------------------------

Agreed.  I don't think we should include REST in the formal API either; I was 
just offering it up as a design pattern for those who need to do more than 
fits in a little JavaScript snippet.  

We are deep in performance/stress testing right now, and we have two 
models working: one where we use synchronous triggers (prior to write), and 
another where triggers execute asynchronously after write.  Both are useful for 
different things: asynchronous where we can't slow down the actual write (e.g. 
user interactions), and synchronous where we need to enforce integrity.  

Additionally, we see a need for two levels of guarantees.  For some of the 
triggers, we don't really care if the trigger failed, because we can rely on a 
regular map/reduce job to clean up after any failed trigger executions.  We'd 
rather not even have the overhead of a CSCL.  The system just needs to execute 
the trigger for us (if it can).  If it fails, oh well.  

For other jobs (synchronous or asynchronous), we need to know when we are in a 
bad state, i.e. we need to know if the data is ever out of sync with a 
side effect of a trigger.  For these scenarios, the overhead of the CSCL is 
acceptable.  We can see failed trigger executions even in the event of a crash 
(e.g. log entries left in a PENDING state for longer than some acceptable time 
period are considered failed, and we need to go rectify the situation).  
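
To make the PENDING check concrete, here is a rough sketch in Python.  It is 
illustrative only: `LogEntry`, `find_failed`, the "DONE" state and the field 
names are made up for this example and are not part of any Cassandra API or 
of the attached patches.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    mutation_id: str
    status: str       # "PENDING" or "DONE" (hypothetical states)
    logged_at: float  # seconds since epoch when the entry was written


def find_failed(entries, now, max_pending_seconds):
    """Return entries that have sat in PENDING longer than the allowed window."""
    return [
        e for e in entries
        if e.status == "PENDING" and now - e.logged_at > max_pending_seconds
    ]
```

A periodic job could run something like this over the log and hand the 
result to whatever repairs the side effect.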

Unless there are transactional semantics, I think it suffices to have three 
interception points:
# Pre-mutation synchronous (blocking until trigger execution completes)
#* Trigger can add additional mutations
#** (additional columns to a row "in-transaction" seems useful)
#* Trigger can fail the operation 
#** (quality/integrity checks)
# Post-mutation synchronous
#* Upon failure, we can signal "trigger failure" to the client suggesting 
retry, but it doesn't fail the actual operation 
#** (since it's already happened, and we don't want to add rollback)
# Post-mutation asynchronous
#* No influence on the write (obviously), but we need a guarantee that the 
trigger executes, or to know when it has not.
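
The three interception points above could be wired together roughly as 
follows.  This is a minimal in-memory sketch in Python, not the proposed 
implementation: `Store`, `TriggerRejected`, the trigger signature 
`(key, mutation)` and the shape of the returned dict are all assumptions made 
for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class TriggerRejected(Exception):
    """Raised by a pre-mutation trigger to fail the write."""


class Store:
    def __init__(self):
        self.rows = {}
        self.pre = []         # 1. pre-mutation synchronous triggers
        self.post_sync = []   # 2. post-mutation synchronous triggers
        self.post_async = []  # 3. post-mutation asynchronous triggers
        self.pool = ThreadPoolExecutor(max_workers=2)

    def write(self, key, columns):
        mutation = dict(columns)

        # 1. Pre-mutation: a trigger may return extra columns to add, or
        #    raise TriggerRejected, which aborts before anything is written.
        for trig in self.pre:
            extra = trig(key, mutation)
            if extra:
                mutation.update(extra)

        self.rows.setdefault(key, {}).update(mutation)

        # 2. Post-mutation synchronous: a failure is reported to the caller,
        #    but the write stays applied (no rollback).
        trigger_failed = False
        for trig in self.post_sync:
            try:
                trig(key, mutation)
            except Exception:
                trigger_failed = True

        # 3. Post-mutation asynchronous: submitted to a pool; the write does
        #    not wait for these to finish.
        futures = [
            self.pool.submit(trig, key, dict(mutation))
            for trig in self.post_async
        ]
        return {"applied": True, "trigger_failed": trigger_failed,
                "async": futures}
```

Note that this sketch says nothing about how the asynchronous case is 
guaranteed; the futures are only returned so a caller can inspect them.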

For each of these, I think there are two levels of guarantees, either:
# You don't necessarily care if ALL executions were successful, you'd rather be 
fast 
#* (e.g. statistics / analytics that need to be "close-enough")
# You absolutely need to know if data changed and a trigger was unsuccessful in 
processing that mutation.
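
As a sketch of how the two levels might differ in practice (Python, 
illustrative only): `Guarantee`, `run_trigger` and the dict standing in for 
the log are hypothetical names, and the PENDING/DONE bookkeeping is my 
assumption about how a tracked execution would be recorded.

```python
from enum import Enum


class Guarantee(Enum):
    BEST_EFFORT = 1  # fast; failures leave no trace
    TRACKED = 2      # pays for a log write so failures can be found later


def run_trigger(trigger, mutation_id, guarantee, log):
    """Run a trigger once; return True on success, False on failure."""
    if guarantee is Guarantee.TRACKED:
        # Record intent first so a crash mid-trigger still leaves evidence.
        log[mutation_id] = "PENDING"
    try:
        trigger(mutation_id)
    except Exception:
        # TRACKED: the entry stays PENDING. BEST_EFFORT: nothing was logged.
        return False
    if guarantee is Guarantee.TRACKED:
        log[mutation_id] = "DONE"
    return True
```

With BEST_EFFORT the only cost is the trigger itself; with TRACKED every 
execution costs up to two extra log writes.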

random thoughts,
-brian



                
> Triggers
> --------
>
>                 Key: CASSANDRA-1311
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1311
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Maxim Grinev
>             Fix For: 1.2
>
>         Attachments: HOWTO-PatchAndRunTriggerExample-update1.txt, 
> HOWTO-PatchAndRunTriggerExample.txt, ImplementationDetails-update1.pdf, 
> ImplementationDetails.pdf, trunk-967053.txt, trunk-984391-update1.txt, 
> trunk-984391-update2.txt
>
>
> Asynchronous triggers are a basic mechanism for implementing various use 
> cases of asynchronous execution of application code on the database side, for 
> example to support indexes and materialized views, online analytics, and 
> push-based data propagation.
> Please find the motivation, triggers description and list of applications:
> http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/
> An example of using triggers for indexing:
> http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/
> Implementation details are attached.

