Tim, I think this can also be solved as follows
1. Have a HDFS output operator before the Cassandra Operator. This hdfs operator writes the tuples to hdfs based on file size/ rolling window as user want. It sends the file location to downstream Cassandra Operator 2. The Cassandra operator takes these files and upload these files once the window Id in which the file was seen is committed in a separate thread something like what reconciler does. Does that help? Thanks - Gaurav > On Dec 17, 2015, at 9:47 AM, Timothy Farkas <[email protected]> wrote: > > Hi All, > > One of our users is outputting to Cassandra, but they want to handle a > Cassandra failure or Cassandra down time gracefully from an output > operator. Currently a lot of our database operators will just fail and > redeploy continually until the database comes back. This is a bad idea for > a couple of reasons: > > 1 - We rely on buffer server spooling to prevent data loss. If the database > is down for a long time (several hours or a day) we may run out of space to > spool for buffer server since it spools to local disk, and data is purged > only after a window is committed. Furthermore this buffer server problem > will exist for all the Streaming Containers in the dag, not just the one > immediately upstream from the output operator, since data is spooled to > disk for all operators and only removed for windows once a window is > committed. > > 2 - If there is another failure further upstream in the dag, upstream > operators will be redeployed to a checkpoint less than or equal to the > checkpoint of the database operator in the At leas once case. This could > mean redoing several hours or a day worth of computation. > > We should support a mechanism to detect when the connection to a database > is lost and then spool to hdfs using a WAL, and then write the contents of > the WAL into the database once it comes back online. This will save the > local disk space of all the nodes used in the dag and allow it to be used > for only the data being output to the output operator. > > Ticket here if anyone is interested in working on it: > > https://malhar.atlassian.net/browse/MLHR-1951 > > Thanks, > Tim
