Thanks guys, I remember now that that was done for Hive. Could someone take a look at implementing it for Cassandra Output?

On Thu, Dec 17, 2015 at 11:13 AM, Pramod Immaneni <[email protected]> wrote:

> Tim, we have a pattern for this called Reconciler that Gaurav has also
> mentioned. There are some examples of it in Malhar.
>
> On Thu, Dec 17, 2015 at 9:47 AM, Timothy Farkas <[email protected]>
> wrote:
>
> > Hi All,
> >
> > One of our users is outputting to Cassandra, but they want to handle a
> > Cassandra failure or Cassandra downtime gracefully from an output
> > operator. Currently a lot of our database operators will just fail and
> > redeploy continually until the database comes back. This is a bad idea
> > for a couple of reasons:
> >
> > 1 - We rely on buffer server spooling to prevent data loss. If the
> > database is down for a long time (several hours or a day) we may run
> > out of space to spool for buffer server, since it spools to local disk
> > and data is purged only after a window is committed. Furthermore, this
> > buffer server problem will exist for all the Streaming Containers in
> > the DAG, not just the one immediately upstream from the output
> > operator, since data is spooled to disk for all operators and only
> > removed for windows once a window is committed.
> >
> > 2 - If there is another failure further upstream in the DAG, upstream
> > operators will be redeployed to a checkpoint less than or equal to the
> > checkpoint of the database operator in the at-least-once case. This
> > could mean redoing several hours or a day's worth of computation.
> >
> > We should support a mechanism to detect when the connection to a
> > database is lost, spool to HDFS using a WAL, and then write the
> > contents of the WAL into the database once it comes back online. This
> > will save the local disk space of all the nodes used in the DAG and
> > limit the spooled data to what is being sent to the output operator.
> >
> > Ticket here if anyone is interested in working on it:
> >
> > https://malhar.atlassian.net/browse/MLHR-1951
> >
> > Thanks,
> > Tim
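
For anyone picking up the ticket, here is a minimal sketch of the spool-and-replay idea Tim describes above, written in Python only to keep it short. Everything in it is an assumption made for illustration: `FakeSink`, `SinkDown` and `WalBackedWriter` are hypothetical names, a local file stands in for the WAL on HDFS, and none of this is the Reconciler pattern or the Cassandra operator API from Malhar.

```python
import json
import os
import tempfile


class SinkDown(Exception):
    """Raised by the (hypothetical) sink when the database is unreachable."""


class FakeSink:
    """Stand-in for a database client; set `up = False` to simulate an outage."""

    def __init__(self):
        self.up = True
        self.rows = []

    def write(self, row):
        if not self.up:
            raise SinkDown("database unreachable")
        self.rows.append(row)


class WalBackedWriter:
    """Writes to the sink directly, spooling to a WAL file while it is down."""

    def __init__(self, sink, wal_path):
        self.sink = sink
        self.wal_path = wal_path  # local file here; HDFS in the real proposal

    def _wal_pending(self):
        return os.path.exists(self.wal_path) and os.path.getsize(self.wal_path) > 0

    def _spool(self, row):
        # Append one JSON record per line and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(row) + "\n")
            wal.flush()
            os.fsync(wal.fileno())

    def _replay(self):
        """Replay spooled rows in order. Returns True once the WAL is drained."""
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            lines = wal.readlines()
        for i, line in enumerate(lines):
            try:
                self.sink.write(json.loads(line))
            except SinkDown:
                # Keep only the rows that have not been replayed yet.
                with open(self.wal_path, "w", encoding="utf-8") as wal:
                    wal.writelines(lines[i:])
                return False
        os.remove(self.wal_path)
        return True

    def write(self, row):
        # Drain the WAL first so rows reach the sink in arrival order.
        if self._wal_pending() and not self._replay():
            self._spool(row)
            return
        try:
            self.sink.write(row)
        except SinkDown:
            self._spool(row)


if __name__ == "__main__":
    sink = FakeSink()
    writer = WalBackedWriter(sink, os.path.join(tempfile.mkdtemp(), "wal.log"))
    writer.write({"id": 1})
    sink.up = False          # outage: rows go to the WAL instead of failing
    writer.write({"id": 2})
    sink.up = True           # recovery: WAL is replayed before the new row
    writer.write({"id": 3})
    print(sink.rows)
```

Note that a crash between a replayed write and the WAL truncation would replay some rows twice, so this sketch is at-least-once; the database writes would need to be idempotent, or the replay position would have to be tracked with the checkpoint.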
