Yeah, moving to RocksDB would provide this in a much better way than manually issuing a bunch of deletes. Essentially, you would disable compaction in the Kafka topic and enable retention in RocksDB. If I understand correctly, RocksDB does the retention as part of its compaction, so it is kind of "for free".
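The "retention as part of compaction" idea can be illustrated with a toy model. This is not the RocksDB API and not code from this thread. `TtlCompactionModel`, `compact()` and the seconds-based timestamps are hypothetical names chosen for the sketch. It mimics the behaviour described on the RocksDB Time-to-Live wiki page linked further down. Each value carries its write time. Expired entries are only physically removed when a compaction pass runs, so a read can still see an expired value until then.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of TTL-on-compaction semantics (hypothetical; NOT the RocksDB API).
 * Values are tagged with their write time, and expiry only happens in compact().
 */
public class TtlCompactionModel {

    // A value tagged with the time (in seconds) it was written.
    private static final class Stamped {
        final String value;
        final long writtenAtSec;

        Stamped(String value, long writtenAtSec) {
            this.value = value;
            this.writtenAtSec = writtenAtSec;
        }
    }

    private final Map<String, Stamped> data = new HashMap<>();
    private final long ttlSec;

    public TtlCompactionModel(long ttlSec) {
        this.ttlSec = ttlSec;
    }

    public void put(String key, String value, long nowSec) {
        data.put(key, new Stamped(value, nowSec));
    }

    // A read can still see an expired value until a compaction removes it.
    public String get(String key) {
        Stamped s = data.get(key);
        return s == null ? null : s.value;
    }

    // Compaction pass: physically drop entries older than the TTL.
    // The application never issues deletes itself.
    public int compact(long nowSec) {
        int before = data.size();
        data.values().removeIf(s -> nowSec - s.writtenAtSec > ttlSec);
        return before - data.size();
    }

    public int size() {
        return data.size();
    }

    public static void main(String[] args) {
        TtlCompactionModel db = new TtlCompactionModel(60); // 60-second TTL
        db.put("a", "old", 0);
        db.put("b", "fresh", 100);
        System.out.println(db.get("a"));     // old (expired, not yet compacted)
        System.out.println(db.compact(120)); // 1 (only "a" is past its TTL)
        System.out.println(db.get("a"));     // null
    }
}
```

The point of the model is where the work happens. The task code only calls `put()`, and cleanup rides along with a pass the store runs anyway. In real RocksDB that pass is the background compaction, so the timing of removal is up to the store, not the job.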

-Jay

On Thu, Jan 9, 2014 at 9:57 AM, Chris Riccomini <[email protected]> wrote:

> Hey Klaus,
>
> I don't think anyone has directly tried to address the issue by creating a
> different state store. The way we handle it right now is by having your
> task implement WindowableTask. You get a window() callback in your task,
> which you can configure to run every N milliseconds (task.window.ms). In
> this window() method, you can do a range() or all() call for old data, and
> delete() it from the store.
>
> As for running a local MongoDB, this would work, but we tend to prefer
> embeddable databases that can be shipped as part of the Samza job (in the
> .tgz file as a .jar, usually). The reason for this preference is that, in
> a multi-tenant environment (i.e. jobs running on YARN), it's often not
> feasible to run one-off software for a job. One job might want MongoDB,
> another Redis, another Memcache, etc. What you end up with on your YARN
> cluster is a union of all those things running on every node. This makes
> life rough, operationally.
>
> One thing to look into might be RocksDB. I did a bit of googling, and I
> see a couple of mentions of TTL support in it
> (https://github.com/facebook/rocksdb/tree/master/utilities/ttl and
> https://github.com/facebook/rocksdb/wiki/Time-to-Live), but I haven't gone
> any further than that. There are probably also other embeddable TTL DBs
> that you could find, as well.
>
> Cheers,
> Chris
>
> On 1/9/14 8:47 AM, "Klaus Schaefers" <[email protected]> wrote:
>
> >Hi,
> >
> >I was digging a little into Samza and saw that the state storage is based
> >on LevelDB. This is very nice because it is really fast, but in my use cases
> >I would need some kind of time-to-live variable attached to a key. Has
> >anybody already tried to address this issue by including a different state
> >storage, like a local MongoDB or so?
> >
> >Cheers,
> >
> >Klaus
> >
> >--
> >Klaus Schaefers
> >Senior Optimization Manager
> >Ligatus GmbH
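The window()-based purge Chris describes can be sketched as a self-contained model. This is not Samza code. `SortedStore`, `rangeKeys()` and `WindowedPurge` are hypothetical stand-ins. The key layout is also an assumption of this sketch: a zero-padded write timestamp followed by the record id. With that layout, "old data" is one contiguous key range, so a single range scan finds it. With an unordered key layout you would instead walk all() and check a timestamp stored in each value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/**
 * Sketch of a periodic TTL purge driven by a window()-style callback.
 * SortedStore is a hypothetical in-memory stand-in, NOT Samza's store API.
 */
public class WindowedPurge {

    // Hypothetical sorted key-value store with range scan and delete.
    static final class SortedStore {
        private final TreeMap<String, String> map = new TreeMap<>();

        void put(String key, String value) {
            map.put(key, value);
        }

        // Keys in [from, to), copied to a list so they can be deleted afterwards.
        List<String> rangeKeys(String from, String to) {
            return new ArrayList<>(map.subMap(from, true, to, false).keySet());
        }

        void delete(String key) {
            map.remove(key);
        }

        int size() {
            return map.size();
        }
    }

    // Assumed key layout: zero-padded write timestamp, then the record id,
    // so lexicographic key order matches time order.
    static String key(long tsMillis, String id) {
        return String.format(Locale.ROOT, "%019d:%s", tsMillis, id);
    }

    private final SortedStore store;
    private final long ttlMillis;

    WindowedPurge(SortedStore store, long ttlMillis) {
        this.store = store;
        this.ttlMillis = ttlMillis;
    }

    // What a window() callback might do every task.window.ms milliseconds:
    // delete every entry written before (now - ttl) and return how many went.
    int window(long nowMillis) {
        long cutoffTs = Math.max(0, nowMillis - ttlMillis); // avoid negative keys
        int deleted = 0;
        for (String k : store.rangeKeys(key(0, ""), key(cutoffTs, ""))) {
            store.delete(k);
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        SortedStore store = new SortedStore();
        store.put(key(1_000, "a"), "x");
        store.put(key(5_000, "b"), "y");
        store.put(key(9_000, "c"), "z");
        WindowedPurge task = new WindowedPurge(store, 5_000);
        System.out.println(task.window(10_000)); // 1 (only "a" is older than the cutoff)
        System.out.println(store.size());        // 2
    }
}
```

The sketch collects the matching keys before deleting them, so it never mutates the store while iterating over it. In a real Samza task the same logic would sit in the window() callback, with task.window.ms set to the purge interval. The iterator returned by the store's range() should also be closed after the scan; Samza's KeyValueIterator has a close() method.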
