Hey all,

We're big users of Flume, but now we're looking to integrate a workflow engine to manage dependencies between data imports, scheduled reports, and intermittent data generation for a Hadoop-based data warehouse & analytics system.

I thought I'd reach out to get the community's opinions:

- Do you use Yahoo (Apache) Oozie?
  * What do you think of it? (pros/cons)
  * Would you recommend it?
- Do you use something else?
  * What do you think of it? (pros/cons)
  * Would you recommend it?

Any suggestions/comments greatly appreciated.

I'm reaching out to the Flume list because I'd be especially interested to hear about any bespoke Flume integrations the community has built (e.g. checking that data from all machines is available before starting a job).

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
@rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
