Thank you all for your suggestions. They mostly address "how" I should develop my app, not "what" I can use instead of building an app of my own.
Going over Cascalog, Cascading and Pig, I didn't find exactly what I need.

I need a batch job that runs periodically and samples folders for data. If it finds data there, it takes the data and transforms it according to preset transformations. I want to be able to change the transformations easily. The transformations the data should go through are determined by the directory it came from or by a pattern in the data itself.

This app sounds very similar to Flume, except that it digests all of the data that has arrived in a single map/reduce job.

-----Original Message-----
From: Chris K Wensel [mailto:[email protected]]
Sent: Wednesday, February 16, 2011 10:18 PM
To: [email protected]
Subject: Re: DataCreator

> was thinking of using cascading, but cascading, requires me for each change
> in the data flow, to recompile and deploy. Maybe cascading can be part of the
> implementation but not the solution.

Cascading is well suited for this.

Multitool was written with Cascading; you can spawn reasonably complex filtering, conversion, and joins from the command line (no recompiling). Amazon promotes this for searching S3 buckets from EMR.

Cascading.JRuby allows you to create complex jobs from a JRuby script, no compiling. Etsy uses this for their web site funnel analysis.

Cascalog is much more sophisticated, and can be driven from a Clojure shell (REPL), obviously no compiling there either. Quite a few companies use this to power their analytics and analysis.

All of these can be found here:
http://www.cascading.org/modules.html

And a number of companies have built proprietary web UIs to Hadoop with Cascading as the query planner and processing engine. Some of these will ship as products this year.

FYI, there will be a Cascalog workshop this Saturday (I'll be attending):
http://www.cascading.org/2011/02/cascalog-workshop-february-19t.html

cheers,
chris

--
Chris K Wensel
[email protected]
http://www.concurrentinc.com
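
To make the requirement at the top of this message concrete, the behaviour could be sketched roughly as below. This is only an illustration in plain Python on a local filesystem, not on Hadoop and not using Cascading, Cascalog, Pig or Flume. The names `RULES`, `pick_transform`, `run_once`, the directory names `weblogs` and `sensors`, and the sample transformations are all hypothetical. The point it tries to show is that the transformations live in a table that can be changed without touching the processing loop, and that each rule is selected either by the source directory or by a pattern in the data.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional, Tuple

Transform = Callable[[str], str]

# Hypothetical rule table: (directory name or None, content regex or None, transform).
# Editing this table (or loading it from a config file) changes the behaviour
# without changing the processing loop below.
RULES: List[Tuple[Optional[str], Optional[str], Transform]] = [
    ("weblogs", None, str.lower),
    ("sensors", None, lambda text: text.replace(";", ",")),
    (None, r"^<\?xml", lambda text: text.strip()),
]


def pick_transform(directory: str, text: str, rules=RULES) -> Optional[Transform]:
    """Return the first transform whose directory or content pattern matches."""
    for rule_dir, pattern, transform in rules:
        if rule_dir is not None and rule_dir == directory:
            return transform
        if pattern is not None and re.search(pattern, text):
            return transform
    return None


def run_once(inbox: Path, outbox: Path, rules=RULES) -> int:
    """Process every file currently under inbox/<directory>/ and return the count."""
    processed = 0
    for path in sorted(inbox.glob("*/*")):
        if not path.is_file():
            continue
        text = path.read_text()
        transform = pick_transform(path.parent.name, text, rules)
        if transform is None:
            continue  # no rule matched; leave the file in place for a later run
        target = outbox / path.parent.name / path.name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(transform(text))
        path.unlink()  # consume the input so the next run does not repeat it
        processed += 1
    return processed
```

Running `run_once` from a scheduler such as cron would give the periodic sampling. In a real deployment the per-file loop would presumably be replaced by a single map/reduce job over everything that has arrived, which is the part this sketch does not attempt.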
