BTW, you can give it a go using Amazon EMR; that way you can try it out without investing in the infrastructure beforehand.
Ronen

On Tuesday, August 21, 2012 11:02:18 PM UTC+3, matt hoffman wrote:
>
> Great, thanks -- I hadn't looked too closely at Cascalog yet only because
> I don't currently have the rest of the Hadoop infrastructure. But adding
> that in isn't out of the question, so I'll definitely look at it more
> closely. And I may have underestimated the utility of Cascalog without
> Hadoop...
>
> On Tue, Aug 21, 2012 at 4:38 AM, Sam Ritchie <sritc...@gmail.com> wrote:
>
>> Definitely +1 for Cascalog -- I maintain Cascalog, along with Nathan
>> Marz. Here's the wiki:
>>
>> https://github.com/nathanmarz/cascalog/wiki
>>
>> Head on over to the cascalog-user
>> <https://groups.google.com/forum/?fromgroups#!forum/cascalog-user>
>> mailing list with any questions. Looking forward to seeing you there.
>>
>> On Mon, Aug 20, 2012 at 5:55 PM, ronen <nar...@gmail.com> wrote:
>>
>>> Terabyte size and a chain of dependent tasks might hint toward
>>> Cascalog <https://github.com/nathanmarz/cascalog/wiki>; this assumes that
>>> you're doing batch job processing (on top of Hadoop).
>>>
>>> If you need a more soft real-time, Datalog-based query, then I would check
>>> Datomic <http://www.datomic.com/>, although from your description it
>>> sounds less so.
>>>
>>> Ronen
>>>
>>> On Tuesday, August 21, 2012 3:14:23 AM UTC+3, Leif wrote:
>>>>
>>>> +1. I know of a couple of tools in Python for this purpose that are
>>>> called "workflow management systems." It would be good to know if there
>>>> is a robust one in Clojure.
>>>>
>>>> On Monday, August 20, 2012 12:18:54 AM UTC-4, matt hoffman wrote:
>>>>>
>>>>> I have a problem that I'm trying to figure out how to tackle. I'm new
>>>>> to Clojure, but I'm interested, and perhaps this will be my excuse to
>>>>> give it a try.
>>>>> Any of the following answers would help:
>>>>> "What you're describing really sounds like X"
>>>>> "You could think of that problem like this, instead"
>>>>> "You may want to search for term 'Y'...it sounds related" (I imagine
>>>>> I'm probably describing some well-established domain...I just don't know
>>>>> the right terms to search for)
>>>>>
>>>>> So, the problem:
>>>>> I have an app that is in production doing some fairly complex
>>>>> calculations on large-ish (terabyte-range) amounts of data. The
>>>>> calculations are expressed as chains of dependent tasks, where each task
>>>>> can have a number of inputs and outputs. But the code has become hard to
>>>>> maintain, full of accidental complexity and very difficult for newer
>>>>> developers to understand. So, I'm trying to find the right abstractions
>>>>> to put in place to keep things simple.
>>>>> One of the sources of complexity is the intermingling of code
>>>>> involving loading data, dividing up data to be executed in parallel,
>>>>> processing data, persisting data, and handling the execution flow on an
>>>>> individual datum (configuring pipelines of components, etc.). I'd like to
>>>>> keep the functions pure and push the other concerns off to a framework --
>>>>> and, ideally, not have to write that framework.
>>>>>
>>>>> So I think my problem statement is this:
>>>>> I'd like to be able to define functions that specify, somehow, what
>>>>> input they want, and perhaps what output they produce. Then I'd like to
>>>>> push the concern of how those inputs are calculated -- loaded from a db,
>>>>> calculated from source data -- off on some other party.
>>>>>
>>>>> For example, if I define a function that requires "foo", and I call
>>>>> that function without providing "foo", I'd like for _something_ to step
>>>>> in and say, "Ok, you require foo. I have this function over here that
>>>>> produces foo. Let me call that for you, then hand you the output."
>>>>> Perhaps, instead of a framework that transparently looks up and
>>>>> executes that function and provides a Future for the result, I can
>>>>> explicitly build a dependency graph up-front containing all the
>>>>> functions required to produce the end result, and then execute them
>>>>> all in order... I think the effect is the same.
>>>>>
>>>>> From a bit of searching I've done today, dataflow programming like
>>>>> clojure.contrib.dataflow sounds like it might be close to what I'm
>>>>> looking for, but I'd love to hear ideas. Am I describing something that
>>>>> already exists? Would this actually be simpler than it seems using some
>>>>> clever macros? Are there some keywords I should search for to get
>>>>> started? Or perhaps I'm coming at this problem wrong, and I should think
>>>>> about it a different way...
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>
>> --
>> Sam Ritchie, Twitter Inc
>> 703.662.1337
>> @sritchie
>>
>> (Too brief? Here's why! http://emailcharter.org)
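
For readers following the original question: the "build a dependency graph up-front, then execute everything in order" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not Cascalog's or clojure.contrib.dataflow's API. It is written in Python for brevity, and the same shape should carry over to Clojure. The names `Resolver`, `provides`, `resolve`, and the sample functions `load_raw`, `compute_foo`, and `report` are all hypothetical. Each function declares the inputs it wants through its parameter names, and the resolver decides what to call and in which order.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Resolver:
    """Hypothetical helper: maps a value name to the function that produces it."""

    def __init__(self):
        self._producers = {}

    def provides(self, name):
        """Decorator registering fn as the producer of `name`."""
        def register(fn):
            self._producers[name] = fn
            return fn
        return register

    def _deps(self, name):
        # A producer's inputs are simply its parameter names.
        return list(inspect.signature(self._producers[name]).parameters)

    def resolve(self, target, **given):
        """Compute `target`, calling producers for any inputs not in `given`."""
        graph = {}
        pending = [target]
        while pending:  # walk backwards from the target to collect the graph
            name = pending.pop()
            if name in graph or name in given:
                continue
            if name not in self._producers:
                raise KeyError(f"nothing produces {name!r}")
            graph[name] = [d for d in self._deps(name) if d not in given]
            pending.extend(graph[name])

        values = dict(given)
        # static_order() yields dependencies before their dependents and
        # raises graphlib.CycleError if the graph contains a cycle.
        for name in TopologicalSorter(graph).static_order():
            fn = self._producers[name]
            values[name] = fn(**{d: values[d] for d in self._deps(name)})
        return values[target]


pipeline = Resolver()


@pipeline.provides("raw")
def load_raw(path):
    return [1, 2, 3, 4]  # stand-in for actually loading data from `path`


@pipeline.provides("foo")
def compute_foo(raw):
    return sum(raw)


@pipeline.provides("report")
def report(foo, raw):
    return f"{len(raw)} rows, total {foo}"


print(pipeline.resolve("report", path="data.csv"))
```

Asking for "report" with only `path` supplied makes the resolver call `load_raw`, then `compute_foo`, then `report`. Passing a value directly (for example `raw=[5, 5]`) skips its producer, which keeps the individual functions pure and easy to test on their own.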