BTW, you can give it a go using Amazon EMR; that way you can check it out
without investing in the infra beforehand.

Ronen

On Tuesday, August 21, 2012 11:02:18 PM UTC+3, matt hoffman wrote:
>
> Great, thanks -- I hadn't looked too closely at Cascalog yet only because 
> I don't currently have the rest of the Hadoop infrastructure. But adding 
> that in isn't out of the question, so I'll definitely look at it more 
> closely.  And I may have underestimated the utility of Cascalog without 
> Hadoop... 
>
>
>
> On Tue, Aug 21, 2012 at 4:38 AM, Sam Ritchie <sritc...@gmail.com> wrote:
>
>> Definitely +1 for Cascalog -- I maintain Cascalog, along with Nathan 
>> Marz. Here's the wiki:
>>
>> https://github.com/nathanmarz/cascalog/wiki
>>
>> Head on over to the cascalog-user mailing list
>> <https://groups.google.com/forum/?fromgroups#!forum/cascalog-user>
>> with any questions. Looking forward to seeing you there.
>>
>>
>> On Mon, Aug 20, 2012 at 5:55 PM, ronen <nar...@gmail.com> wrote:
>>
>>> Terabyte size and a chain of dependent tasks might hint toward
>>> Cascalog <https://github.com/nathanmarz/cascalog/wiki>; this assumes that
>>> you're doing batch job processing (on top of Hadoop).
>>>
>>> If you need a more soft-real-time, Datalog-based query, then I would check
>>> Datomic <http://www.datomic.com/>, although from your description it
>>> sounds less so.
>>>
>>> Ronen
>>>
>>> On Tuesday, August 21, 2012 3:14:23 AM UTC+3, Leif wrote:
>>>>
>>>> +1.  I know of a couple of tools in Python for this purpose that are
>>>> called "workflow management systems."  It would be good to know if there
>>>> is a robust one in Clojure.
>>>>
>>>> On Monday, August 20, 2012 12:18:54 AM UTC-4, matt hoffman wrote:
>>>>>
>>>>> I have a problem that I'm trying to figure out how to tackle. I'm new
>>>>> to Clojure, but I'm interested, and perhaps this will be my excuse to
>>>>> give it a try. Any of the following answers would help:
>>>>> "What you're describing really sounds like X"
>>>>> "You could think of that problem like this, instead"
>>>>> "You may want to search for term 'Y'...it sounds related" (I imagine 
>>>>> I'm probably describing some well-established domain...I just don't know 
>>>>> the right terms to search for)
>>>>>
>>>>> So, the problem:
>>>>> I have an app that is in production doing some fairly complex
>>>>> calculations on large-ish (terabyte-range) amounts of data. The
>>>>> calculations are expressed as chains of dependent tasks, where each task
>>>>> can have a number of inputs and outputs. But the code has become hard to
>>>>> maintain, full of accidental complexity, and very difficult for newer
>>>>> developers to understand. So, I'm trying to find the right abstractions
>>>>> to put in place to keep things simple.
>>>>> One of the sources of complexity is the intermingling of code
>>>>> involving loading data, dividing up data to be executed in parallel,
>>>>> processing data, persisting data, and handling the execution flow on an
>>>>> individual datum (configuring pipelines of components, etc.). I'd like to
>>>>> keep the functions pure and push the other concerns off to a framework --
>>>>> and, ideally, not have to write that framework.
>>>>>
>>>>> So I think my problem statement is this: 
>>>>> I'd like to be able to define functions that specify, somehow, what 
>>>>> input they want, and perhaps what output they produce. Then I'd like to 
>>>>> push the concern of how those inputs are calculated -- loaded from a db, 
>>>>> calculated from source data -- off on some other party. 
>>>>>
>>>>> For example, if I define a function that requires "foo", and I call
>>>>> that function without providing "foo", I'd like for _something_ to step
>>>>> in and say, "Ok, you require foo. I have this function over here that
>>>>> produces foo. Let me call that for you, then hand you the output."
>>>>> Alternatively, instead of a framework that transparently looks up and
>>>>> executes that function and provides a Future for the result, perhaps I
>>>>> can explicitly build a dependency graph up-front containing all the
>>>>> functions required to produce the end result, and then execute them all
>>>>> in order... I think the effect is the same.
>>>>>
>>>>> From a bit of searching I've done today, dataflow programming like
>>>>> clojure.contrib.dataflow sounds like it might be close to what I'm
>>>>> looking for, but I'd love to hear ideas. Am I describing something that
>>>>> already exists? Would this actually be simpler than it seems using some
>>>>> clever macros? Are there some keywords I should search for to get
>>>>> started? Or perhaps I'm coming at this problem wrong, and I should think
>>>>> about it a different way...
>>>>>
>>>>>  -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>>
>>
>>
>>
>> -- 
>> Sam Ritchie, Twitter Inc
>> 703.662.1337
>> @sritchie
>>
>> (Too brief? Here's why! http://emailcharter.org)
>>
>>
>
>

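The dependency-graph idea in the original question (a function declares the inputs it needs, and something else finds and runs the producers first) can be sketched roughly as below. This is only an illustration under stated assumptions, not how Cascalog or clojure.contrib.dataflow work: the registry `PRODUCERS`, the `produces` decorator, `resolve`, and the sample tasks are all hypothetical names invented here, and Python (3.9+, for the standard-library `graphlib`) is used only because the thread contains no code of its own.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry: output name -> function that produces it.
# A function's parameter names declare which inputs it requires.
PRODUCERS = {}


def produces(name):
    """Register the decorated function as the producer of `name`."""
    def register(fn):
        PRODUCERS[name] = fn
        return fn
    return register


# Illustrative pure tasks (made up for this sketch).
@produces("raw")
def load_raw():
    return [1, 2, 3, 4]


@produces("foo")
def compute_foo(raw):
    return sum(raw)


@produces("report")
def build_report(foo, raw):
    return {"total": foo, "count": len(raw)}


def requirements(fn):
    """Names of the inputs `fn` asks for, read from its signature."""
    return list(inspect.signature(fn).parameters)


def resolve(target, provided=None):
    """Compute `target`, running only the producers it transitively needs."""
    values = dict(provided or {})

    # Build the dependency graph up front, walking back from the target.
    graph = {}
    pending = [target]
    while pending:
        name = pending.pop()
        if name in values or name in graph:
            continue
        if name not in PRODUCERS:
            raise KeyError(f"no producer registered for {name!r}")
        deps = requirements(PRODUCERS[name])
        graph[name] = set(deps)
        pending.extend(deps)

    # Execute producers in dependency order; supplied values are not recomputed.
    for name in TopologicalSorter(graph).static_order():
        if name not in values:
            fn = PRODUCERS[name]
            values[name] = fn(**{dep: values[dep] for dep in requirements(fn)})
    return values[target]
```

Calling `resolve("report")` runs `load_raw`, then `compute_foo`, then `build_report`; calling `resolve("foo", {"raw": [10, 20]})` skips the loader because "raw" was supplied. The task functions stay pure, and loading, ordering, and caching live in `resolve`, which is the separation the question asks for. Parallel execution and persistence are deliberately left out of this sketch.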