[jira] [Resolved] (CRUNCH-296) Support new distributed execution engines (e.g., Spark)

Josh Wills (JIRA) Wed, 11 Dec 2013 12:50:23 -0800

     [ https://issues.apache.org/jira/browse/CRUNCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Josh Wills resolved CRUNCH-296.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0

Ridiculously huge patch committed.

> Support new distributed execution engines (e.g., Spark)
> -------------------------------------------------------
>
>                 Key: CRUNCH-296
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-296
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.9.0
>
>         Attachments: CRUNCH-296.patch, CRUNCH-296b.patch, CRUNCH-296c.patch, 
> CRUNCH-296d.patch, CRUNCH-296d.patch
>
>
> I've been working on this off-and-on for a while, but it's currently in a 
> state where I feel like it's worth sharing: I came up with an implementation 
> of the Crunch APIs that runs on top of Apache Spark instead of MapReduce.
> My goal for this is pretty simple; I want to be able to change any instances 
> of "new MRPipeline(...)" to "new SparkPipeline(...)", not change anything 
> else at all, and have my pipelines run on Spark instead of as a series of MR 
> jobs. Turns out that we can pretty much do exactly that. Not everything works 
> yet, but lots of things do: joins and cogroups work, and the PageRank and TfIdf 
> integration tests work. Some things that I'm aware of that do not work: 
> in-memory joins and some of the more complex file output handling rules, but 
> I believe that these things are fixable. Some things that might or might not 
> work: HBase inputs and outputs on top of Spark.
> This is just an idea I had, and I would understand if other people don't want 
> to work on this or don't think it's the right direction for the project. My 
> minimal request would be to include the refactoring of the core APIs 
> necessary to support plugging in new execution frameworks so I can keep 
> working on this stuff.
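The core idea in the description is that pipeline code is written against an engine-neutral pipeline abstraction, so moving from MapReduce to Spark means changing only the constructor call. The Java sketch below illustrates that design under stated assumptions: `ToyPipeline`, `ToyMRPipeline`, `ToySparkPipeline`, and `EngineSwapSketch` are hypothetical toy classes invented for illustration and are not the real `org.apache.crunch` API. One backend uses plain sequential loops and the other uses Java parallel streams, yet the same user logic produces identical word counts on both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// HYPOTHETICAL toy API for illustration only; not org.apache.crunch.Pipeline.
interface ToyPipeline {
    // Apply fn to every element; the output keeps the input order.
    <S, T> List<T> parallelDo(List<S> input, Function<S, T> fn);

    // Count occurrences of each distinct element, sorted by key.
    <T extends Comparable<T>> Map<T, Long> count(List<T> input);
}

// Toy stand-in for a MapReduce-backed engine: plain sequential loops.
class ToyMRPipeline implements ToyPipeline {
    @Override
    public <S, T> List<T> parallelDo(List<S> input, Function<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S s : input) {
            out.add(fn.apply(s));
        }
        return out;
    }

    @Override
    public <T extends Comparable<T>> Map<T, Long> count(List<T> input) {
        Map<T, Long> counts = new TreeMap<>();
        for (T t : input) {
            counts.merge(t, 1L, Long::sum);
        }
        return counts;
    }
}

// Toy stand-in for a Spark-backed engine: Java parallel streams.
class ToySparkPipeline implements ToyPipeline {
    @Override
    public <S, T> List<T> parallelDo(List<S> input, Function<S, T> fn) {
        // Collecting a parallel stream to a list preserves encounter order.
        return input.parallelStream().map(fn).collect(Collectors.toList());
    }

    @Override
    public <T extends Comparable<T>> Map<T, Long> count(List<T> input) {
        return input.parallelStream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}

public class EngineSwapSketch {
    // User logic written only against the interface, so it runs unchanged on either engine.
    public static Map<String, Long> wordCount(ToyPipeline pipeline, List<String> words) {
        List<String> normalized = pipeline.parallelDo(words, w -> w.toLowerCase(Locale.ROOT));
        return pipeline.count(normalized);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Spark", "crunch", "spark", "MapReduce", "crunch");
        // Switching engines means changing only which constructor is called.
        System.out.println(wordCount(new ToyMRPipeline(), words));    // prints {crunch=2, mapreduce=1, spark=2}
        System.out.println(wordCount(new ToySparkPipeline(), words)); // prints {crunch=2, mapreduce=1, spark=2}
    }
}
```

In actual Crunch code the analogous swap is between `MRPipeline` and `SparkPipeline`, both implementations of `org.apache.crunch.Pipeline`; their real constructors take different arguments, so check the Crunch API documentation rather than relying on this toy.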



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
