[ 
https://issues.apache.org/jira/browse/CRUNCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130743#comment-14130743
 ] 

Micah Whitacre commented on CRUNCH-470:
---------------------------------------

I don't understand what the minicluster pipeline would actually do 
differently.  The pipeline code itself would be the same; the only change 
would be the values inside the Configuration object passed to the pipeline.

So the workflow would be:

1. Set up a minicluster (either YARN or MR, depending on what you need)
2. Retrieve the configuration from the minicluster
3. Pass the configuration to MRPipeline and run

Are you proposing that a single pipeline would do all of that?  If so, it 
would only be useful for testing, and for faster, more stable tests you would 
actually want to share a single minicluster across all your tests.  Having a 
pipeline spin one up and tear it down for each run would make your tests 
considerably slower.
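
A rough sketch of the workflow above, using only the JDK so that it runs 
without Hadoop on the classpath.  Everything here is a stand-in and not the 
Hadoop or Crunch API: FakeMiniCluster is a hypothetical placeholder for a real 
minicluster (for example MiniDFSCluster), the Map plays the role of the 
Configuration object, runPipeline stands in for building and running an 
MRPipeline, and the fs.defaultFS value is a made-up address.  The point is 
only that the pipeline code does not change, and that one cluster can serve 
several runs.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedMiniClusterSketch {

    /** Hypothetical stand-in for a minicluster; counts how many were started. */
    static final class FakeMiniCluster implements AutoCloseable {
        static int startCount = 0;
        private final Map<String, String> conf = new HashMap<>();
        private boolean running = true;

        FakeMiniCluster() {
            startCount++;
            // Placeholder value; a real minicluster would report its own address.
            conf.put("fs.defaultFS", "hdfs://localhost:8020");
        }

        /** Step 2: hand out a copy of the cluster's configuration. */
        Map<String, String> getConfiguration() {
            return new HashMap<>(conf);
        }

        boolean isRunning() {
            return running;
        }

        @Override
        public void close() {
            running = false;
        }
    }

    /** Step 3: the pipeline logic is the same whatever the configuration points at. */
    static String runPipeline(Map<String, String> conf, String input) {
        return input.toUpperCase() + "@" + conf.get("fs.defaultFS");
    }

    public static void main(String[] args) {
        // Step 1: start one cluster and reuse it for every run.
        try (FakeMiniCluster cluster = new FakeMiniCluster()) {
            Map<String, String> conf = cluster.getConfiguration();
            System.out.println(runPipeline(conf, "first"));
            System.out.println(runPipeline(conf, "second"));
        }
        System.out.println("clusters started: " + FakeMiniCluster.startCount);
    }
}
```

In a real test suite the same shape would usually live in class-level setup 
and teardown hooks, so that the cluster starts once per test class instead of 
once per pipeline run.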

> Add hdfs/yarn minicluster crunch pipeline
> -----------------------------------------
>
>                 Key: CRUNCH-470
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-470
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Rafal Wojdyla
>            Assignee: Josh Wills
>            Priority: Minor
>
> Crunch currently has two pipelines:
> * MemPipeline
> * MRPipeline
> MemPipeline is an in-memory pipeline based on local in-memory MapReduce mode.
> MRPipeline is a distributed pipeline based on distributed MapReduce.
> Using an HDFS/YARN minicluster, it's possible to better emulate a Hadoop 
> cluster, and it could serve as a 'final test' before running on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
