Thanks Josh, that's great. I'll file a JIRA about the side-outputs feature, but calling pipeline.run() will serve my purpose for now.
Cheers,
Dave

On 15 January 2013 18:03, Josh Wills <[email protected]> wrote:
> Hey Dave,
>
> The way to force a sequential run would be to call pipeline.run() after
> you write D to HDFS and before you declare the operations in step 6. What
> we would really want here is a single MapReduce job that wrote side outputs
> on the map side to create the dataset in step D, but we don't have support
> for side-outputs in maps yet. Worth filing a JIRA, I think.
>
> Thanks!
> Josh
>
