Thanks for playing with it. When you say you are trying to make the MRRSleep job pure Tez, do you intend to remove the map processor and reduce processor and write your own processor?
You are right that Processors represent actual computations. However, they do need to be able to send control plane information back to the AM, both for basic things like progress and for advanced things like data for a user-code vertex manager that determines the properties of the next vertex. Hence some subset of the umbilical, or a context object that connects the processor to the umbilical, has to be exposed to the processors.

Currently we are using a mix of MRTask, MapProcessor etc. to achieve the end goal, because we wanted to get MR-based functionality working asap to give real-world benefits. The APIs and separation of concerns have not been cleanly established in that part of the code. Ideally we want YarnTezDAGChild (the main Tez shell) to be able to instantiate processors and pass them a context object through which they can communicate essential information back to the control plane. We are not there yet, which is why we haven't been able to write a multi-input multi-output processor yet. It's on the agenda and becoming increasingly important.

It would be great if you could provide a list of the weirdness and issues you have discovered; that will serve as feedback for us when we clean this part up. Even better if you want to help us clean it up.

Bikas

-----Original Message-----
From: Mark Wagner [mailto:[email protected]]
Sent: Tuesday, August 13, 2013 9:29 PM
To: [email protected]
Subject: A few questions on the APIs

Hey everyone,

I've been playing with the MRRSleep example to familiarize myself with Tez. I've been trying to remove all the map and reduce parts to make it "pure" Tez as an exercise, but I'm a bit hung up on the roles of Processors and Tasks. It seems like they serve very similar roles. My expectation was that Tasks would handle all the start-up and coordination with the DAG AM, while the Processors are more user-facing and would mostly focus on the actual computation (given that the processor can be specified via the DAG APIs).
But it looks like MRTask (which is ultimately extended as a Map or Reduce Processor) sends completion notifications to the AM over the umbilical. Is there a good guideline as to what the responsibilities of Processors and Tasks are and where the separation lies?

Thanks for the insights,
Mark
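
For readers of this thread, here is a minimal sketch of the separation described in the reply: a task shell owns the connection to the AM and hands each processor a narrow context object, so the processor can report progress and pass data to a vertex manager without seeing the umbilical itself. Every name here (SketchContext, SketchProcessor, TaskShell, CountingProcessor) is hypothetical and chosen only for illustration; this is not the Tez API, and the shell just records calls in memory where a real one would forward them to the AM.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of this is the Tez API.

/** The narrow slice of the umbilical that a processor is allowed to see. */
interface SketchContext {
    void reportProgress(float fraction);
    void sendToVertexManager(byte[] payload);
}

/** User-facing computation; it reaches the control plane only via the context. */
interface SketchProcessor {
    void run(SketchContext context);
}

/** Stand-in for the task shell (the role the email assigns to YarnTezDAGChild). */
final class TaskShell {
    // A real shell would forward these over the umbilical; here we only record them.
    final List<Float> progressSeen = new ArrayList<>();
    final List<byte[]> managerPayloads = new ArrayList<>();

    void launch(SketchProcessor processor) {
        SketchContext context = new SketchContext() {
            @Override
            public void reportProgress(float fraction) {
                progressSeen.add(fraction);
            }

            @Override
            public void sendToVertexManager(byte[] payload) {
                managerPayloads.add(payload.clone());
            }
        };
        processor.run(context);
    }
}

/** Toy processor: does a fixed number of work steps and reports after each one. */
final class CountingProcessor implements SketchProcessor {
    private final int steps;

    CountingProcessor(int steps) {
        this.steps = steps;
    }

    @Override
    public void run(SketchContext context) {
        for (int i = 1; i <= steps; i++) {
            // The actual computation for step i would go here.
            context.reportProgress((float) i / steps);
        }
        // Example of "advanced" control-plane data for a user-code vertex manager.
        context.sendToVertexManager(("steps=" + steps).getBytes(StandardCharsets.UTF_8));
    }
}

final class ProcessorContextSketch {
    public static void main(String[] args) {
        TaskShell shell = new TaskShell();
        shell.launch(new CountingProcessor(4));
        System.out.println("progress reports: " + shell.progressSeen);
        System.out.println("vertex manager payloads: " + shell.managerPayloads.size());
    }
}
```

The point of the sketch is that the processor depends only on the context interface, so the shell can change how it talks to the AM without touching user code.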
