Anupam, I have spent some time looking over the code snippet posted and will try my best to address your questions. But I do have a foundational question that will help guide our continued conversations.
- What role and feature benefits do you see Tez taking on and fulfilling for Scope?

Pig and Hive have taken the approach that Tez is used both to construct the DAG and to provide the runtime that executes the DAG in a YARN environment running on a Hadoop FileSystem API compatible file system (like HDFS).

In your reference example, I see you are overriding AbstractLogicalInput. This is an approach that would be used to make ScopeInput a native type to Tez, as opposed to using input and output plugins to read Scope input without changing the Tez code base. I wonder if this is intentional or not. If it is not, I apologize for the lack of documentation that might have led you astray. Making ScopeInput a native type to Tez takes a direction different from Pig or Hive (or Flink or Cascading, for that matter). From an integration perspective, this places a very large amount of work on the developers working on Scope integration and may also make contributions troublesome.

Once we understand the environment and the role Tez is to play in Scope, we (the community) would be happy to help guide you towards Scope integration.

For reference, here are links to the Pig and Hive integrations for you to have a look at. The code linked below (as well as Tez) is under the Apache License v2, so please only read it if that is possible for you.

https://github.com/apache/pig/tree/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/

Hive also works in a mixed YARN/LLAP runtime environment. That might or might not be the best example to look at, unless it matches the Scope case as well. I am posting only the non-LLAP code path as an example.

https://github.com/apache/hive/tree/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/

Regards,
jeagles