Hi all, I’ve finished the design and prototype for TEZ-1190, which is about allowing multiple edges between the same pair of vertices. Please help review the design when you get time.
Both Hive and Pig call for this feature: Hive needs it for an efficient implementation of bloom filter join, and Pig can use it to merge two small pipelines into one pair of vertices.

The main challenge is how to support multi-edges while keeping backward compatibility. I propose to follow the named-edge approach: allow the user to name each edge instead of using the source vertex name as the edge name, so that multi-edges can be achieved by giving unique names to the edges between the same pair of vertices. For compatibility, we can still use the source vertex name as the default edge name for existing DAGs. After investigating the existing APIs, I found that this approach does not break any of them in either signature or semantics, so existing DAGs should keep working unchanged.

More details can be found in the design doc and the POC patch, both of which are already on the JIRA. Your comments and suggestions are welcome.

Thanks!
Zhiyuan
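
P.S. For anyone who prefers code to prose, here is a rough sketch of the naming rule only. It is not the Tez API and not the POC patch: the classes NamedEdgeSketch, Dag and Edge, the vertex names "scan" and "join", and the edge name "bloomfilter" are all hypothetical, and I am assuming here that edge names only need to be unique per (source, target) pair.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NamedEdgeSketch {

    /** Hypothetical edge record; NOT the Tez Edge class. */
    static final class Edge {
        final String name;
        final String source;
        final String target;

        Edge(String name, String source, String target) {
            this.name = name;
            this.source = source;
            this.target = target;
        }
    }

    /** Hypothetical DAG holder that only tracks edges and their names. */
    static final class Dag {
        private final List<Edge> edges = new ArrayList<>();
        // Keys are (source, target, name) triples that are already in use.
        private final Set<List<String>> usedKeys = new HashSet<>();

        /** Legacy-style call: no explicit name, so the source vertex name is the edge name. */
        Edge addEdge(String source, String target) {
            return addEdge(source, source, target);
        }

        /** Named call: several edges may join the same pair if their names differ. */
        Edge addEdge(String name, String source, String target) {
            // Assumption: uniqueness is enforced per (source, target) pair.
            if (!usedKeys.add(List.of(source, target, name))) {
                throw new IllegalArgumentException(
                    "Duplicate edge name '" + name + "' between " + source + " and " + target);
            }
            Edge edge = new Edge(name, source, target);
            edges.add(edge);
            return edge;
        }

        /** Returns every edge that connects the given pair of vertices. */
        List<Edge> edgesBetween(String source, String target) {
            List<Edge> result = new ArrayList<>();
            for (Edge e : edges) {
                if (e.source.equals(source) && e.target.equals(target)) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Dag dag = new Dag();
        dag.addEdge("scan", "join");                 // existing-style edge, named "scan" by default
        dag.addEdge("bloomfilter", "scan", "join");  // second edge between the same pair
        for (Edge e : dag.edgesBetween("scan", "join")) {
            System.out.println(e.name + ": " + e.source + " -> " + e.target);
        }
    }
}
```

The point of the sketch is that the unnamed call keeps its old meaning, since it just delegates with the source vertex name, while a second edge between the same pair only needs a name that is not already taken.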
