Hi All,

I’ve finished the design and prototype for TEZ-1190, which allows multiple
edges between the same pair of vertices. Please help review the design when
you get time.

Both Hive and Pig call for this feature: Hive needs it for an efficient
implementation of bloom filter join, and Pig can use it to merge two small
pipelines into one pair of vertices.

The main challenge is supporting multi-edges while keeping backward
compatibility. I propose the named edge approach: allow the user to name each
edge instead of using the source vertex name as the edge name, so that
multi-edges can be created by giving unique names to multiple edges between
the same pair of vertices. For compatibility, we can still use the source
vertex name as the default edge name for existing DAGs. After reviewing the
existing APIs, I found that this approach does not break any of them in
either signature or semantics, so existing DAGs should work unchanged.
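
To make the idea concrete, here is a minimal, self-contained sketch of the
naming rule. It is not the Tez API and not the code in the POC patch: the
class NamedEdgeSketch, the toy Dag type, and its addEdge/inputsOf methods are
all hypothetical, and I am assuming for illustration that edge names only
need to be unique among the inputs of one target vertex.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedEdgeSketch {
    /** Hypothetical toy DAG: for each target vertex, maps edge name -> source vertex. */
    static final class Dag {
        private final Map<String, Map<String, String>> inputsByTarget = new LinkedHashMap<>();

        /** Legacy-style call: the edge name defaults to the source vertex name. */
        void addEdge(String source, String target) {
            addEdge(source, source, target);
        }

        /** Named-edge call: the caller chooses the edge name explicitly. */
        void addEdge(String edgeName, String source, String target) {
            Map<String, String> inputs =
                inputsByTarget.computeIfAbsent(target, t -> new LinkedHashMap<>());
            // Assumption: names must be unique among the inputs of one target vertex.
            if (inputs.putIfAbsent(edgeName, source) != null) {
                throw new IllegalArgumentException(
                    "Duplicate edge name '" + edgeName + "' into vertex " + target);
            }
        }

        /** Returns the edge-name -> source-vertex map for a target vertex. */
        Map<String, String> inputsOf(String target) {
            return inputsByTarget.getOrDefault(target, Collections.emptyMap());
        }
    }

    public static void main(String[] args) {
        Dag dag = new Dag();
        dag.addEdge("scan", "join");           // existing style, edge is named "scan"
        dag.addEdge("bloom", "scan", "join");  // second edge between the same vertices
        System.out.println(dag.inputsOf("join"));
    }
}
```

In this sketch an existing two-argument call keeps its old meaning, and a
second edge between the same vertices is accepted only when it carries a
different name; repeating the default name is rejected.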

More details can be found in the design doc and the POC patch, both of which
are already attached to the JIRA. Your comments and suggestions are welcome.

Thanks!
Zhiyuan
