[ https://issues.apache.org/jira/browse/TEZ-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated TEZ-1528:
----------------------------
Summary: Support for Cross-Data-Center/Geo-Distributed DAG execution (was:
Native support for multi cluster aggregations)
> Support for Cross-Data-Center/Geo-Distributed DAG execution
> -----------------------------------------------------------
>
> Key: TEZ-1528
> URL: https://issues.apache.org/jira/browse/TEZ-1528
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
>
> Increasingly, data sets are partitioned across clusters for legal or
> operational reasons. An example is a 'customer-activity' table with
> partitions for the same 'date' whose sub-partitions live in different
> clusters, and whose raw data cannot be moved or copied between those
> clusters for legal reasons.
> It would be nice to have Tez support aggregations across these clusters by
> providing native support for cross-cluster 'sub-dags' (think of automatically
> transforming mapper-reducer into mapper-combiner-reducer split across
> clusters), 'edges' with *strict* limits on data transfer across clusters, etc.
> Providing such a primitive would make it relatively easy for Hive, Pig, etc.
> to support SQL queries, ETL applications, etc. across clusters. Limits on
> data transfer are very important: we should support only the transfer of
> aggregates; joins across clusters are an anti-goal.
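
The mapper-combiner-reducer idea in the description can be illustrated with a
small sketch in plain Python. This is not Tez code and uses no Tez API: the
names local_aggregate, cross_cluster_edge, TransferLimitExceeded and the
max_records limit are all hypothetical, and the sample rows are made up. Each
cluster aggregates its own raw rows, and only those aggregates cross the
hypothetical edge, which refuses to ship anything over a strict record limit.

```python
from collections import Counter


class TransferLimitExceeded(Exception):
    """Raised when a cluster tries to ship more aggregate records than allowed."""


def local_aggregate(rows):
    """Combiner step run inside one cluster; raw rows never leave it.

    rows is an iterable of (key, amount) pairs (hypothetical schema).
    """
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def cross_cluster_edge(partials, max_records):
    """Hypothetical edge: merge per-cluster aggregates under a strict limit.

    partials maps cluster name -> aggregate dict produced by local_aggregate.
    """
    for cluster, partial in partials.items():
        if len(partial) > max_records:
            raise TransferLimitExceeded(
                f"{cluster} wants to send {len(partial)} records, "
                f"limit is {max_records}"
            )
    merged = Counter()
    for partial in partials.values():
        merged.update(partial)  # adds values for matching keys
    return dict(merged)


# Made-up sample data: two clusters hold sub-partitions of the same date.
raw_by_cluster = {
    "cluster-a": [("2014-08-29", 10), ("2014-08-29", 5)],
    "cluster-b": [("2014-08-29", 7)],
}
partials = {name: local_aggregate(rows) for name, rows in raw_by_cluster.items()}
result = cross_cluster_edge(partials, max_records=100)
```

Counting records is only one possible way to express the limit; a byte budget
on the serialized aggregates would fit the same shape.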