[ https://issues.apache.org/jira/browse/TEZ-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352029#comment-14352029 ]
Jeff Zhang commented on TEZ-2105:
---------------------------------
bq. It seems that Pig is being migrated from MR onto Tez in order to gain
higher performance, right? I think it's a good idea. What I don't know is
exactly which part remains to be done. Is it the compiler, the scheduler, or
something else? I have a little Hadoop experience and can write C++ and Java
programs. What should I learn, apart from the Tez and Pig projects, if I want
to work on this issue?
[~leckie-chn], Tez has been integrated into Pig, but there is still a lot of
work to do. Most of the Pig-on-Tez work is in the execution layer (compiling
the physical plan into a Tez job and running it on Hadoop). PIG-3446 is the
umbrella JIRA for the initial Pig-on-Tez work, and PIG-3839 is the umbrella
JIRA for the follow-up performance improvements. If you have any questions,
please comment on those JIRAs or ask on the Pig or Tez mailing lists.
> Totally Sorted Edge with auto-parallelism
> -----------------------------------------
>
> Key: TEZ-2105
> URL: https://issues.apache.org/jira/browse/TEZ-2105
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Gopal V
> Labels: gsoc, gsoc2015, hadoop, java, pig, tez
>
> Pig-on-Tez supports an edge configuration using a sampled Output along with a
> vertex manager for automatic parallelism estimation.
> This is referred to in the Pig-on-Tez Hadoop Summit presentation.
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/19
> Migrating that plan-model into Tez as a native edge type would allow for much
> more efficient scheduling of the downstream edges and effectively turn the
> auto-parallelism implementation into a runtime skew-correcting mechanism
> within this edge.
> The Tez Edge has enough information to sample, determine partitioning order
> and correct parallelism.
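
For readers new to the idea, the sampling-based approach described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the Pig-on-Tez or Tez implementation: the class
SampledRangePartitioner, its method names, the use of long keys, and the
bytes-per-task heuristic are all hypothetical. It shows the three steps the
description names: estimate parallelism from sampled data, derive ordered split
points from the sample, and route each key to a partition so that partition
order matches key order.

```java
import java.util.Arrays;

/** Hypothetical sketch of sample-based range partitioning; not Tez or Pig API. */
final class SampledRangePartitioner {

    /** Assumed heuristic: one task per bytesPerTask of estimated input, clamped to [1, maxTasks]. */
    static int estimateParallelism(long estimatedTotalBytes, long bytesPerTask, int maxTasks) {
        long tasks = (estimatedTotalBytes + bytesPerTask - 1) / bytesPerTask; // ceiling division
        return (int) Math.max(1, Math.min(maxTasks, tasks));
    }

    /** Picks numPartitions - 1 split points at evenly spaced quantiles of the sorted sample. */
    static long[] splitPoints(long[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        long[] splits = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[Math.min(idx, sorted.length - 1)];
        }
        return splits;
    }

    /** Returns a partition index that is non-decreasing in key, which preserves total order. */
    static int partitionFor(long key, long[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Exact match: key belongs to the partition that starts at this split point.
        // No match: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

Because the partition index never decreases as the key grows, concatenating the
sorted outputs of partitions 0..n-1 yields a totally sorted result. A skewed
sample shows up as uneven quantile gaps, which is the information a runtime
mechanism could use to adjust the partition count.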