[ 
https://issues.apache.org/jira/browse/TEZ-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352029#comment-14352029
 ] 

Jeff Zhang commented on TEZ-2105:
---------------------------------

bq. It seems that pig is going to be migrated onto tez instead of MR, in order 
to gain higher performance, right? It's a good idea, I think. What I don't know 
is exactly which part needs to be done. Is it the compiler part, the scheduler 
part, or something else? I have a little experience with Hadoop and can write 
C++ & Java programs. What should I learn apart from the tez & pig projects if I 
want to solve this issue?
[~leckie-chn], Tez has already been integrated into Pig, but there is still a 
lot of work to do. Most of the Pig-on-Tez work is in the execution layer 
(compiling the physical plan into a Tez job and running it on Hadoop). PIG-3446 
is the umbrella JIRA for the initial Pig-on-Tez work, and PIG-3839 is the 
umbrella JIRA for the follow-up performance improvements. If you have any 
questions, please comment on those JIRAs or ask on the Pig/Tez mailing lists.



> Totally Sorted Edge with auto-parallelism
> -----------------------------------------
>
>                 Key: TEZ-2105
>                 URL: https://issues.apache.org/jira/browse/TEZ-2105
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>              Labels: gsoc, gsoc2015, hadoop, java, pig, tez
>
> Pig-on-Tez supports an edge configuration using a sampled Output along with a 
> vertex manager for automatic parallelism estimation.
> This is referred to in the Pig-on-Tez Hadoop Summit presentation.
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/19
> Migrating that plan-model into Tez as a native edge type would allow for much 
> more efficient scheduling of the downstream edges and effectively turn the 
> auto-parallelism implementation into a runtime skew-correcting mechanism 
> within this edge.
> The Tez Edge has enough information to sample, determine partitioning order 
> and correct parallelism.
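
To make the three steps named in the description concrete (sample, determine 
partitioning order, correct parallelism), here is a minimal sketch in plain 
Java. It is not Tez or Pig code and does not use any Tez API. The class name 
SampledRangePartitioner, its methods, and the bytes-per-task heuristic are all 
hypothetical and chosen only for illustration. Keys are assumed to be longs, 
and the sample is assumed to be already collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not a Tez or Pig class. */
class SampledRangePartitioner {
    // Sorted boundaries; partition i holds keys in [splitPoints[i-1], splitPoints[i]).
    private final long[] splitPoints;

    private SampledRangePartitioner(long[] splitPoints) {
        this.splitPoints = splitPoints;
    }

    /**
     * Assumed heuristic: one task per bytesPerTask of estimated input,
     * rounded up and clamped to the range [1, maxTasks].
     */
    public static int estimateParallelism(long estimatedBytes, long bytesPerTask, int maxTasks) {
        long tasks = (estimatedBytes + bytesPerTask - 1) / bytesPerTask;
        return (int) Math.max(1, Math.min(maxTasks, tasks));
    }

    /** Picks numPartitions - 1 split points at evenly spaced quantiles of the sample. */
    public static SampledRangePartitioner fromSample(List<Long> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        List<Long> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        long[] splits = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits[i - 1] = sorted.get(idx);
        }
        return new SampledRangePartitioner(splits);
    }

    /** Returns the partition index for a key; larger keys never map to smaller partitions. */
    public int partitionFor(long key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact match: the key starts the next partition. Otherwise use the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

For example, with a sample of the keys 0 through 99 and four partitions, the 
split points are 25, 50 and 75, so key 24 goes to partition 0 and key 25 to 
partition 1. Because the partition index is monotone in the key, concatenating 
the sorted partitions in index order gives a totally sorted output. A heavily 
repeated key in the sample yields duplicate split points and therefore empty 
partitions, which is the kind of skew a runtime mechanism would need to correct.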



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
