[
https://issues.apache.org/jira/browse/TEZ-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351946#comment-14351946
]
Yan Ni edited comment on TEZ-2105 at 3/8/15 9:10 AM:
-----------------------------------------------------
Hi, I'm from China and new to the ASF and GSoC. I looked through the issue list
and found this one interesting. I went over the slide and the project pages of
Tez and Pig. It seems that Pig is going to be migrated onto Tez instead of MR in
order to gain higher performance, right? It's a good idea, I think. What I don't
know is exactly which part needs to be done. Is it the compiler part, the
scheduler part, or something else? I have a little experience with Hadoop and
can write C++ and Java programs. What should I learn apart from the Tez and Pig
projects if I want to solve this issue?
was (Author: leckie-chn):
> Totally Sorted Edge with auto-parallelism
> -----------------------------------------
>
> Key: TEZ-2105
> URL: https://issues.apache.org/jira/browse/TEZ-2105
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Gopal V
> Labels: gsoc, gsoc2015, hadoop, java, pig, tez
>
> Pig-on-Tez supports an edge configuration using a sampled Output along with a
> vertex manager for automatic parallelism estimation.
> This is referred to in the Pig-on-Tez Hadoop Summit presentation.
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/19
> Migrating that plan-model into Tez as a native edge type would allow for much
> more efficient scheduling of the downstream edges and effectively turn the
> auto-parallelism implementation into a runtime skew-correcting mechanism
> within this edge.
> The Tez Edge has enough information to sample, determine partitioning order
> and correct parallelism.
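
To make the three steps in the description concrete (sample, determine partitioning order, correct parallelism), here is a minimal sketch in plain Java. It is not Tez or Pig code: the class SampledRangePartitioner, its method names, and the bytesPerTask parameter are all hypothetical, and a real edge would work on serialized keys with a pluggable comparator rather than on Strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Tez or Pig. */
final class SampledRangePartitioner {

    /** Correct parallelism: one task per bytesPerTask of estimated input, at least one. */
    static int estimateParallelism(long estimatedBytes, long bytesPerTask) {
        if (estimatedBytes < 0 || bytesPerTask <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Ceiling division without the overflow risk of (a + b - 1) / b.
        long tasks = estimatedBytes / bytesPerTask
                + (estimatedBytes % bytesPerTask == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, tasks));
    }

    /**
     * Determine partitioning order: take up to (partitions - 1) split points
     * at evenly spaced quantiles of the sorted sample. Repeated candidates are
     * skipped, so a heavily skewed sample yields fewer, wider ranges.
     */
    static List<String> splitPoints(List<String> sample, int partitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits; // nothing sampled: everything goes to partition 0
        }
        for (int i = 1; i < partitions; i++) {
            int idx = (int) ((long) i * sorted.size() / partitions);
            String candidate = sorted.get(idx);
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    /**
     * Route a key: the partition index is the number of split points that are
     * less than or equal to the key, so every key in partition i sorts at or
     * before every key in partition i + 1.
     */
    static int partitionFor(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

A caller would first size the downstream vertex with estimateParallelism, then pass that count and the sampled keys to splitPoints, and finally route each record with partitionFor. Sorting each partition locally and reading the partitions in index order then gives a total order.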