[
https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254349#comment-14254349
]
Bikas Saha commented on TEZ-1608:
---------------------------------
Overall looks fine. Did not look closely at the impl for performance
implications.
This may need a fix similar to TEZ-1852 (Get examples to work in Local Mode).
Would be good to have a test in TestTezJobs to ensure that this continues to
work as expected as other changes are made.
> TopK example
> ------------
>
> Key: TEZ-1608
> URL: https://issues.apache.org/jira/browse/TEZ-1608
> Project: Apache Tez
> Issue Type: Sub-task
> Affects Versions: 0.5.0
> Reporter: Janos Matyas
> Assignee: Krisztian Horvath
> Attachments: TEZ-1608-1.patch, TEZ-1608-2.patch, TEZ-1608-3.patch
>
>
> The goal of this sample is to find the top K elements of a dataset, while
> guiding the reader through the basics of Tez (DAG creation, tokenizers,
> custom comparators and parallelism).
> An example use case for top K:
> Given a large data set in CSV format of user comments on a site, listed as
> userid,postid,commentid,comment,timestamp, we are looking for the top K
> commenters or the posts with the most comments.
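
For reference, the use case described above can be sketched in plain Java,
without any Tez API. This is not the code from the attached patches; the class
name TopKCommenters, the method topK and the sample rows are made up for
illustration. It assumes userid is the first CSV field, counts comments per
user, and keeps only the K best entries in a bounded heap with a custom
comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical single-process sketch of the top K use case (no Tez involved).
public class TopKCommenters {

    // Returns up to k (userid, commentCount) entries, highest count first.
    // Ties are broken by userid so the result is deterministic.
    public static List<Map.Entry<String, Long>> topK(List<String> csvLines, int k) {
        // Step 1: count comments per user (assumes userid is the first field).
        Map<String, Long> counts = new HashMap<>();
        for (String line : csvLines) {
            if (line.isEmpty()) {
                continue;
            }
            String userId = line.split(",", 2)[0];
            counts.merge(userId, 1L, Long::sum);
        }

        // Custom comparator: better entries sort first
        // (higher count, then smaller userid).
        Comparator<Map.Entry<String, Long>> best = (a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };

        // Step 2: bounded heap whose head is the worst entry retained so far.
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(best.reversed());
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // evict the current worst entry
            }
        }

        // Step 3: emit the survivors in best-first order.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(best);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows: userid,postid,commentid,comment,timestamp
        List<String> lines = Arrays.asList(
            "u1,p1,c1,nice,1",
            "u2,p1,c2,ok,2",
            "u1,p2,c3,hm,3",
            "u3,p2,c4,yes,4",
            "u1,p3,c5,no,5",
            "u2,p3,c6,hi,6");
        for (Map.Entry<String, Long> e : topK(lines, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In a distributed setting the counting step and the heap step would typically
run in separate stages, with each task keeping its own local top K before a
final merge. How the patch actually splits this across vertices is not shown
here.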
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)