[ https://issues.apache.org/jira/browse/TEZ-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956459#comment-14956459 ]
Siddharth Seth edited comment on TEZ-1692 at 10/14/15 8:02 AM:
---------------------------------------------------------------
Test failures are unrelated.
bq. Memory wise I think we should be at parity with the latest patch. Likely
also with cpu. But like I said, this can only be measured.
Trying to measure patches for memory/CPU/perf without any solid basis will end
up wasting everyone's time. This patch adds wrappers around existing code.
Since there are no specific suspicions of a memory/CPU increase, I don't think
a perf test is required.
That said, I got some numbers as part of testing TEZ-2879, and there's no
noticeable difference in runtime.
If there are no other concerns, I'll commit the patch and get TEZ-2879 moving.
A test with random splits is the simplest way to measure something like this.
I have created TEZ-2892 for a grouping micro-benchmark; I'm sure there are
other cases for such tests as well.
was (Author: sseth):
Test failures are unrelated.
bq. Memory wise I think we should be at parity with the latest patch. Likely
also with cpu. But like I said, this can only be measured.
Trying to measure patches for memory/CPU/perf without any solid basis will end
up wasting everyone's time. This patch adds wrappers around existing code.
Since there are no specific suspicions of a memory/CPU increase, I don't think
a perf test is required.
That said, I got some numbers as part of testing TEZ-2892, and there's no
noticeable difference in runtime.
If there are no other concerns, I'll commit the patch and get TEZ-2892 moving.
A test with random splits is the simplest way to measure something like this.
I have created TEZ-2892 for a grouping micro-benchmark; I'm sure there are
other cases for such tests as well.
> Reduce code duplication between TezMapredSplitsGrouper and
> TezMapreduceSplitsGrouper
> ------------------------------------------------------------------------------------
>
> Key: TEZ-1692
> URL: https://issues.apache.org/jira/browse/TEZ-1692
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-1692.1.txt, TEZ-1692.2.txt, TEZ-1692.3.txt
>
>
> The two are almost identical, with lots of repeated logic. The main
> difference is the type of InputSplit being grouped (mapred vs. mapreduce).