[ 
https://issues.apache.org/jira/browse/TEZ-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955426#comment-14955426
 ] 

Bikas Saha commented on TEZ-1692:
---------------------------------

Can you please double check that Hive still builds against this after the 
classes have been moved around? Hive uses some of this code in their custom 
initializer.

typo - getWRack()?

I think I had mentioned in some other JIRA that we already have a wrapper 
(SplitHolder) that could be used to unify the code. If I understand the patch 
right, we could move the logic of SplitHolder up into the base SplitContainer. 
That way we could avoid an extra object allocation per split (right now we 
allocate both a SplitHolder and a SplitContainer, which would become just a 
SplitContainer). Would that work?
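
A minimal sketch of that idea, assuming the holder's only job is to carry 
per-split bookkeeping such as a "processed" flag. The class and method names 
below (SimpleSplitContainer, claimUnprocessed, and the members shown on 
SplitContainer) are hypothetical and are not taken from the patch:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class, not the Tez class: the bookkeeping that a separate
// holder object would carry lives directly on the container.
abstract class SplitContainer {
    private boolean processed = false;

    abstract String[] getPreferredLocations();
    abstract long getLength();

    boolean isProcessed() { return processed; }
    void setProcessed(boolean processed) { this.processed = processed; }
}

// Stand-in for a mapred- or mapreduce-specific subclass.
final class SimpleSplitContainer extends SplitContainer {
    private final long length;
    private final String[] locations;

    SimpleSplitContainer(long length, String... locations) {
        this.length = length;
        this.locations = locations;
    }

    @Override String[] getPreferredLocations() { return locations; }
    @Override long getLength() { return length; }
}

public class SplitContainerSketch {
    // Marks every unprocessed split as processed and returns their total
    // length. Only one wrapper object exists per split.
    static long claimUnprocessed(List<SplitContainer> splits) {
        long total = 0;
        for (SplitContainer split : splits) {
            if (!split.isProcessed()) {
                total += split.getLength();
                split.setProcessed(true);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<SplitContainer> splits = Arrays.asList(
            new SimpleSplitContainer(10L, "host1"),
            new SimpleSplitContainer(20L, "host2"));
        System.out.println(claimUnprocessed(splits));
    }
}
```

With the flag on the base class, grouping code written against SplitContainer 
never needs to know which InputSplit flavor it wraps.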

Can the unused constructors in TezGroupedSplit(s) be removed?

It would be really useful if we could use the grouper to group a large number 
of splits (say a million splits into 10K groups) and verify that we don't have 
any CPU or memory regression. I don't expect there to be any, but in these 
things it's better to measure than opine :)
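
One rough way to set up such a measurement is sketched below. The group() 
method is a placeholder that buckets split lengths round robin; it is not the 
Tez grouper and would be swapped for a call into the real grouping code. The 
heap reading is only a coarse estimate, since System.gc() is a hint to the JVM:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupingBenchmarkSketch {
    // Placeholder grouper (assumption): distributes split lengths across
    // desiredGroups buckets in round-robin order. desiredGroups must be > 0.
    static List<List<Long>> group(long[] splitLengths, int desiredGroups) {
        List<List<Long>> groups = new ArrayList<>(desiredGroups);
        for (int i = 0; i < desiredGroups; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < splitLengths.length; i++) {
            groups.get(i % desiredGroups).add(splitLengths[i]);
        }
        return groups;
    }

    // Approximate heap currently in use by this JVM.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int numSplits = 1_000_000;
        int numGroups = 10_000;
        long[] lengths = new long[numSplits];
        Arrays.fill(lengths, 64L * 1024 * 1024); // synthetic 64 MB splits

        System.gc();
        long heapBefore = usedHeapBytes();
        long start = System.nanoTime();

        List<List<Long>> groups = group(lengths, numGroups);

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long heapDeltaMb = (usedHeapBytes() - heapBefore) / (1024 * 1024);
        System.out.println("groups=" + groups.size()
            + " elapsedMs=" + elapsedMs
            + " approxHeapDeltaMb=" + heapDeltaMb);
    }
}
```

Running the same harness before and after the patch, several times each, gives 
numbers that can be compared directly.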


> Reduce code duplication between TezMapredSplitsGrouper and 
> TezMapreduceSplitsGrouper
> ------------------------------------------------------------------------------------
>
>                 Key: TEZ-1692
>                 URL: https://issues.apache.org/jira/browse/TEZ-1692
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1692.1.txt, TEZ-1692.2.txt
>
>
> The two are almost identical - with lots of repeated logic. The main 
> difference being the mapred / mapreduce InputSplit being grouped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
