[ https://issues.apache.org/jira/browse/TEZ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292192#comment-14292192 ]
Gopal V edited comment on TEZ-1993 at 1/26/15 6:32 PM:
-------------------------------------------------------
No, because inheritance is not shimmable. You need a visitor pattern here
instead.
And because of that, this cannot apply to any other InputFormat that
generates a FileSplit (which is not going to be a subclass of TezInputSplit).
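To make the inheritance point concrete, here is a minimal Java sketch, assuming a grouper that only knows about a generic split type. Every name in it (`Split`, `OtherFormatFileSplit`, `ColumnarSplit`, `InputSizeEstimator`, `ON_DISK`, `COLUMNAR_AWARE`) is hypothetical and stands in for the real Hadoop/Tez classes. Instead of requiring splits to inherit a size method from a Tez base class, the grouper is handed a pluggable estimator that inspects each split from the outside. This follows the spirit of the visitor suggestion; it is not a textbook double-dispatch Visitor. A split from an unrelated InputFormat still works because the estimator falls back to its on-disk length.

```java
import java.util.List;

// Sketch only: every type below is hypothetical and stands in for Hadoop/Tez classes.
public class PluggableEstimator {

    // Minimal stand-in for a generic input split: only its on-disk length is known.
    interface Split {
        long getLength();
    }

    // A split produced by some other InputFormat. It does not extend any
    // Tez-specific base class, so an inheritance-based size hook would miss it.
    static final class OtherFormatFileSplit implements Split {
        private final long length;
        OtherFormatFileSplit(long length) { this.length = length; }
        @Override public long getLength() { return length; }
    }

    // Hypothetical columnar split that knows how many bytes its projected
    // columns expand to once decompressed.
    static final class ColumnarSplit implements Split {
        private final long length;
        private final long projectedUncompressedBytes;
        ColumnarSplit(long length, long projectedUncompressedBytes) {
            this.length = length;
            this.projectedUncompressedBytes = projectedUncompressedBytes;
        }
        @Override public long getLength() { return length; }
        long getProjectedUncompressedBytes() { return projectedUncompressedBytes; }
    }

    // The pluggable hook: the grouper asks an estimator about each split
    // instead of requiring the split class to override a method.
    interface InputSizeEstimator {
        long estimate(Split split);
    }

    // Baseline: the size of the split at rest on disk.
    static final InputSizeEstimator ON_DISK = Split::getLength;

    // Format-aware estimator: handles the split type it understands and falls
    // back to the on-disk length for everything else.
    static final InputSizeEstimator COLUMNAR_AWARE = split -> {
        if (split instanceof ColumnarSplit) {
            return ((ColumnarSplit) split).getProjectedUncompressedBytes();
        }
        return split.getLength();
    };

    // What a grouper would sum when deciding how much work a group holds.
    static long totalEstimate(List<? extends Split> splits, InputSizeEstimator estimator) {
        long total = 0;
        for (Split s : splits) {
            total += estimator.estimate(s);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new OtherFormatFileSplit(128),
                new ColumnarSplit(64, 900));
        System.out.println(totalEstimate(splits, ON_DISK));        // 192
        System.out.println(totalEstimate(splits, COLUMNAR_AWARE)); // 1028
    }
}
```

With the on-disk baseline the two splits total 192 bytes; the columnar-aware estimator reports 1028 because it substitutes the hypothetical projected, uncompressed size for the columnar split, while the foreign split is still counted by its length.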
> Implement a pluggable InputSizeEstimator for grouping fairly
> ------------------------------------------------------------
>
> Key: TEZ-1993
> URL: https://issues.apache.org/jira/browse/TEZ-1993
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Gopal V
> Assignee: Gopal V
> Attachments: TEZ-1993.1.patch
>
>
> Split grouping currently measures each split by its file size, i.e. the
> exact size of the split as it sits at rest on HDFS.
> That measure is not valid for columnar formats, and it handles skew in
> highly compressible data especially poorly.
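A minimal Java sketch of why the measurement matters, assuming a simple greedy grouper (a stand-in, not Tez's actual grouping algorithm) and a hypothetical per-split `estimatedBytes` value, such as the uncompressed size a task would actually process. All names here are illustrative. The four splits are the same size on disk, but split `b` compresses far better than the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Sketch only: names are hypothetical; this is not the Tez grouping algorithm.
public class GroupBySize {

    static final class Split {
        final String name;
        final long onDiskBytes;    // size at rest, e.g. on HDFS
        final long estimatedBytes; // hypothetical estimate of bytes processed after decoding
        Split(String name, long onDiskBytes, long estimatedBytes) {
            this.name = name;
            this.onDiskBytes = onDiskBytes;
            this.estimatedBytes = estimatedBytes;
        }
        long onDiskBytes() { return onDiskBytes; }
        long estimatedBytes() { return estimatedBytes; }
    }

    // Greedy grouping: append splits to the current group until adding the next
    // one would push the group past the target, then start a new group.
    static List<List<Split>> group(List<Split> splits, long target, ToLongFunction<Split> size) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentSize = 0;
        for (Split s : splits) {
            long sz = size.applyAsLong(s);
            if (!current.isEmpty() && currentSize + sz > target) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += sz;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Equal size on disk; "b" expands 10x after decoding, the others 2x.
        List<Split> splits = List.of(
                new Split("a", 100, 200),
                new Split("b", 100, 1000),
                new Split("c", 100, 200),
                new Split("d", 100, 200));
        long target = 400;
        System.out.println(group(splits, target, Split::onDiskBytes).size());    // 1
        System.out.println(group(splits, target, Split::estimatedBytes).size()); // 3
    }
}
```

With a target of 400, the on-disk measurement puts all four splits in one group holding an estimated 1600 bytes of work, while the estimate yields three groups: `[a]`, `[b]`, `[c, d]`.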