Gopal V created TEZ-1993:
----------------------------

             Summary: Implement a pluggable InputSizeEstimator for grouping fairly
                 Key: TEZ-1993
                 URL: https://issues.apache.org/jira/browse/TEZ-1993
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Gopal V
            Assignee: Gopal V


Split grouping currently relies on a file size measurement: the exact size 
of the split as it is stored at rest on HDFS.

This measurement is not valid for columnar formats, and it is especially 
misleading when highly compressible data skews the on-disk sizes.
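
To illustrate the idea, here is a minimal sketch of what a pluggable 
estimator could look like. Only the name InputSizeEstimator comes from the 
summary above; the method signature, the SplitInfo class, the two estimator 
implementations, the expansionRatio field and the greedy SplitGrouper are 
all hypothetical and are not part of any existing Tez API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plug-in point; the signature is an assumption, not a Tez API. */
interface InputSizeEstimator {
    /** Returns the size, in bytes, that grouping should charge for this split. */
    long estimateSize(SplitInfo split);
}

/** Hypothetical, minimal description of a split, for illustration only. */
final class SplitInfo {
    final String name;
    final long onDiskBytes;
    final double expansionRatio; // assumed ratio of decoded size to on-disk size

    SplitInfo(String name, long onDiskBytes, double expansionRatio) {
        this.name = name;
        this.onDiskBytes = onDiskBytes;
        this.expansionRatio = expansionRatio;
    }
}

/** Mirrors the behaviour described in this issue: size as stored at rest. */
final class OnDiskSizeEstimator implements InputSizeEstimator {
    @Override
    public long estimateSize(SplitInfo split) {
        return split.onDiskBytes;
    }
}

/** One possible alternative: scale the on-disk size by an assumed expansion ratio. */
final class ExpandedSizeEstimator implements InputSizeEstimator {
    @Override
    public long estimateSize(SplitInfo split) {
        return Math.round(split.onDiskBytes * split.expansionRatio);
    }
}

/** Simple greedy grouping used only to show where the estimator plugs in. */
final class SplitGrouper {
    static List<List<SplitInfo>> group(List<SplitInfo> splits,
                                       InputSizeEstimator estimator,
                                       long targetBytes) {
        List<List<SplitInfo>> groups = new ArrayList<>();
        List<SplitInfo> current = new ArrayList<>();
        long currentSize = 0;
        for (SplitInfo split : splits) {
            long size = estimator.estimateSize(split);
            // Close the current group once adding this split would exceed the target.
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(split);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With three splits of equal on-disk size where one expands ten times more 
than the others, the on-disk estimator packs the heavy split together with 
a light one, while the expansion-aware estimator places it in a group of 
its own.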



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
