Gopal V created TEZ-1993:
----------------------------
Summary: Implement a pluggable InputSizeEstimator for grouping fairly
Key: TEZ-1993
URL: https://issues.apache.org/jira/browse/TEZ-1993
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Gopal V
Split grouping is currently done using a file size measurement, which is the
exact size of the split as it sits at rest on HDFS.
This measurement is not a valid basis for grouping columnar formats, and it
suffers especially from skew caused by highly compressible data.
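
To illustrate the idea, here is a minimal Java sketch of what a pluggable
estimator could look like. It is not taken from the Tez code base: the
InputSizeEstimator interface shape, the Split stand-in class, the
compressionRatio field and the greedy group() helper are all assumptions made
for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these types are real Tez or Hadoop classes.
class SplitGroupingSketch {

    /** Assumed plug-in point: maps a split to the size used for grouping. */
    interface InputSizeEstimator {
        long estimateSize(Split split);
    }

    /** Minimal stand-in for an input split. */
    static final class Split {
        final String name;
        final long bytesOnDisk;
        // Uncompressed size divided by on-disk size; assumed to be known.
        final double compressionRatio;

        Split(String name, long bytesOnDisk, double compressionRatio) {
            this.name = name;
            this.bytesOnDisk = bytesOnDisk;
            this.compressionRatio = compressionRatio;
        }
    }

    /** Current behaviour as described in the issue: size at rest on disk. */
    static final InputSizeEstimator ON_DISK = s -> s.bytesOnDisk;

    /** One possible alternative: scale the on-disk size by the compression ratio. */
    static final InputSizeEstimator UNCOMPRESSED =
            s -> Math.round(s.bytesOnDisk * s.compressionRatio);

    /** Greedy grouping: start a new group once the estimated size would exceed the target. */
    static List<List<Split>> group(List<Split> splits, InputSizeEstimator estimator,
                                   long targetBytes) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentSize = 0;
        for (Split s : splits) {
            long size = estimator.estimateSize(s);
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With three splits of 100 bytes on disk, one of them compressed 10:1, and a
target of 300 bytes, the on-disk estimator packs all three into one group,
while the ratio-aware estimator gives the highly compressed split a group of
its own.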
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)