Gopal V created HIVE-7428:
-----------------------------

Summary: OrcSplit fails to account for columnar projections in its size estimates
Key: HIVE-7428
URL: https://issues.apache.org/jira/browse/HIVE-7428
Project: Hive
Issue Type: Bug
Reporter: Gopal V

Currently, ORC generates splits based on stripe offset + stripe length. As a result, the splits are exactly the same size for every columnar projection, even though the footer is read and gives the estimated size of each column.

This is a holdover from FileSplit, which uses getLen() as the I/O cost of reading a file in a map task.

RCFile didn't have a footer with column statistics information, but ORC does, and using it would be extremely useful for reducing task overheads when processing extremely wide tables with highly selective column projections.
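
To illustrate the difference, here is a minimal sketch (in Python, not Hive code) that packs consecutive stripes into splits using a pluggable cost function. `StripeInfo`, `column_bytes`, `projected_cost`, and `group_stripes` are hypothetical names, and the per-stripe, per-column byte estimates are assumed inputs standing in for whatever the ORC footer actually provides. The sketch does not model how OrcInputFormat really builds splits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

MB = 1024 * 1024


@dataclass
class StripeInfo:
    """Hypothetical stand-in for per-stripe metadata taken from a file footer."""
    offset: int
    length: int                   # total on-disk bytes of the stripe
    column_bytes: Dict[str, int]  # assumed estimate of bytes per column


def projected_cost(columns: Iterable[str]) -> Callable[[StripeInfo], int]:
    """Build a cost function that counts only the projected columns."""
    wanted = list(columns)
    return lambda s: sum(s.column_bytes.get(c, 0) for c in wanted)


def group_stripes(stripes: Iterable[StripeInfo],
                  cost: Callable[[StripeInfo], int],
                  max_split_bytes: int) -> List[List[StripeInfo]]:
    """Greedily pack consecutive stripes until the cost budget is exceeded."""
    splits: List[List[StripeInfo]] = []
    current: List[StripeInfo] = []
    current_bytes = 0
    for stripe in stripes:
        c = cost(stripe)
        # Close the current split if adding this stripe would overflow it.
        if current and current_bytes + c > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(stripe)
        current_bytes += c
    if current:
        splits.append(current)
    return splits


# A wide table: 4 stripes, 100 columns of roughly 1 MB each per stripe.
stripes = [
    StripeInfo(offset=i * 100 * MB,
               length=100 * MB,
               column_bytes={f"c{j}": MB for j in range(100)})
    for i in range(4)
]

# Cost = stripe length (the behavior described above).
by_length = group_stripes(stripes, lambda s: s.length, 64 * MB)

# Cost = estimated bytes of the two projected columns only.
by_projection = group_stripes(stripes, projected_cost(["c0", "c1"]), 64 * MB)

print(len(by_length), len(by_projection))
```

With a 64 MB budget, the length-based cost puts every 100 MB stripe in its own split, while the projection-based cost sees about 2 MB per stripe and packs all four into one split, which is the task-overhead reduction the description is after.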