Owen O'Malley created HIVE-3874:
-----------------------------------

             Summary: Create a new Optimized Row Columnar file format for Hive
                 Key: HIVE-3874
                 URL: https://issues.apache.org/jira/browse/HIVE-3874
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

There are several limitations of the current RCFile format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push-down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per file or per row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes
* there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups
* the type of the rows isn't stored in the file
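
To make the row-count and lightweight-index points concrete, here is a minimal sketch of what such metadata would allow. It assumes a per-row-group entry holding a row count and the min/max of one long column; the class and method names (RowGroupSkipSketch, RowGroupStats, groupsToRead, locateRow) are hypothetical and are not taken from Hive or from any proposed file layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Hive or the proposed format.
final class RowGroupSkipSketch {

    /** Assumed per-row-group summary for one long column: row count plus min/max. */
    static final class RowGroupStats {
        final long rowCount;
        final long min;
        final long max;

        RowGroupStats(long rowCount, long min, long max) {
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the indexes of row groups that may hold a value in [lo, hi].
     * A group is skipped only when its [min, max] range cannot overlap the filter.
     */
    static List<Integer> groupsToRead(List<RowGroupStats> index, long lo, long hi) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            RowGroupStats s = index.get(i);
            if (s.rowCount > 0 && s.max >= lo && s.min <= hi) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Maps an absolute row number to {group index, offset within that group}
     * using only the stored row counts, without scanning the data.
     */
    static long[] locateRow(List<RowGroupStats> index, long rowNumber) {
        if (rowNumber < 0) {
            throw new IndexOutOfBoundsException("negative row number: " + rowNumber);
        }
        long remaining = rowNumber;
        for (int i = 0; i < index.size(); i++) {
            long count = index.get(i).rowCount;
            if (remaining < count) {
                return new long[] {i, remaining};
            }
            remaining -= count;
        }
        throw new IndexOutOfBoundsException("row " + rowNumber + " is past the end of the file");
    }

    public static void main(String[] args) {
        List<RowGroupStats> index = new ArrayList<>();
        index.add(new RowGroupStats(10_000, 1, 500));
        index.add(new RowGroupStats(10_000, 400, 900));
        index.add(new RowGroupStats(5_000, 1_000, 2_000));

        // Filter "450 <= col <= 600": the third group cannot match and is skipped.
        System.out.println(groupsToRead(index, 450, 600)); // prints [0, 1]

        // Row 12345 falls in the second group, 2345 rows past its start.
        long[] pos = locateRow(index, 12_345);
        System.out.println(pos[0] + ":" + pos[1]); // prints 1:2345
    }
}
```

With RCFile as described above, neither lookup is possible without reading the data itself, since the file records neither row counts nor per-group value ranges.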