[ https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437501#comment-16437501 ]
ASF GitHub Bot commented on ORC-339:
------------------------------------

Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/247#discussion_r181437863

    --- Diff: site/specification/ORCv2.md ---
    @@ -0,0 +1,1032 @@
    +---
    +layout: page
    +title: Evolving Draft for ORC Specification v2
    +---
    +
    +This specification is rapidly evolving and should only be used for
    +developers on the project.
    +
    +# TO DO items
    +
    +The list of things that we plan to change:
    +
    +* Create a decimal representation with fixed scale using rle.
    +* Create a better float/double encoding that splits mantissa and
    +  exponent.
    +* Create a dictionary encoding for float, double, and decimal.
    +* Create RLEv3:
    +  * 64 and 128 bit variants
    +  * Zero suppression
    +  * Evaluate the rle subformats
    +* Group stripe data into stripelets to enable Async IO for reads.
    +* Reorder stripe data into (stripe metadata, index, dictionary, data)
    +* Stop sorting dictionaries and record the sort order separately in the index.
    +* Remove use of RLEv1 and RLEv2.
    +* Remove non-utf8 bloom filter.
    +* Use numeric value for decimal bloom filter.
    --- End diff --

    Agreed

> Reorganize ORC specification
> ----------------------------
>
>                 Key: ORC-339
>                 URL: https://issues.apache.org/jira/browse/ORC-339
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>
> Currently we've put the ORC format specification in the documentation. Now
> that we are starting the work to design ORCv2, it will be more convenient to
> have each file format version as a separate page.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)