Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/247#discussion_r181239251

    --- Diff: site/specification/ORCv2.md ---
    @@ -0,0 +1,1032 @@
    +---
    +layout: page
    +title: Evolving Draft for ORC Specification v2
    +---
    +
    +This specification is rapidly evolving and should only be used for
    +developers on the project.
    +
    +# TO DO items
    +
    +The list of things that we plan to change:
    +
    +* Create a decimal representation with fixed scale using rle.
    +* Create a better float/double encoding that splits mantissa and
    +  exponent.
    +* Create a dictionary encoding for float, double, and decimal.
    +* Create RLEv3:
    +  * 64 and 128 bit variants
    +  * Zero suppression
    +  * Evaluate the rle subformats
    +* Group stripe data into stripelets to enable Async IO for reads.
    +* Reorder stripe data into (stripe metadata, index, dictionary, data)
    +* Stop sorting dictionaries and record the sort order separately in the index.
    +* Remove use of RLEv1 and RLEv2.
    +* Remove non-utf8 bloom filter.
    +* Use numeric value for decimal bloom filter.
    --- End diff --

    We may also use numeric value for decimal column statistics.
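
To illustrate the suggestion about decimal column statistics, here is a minimal sketch. It assumes the statistics currently carry the decimal minimum and maximum as strings and that all values in a column share one fixed scale. The helper `to_unscaled` and the constant `SCALE` are hypothetical names for this example, not part of ORC.

```python
from decimal import Decimal


def to_unscaled(text: str, scale: int) -> int:
    """Hypothetical helper: turn a finite decimal string into an
    unscaled integer at a fixed scale, e.g. "9.5" at scale 2 -> 950."""
    sign, digits, exponent = Decimal(text).as_tuple()
    shift = scale + exponent
    if shift < 0:
        # The value has more fractional digits than the scale allows.
        raise ValueError(f"{text!r} does not fit scale {scale}")
    magnitude = int("".join(map(str, digits))) * 10 ** shift
    return -magnitude if sign else magnitude


SCALE = 2  # assumed fixed scale for the column
values = ["9.5", "10.25", "-3.7"]

# Comparing the string forms directly is lexicographic, so the
# "maximum" comes out as "9.5" even though 10.25 is larger.
string_max = max(values)

# With unscaled integers at a fixed scale, plain integer comparison
# gives the correct bounds without parsing strings at read time.
unscaled = [to_unscaled(v, SCALE) for v in values]
numeric_min, numeric_max = min(unscaled), max(unscaled)
```

Here `numeric_min` and `numeric_max` are -370 and 1025, that is -3.70 and 10.25 at scale 2. A reader can compare them against a predicate value converted the same way, which is one possible reading of "numeric value" in the comment above.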
---