Github user xndai commented on a diff in the pull request:

https://github.com/apache/orc/pull/247#discussion_r181530787

--- Diff: site/specification/ORCv2.md ---
@@ -0,0 +1,1032 @@
+---
+layout: page
+title: Evolving Draft for ORC Specification v2
+---
+
+This specification is rapidly evolving and should only be used for
+developers on the project.
+
+# TO DO items
--- End diff --

Is this the final list for v2, or are we still working on it?

I have one proposal to add to ORC v2, which I call a "clustered index". Basically, the writer can specify a sorting property on one or more columns, and we then create an index section in the ORC file whose keys are the column value(s) and whose values are row numbers. To reduce the size of the index, each row group has one entry in the clustered index. This enables a new range-scan pattern in which the reader provides an upper bound and a lower bound on the column value(s). I can write up a detailed proposal for this.
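
To make the idea concrete, here is a rough Python sketch of how such an index could be built and used for a range scan. It is not ORC code and not a proposed file layout. The function names `build_clustered_index` and `row_range_for` are hypothetical, and it assumes each index entry stores the first key of its row group together with that group's starting row number (the proposal above does not say which key is stored). Keys can be single values or tuples for multi-column sort orders.

```python
from bisect import bisect_left, bisect_right


def build_clustered_index(sorted_keys, row_group_size):
    """Return one (first_key, first_row_number) entry per row group.

    Assumes `sorted_keys` is already sorted by the writer's sort order.
    """
    return [
        (sorted_keys[row], row)
        for row in range(0, len(sorted_keys), row_group_size)
    ]


def row_range_for(index, total_rows, lower, upper):
    """Return a half-open row range [start, end) that may hold keys in
    [lower, upper]. The range is conservative: it is rounded out to
    row-group boundaries, so the reader still filters rows inside it.
    """
    if not index or lower > upper:
        return (0, 0)
    first_keys = [key for key, _ in index]

    # Number of row groups whose first key is <= upper. Later groups
    # start above the upper bound and cannot match.
    end_group = bisect_right(first_keys, upper)
    if end_group == 0:
        return (0, 0)

    # The group just before the first one starting at or above `lower`
    # may still contain `lower` among its later rows, so include it.
    start_group = max(bisect_left(first_keys, lower) - 1, 0)

    start_row = index[start_group][1]
    end_row = index[end_group][1] if end_group < len(index) else total_rows
    return (start_row, end_row)
```

For example, with keys `[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]` and a (deliberately tiny) row group size of 3, the index has four entries, and a scan for keys between 8 and 14 only needs to read rows 3 through 8 instead of the whole file.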
---