GitHub user xndai commented on a diff in the pull request:
https://github.com/apache/orc/pull/247#discussion_r181530787
--- Diff: site/specification/ORCv2.md ---
@@ -0,0 +1,1032 @@
+---
+layout: page
+title: Evolving Draft for ORC Specification v2
+---
+
+This specification is rapidly evolving and should only be used for
+developers on the project.
+
+# TO DO items
--- End diff --
Is this the final list for v2, or are we still working on it? I have one
proposal to add to ORC v2, which I call a "clustered index". Basically,
the writer can specify a sort order on one or more columns; we then
create an index section in the ORC file whose keys are the column value(s)
and whose values are row numbers. To keep the index small, each row group
has one entry in the clustered index. This enables a new range scan
pattern in which the reader provides upper and lower bounds on the
column value(s).
I can write up a detailed proposal for this.
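
To make the idea concrete, here is a rough Python sketch of how such an
index could be built and consulted. This is not ORC code: the function
names `build_clustered_index` and `range_scan_rows` are hypothetical, and
storing each row group's first key is just one assumption about which
key an entry would hold.

```python
from bisect import bisect_left, bisect_right


def build_clustered_index(sorted_keys, row_group_size):
    """Build one (key, row_number) entry per row group.

    Assumption for illustration: each entry holds the key and row number
    of the first row in the group. `sorted_keys` must already be sorted.
    """
    return [(sorted_keys[row], row)
            for row in range(0, len(sorted_keys), row_group_size)]


def range_scan_rows(index, total_rows, lower, upper):
    """Return a half-open row range [start, end) that covers every row
    whose key lies in [lower, upper]. The range may include extra rows
    at either edge, which the reader still has to filter.
    """
    if not index:
        return (0, 0)
    keys = [key for key, _ in index]
    # Start at the last group whose first key is strictly below `lower`,
    # because that group may still end with rows equal to `lower`.
    start_group = max(bisect_left(keys, lower) - 1, 0)
    # Stop at the first group whose first key is above `upper`.
    end_group = bisect_right(keys, upper)
    start_row = index[start_group][1]
    end_row = index[end_group][1] if end_group < len(index) else total_rows
    return (start_row, end_row)


if __name__ == "__main__":
    keys = [1, 2, 2, 3, 5, 5, 5, 8, 9, 12]   # column values, already sorted
    index = build_clustered_index(keys, row_group_size=3)
    print(index)                              # [(1, 0), (3, 3), (5, 6), (12, 9)]
    print(range_scan_rows(index, len(keys), 5, 8))   # (3, 9)
```

In this sketch the reader skips row groups 0 and 3 entirely for the
bounds 5..8 and only scans rows 3 through 8.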
---