Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/247#discussion_r181530787
  
    --- Diff: site/specification/ORCv2.md ---
    @@ -0,0 +1,1032 @@
    +---
    +layout: page
    +title: Evolving Draft for ORC Specification v2
    +---
    +
    +This specification is rapidly evolving and should only be used for
    +developers on the project.
    +
    +# TO DO items
    --- End diff --
    
    Is this the final list for v2, or are we still working on it? I have one 
proposal to add to ORC v2, which I call a "clustered index". Basically, 
the writer can specify a sort order on one or more columns, and we then 
create an index section in the ORC file whose keys are the column value(s) and 
whose values are row numbers. To keep the index small, each row group has one 
entry in the clustered index. This enables a new range scan pattern in which 
the reader provides lower and upper bounds on the column value(s).
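
    To make the idea concrete, here is a rough sketch of how such an index 
could be built and used for a range scan. It is only an illustration, not a 
proposed file layout: the function names, the tiny row group size, and the 
choice to store each row group's first key are all assumptions made for the 
example, and it assumes the rows are already sorted on the key column.

```python
from bisect import bisect_left, bisect_right

# Hypothetical row group size, kept tiny so the example is easy to follow.
ROW_GROUP_SIZE = 4


def build_clustered_index(sorted_keys, row_group_size=ROW_GROUP_SIZE):
    """Return one (first_key, first_row_number) entry per row group.

    Assumes sorted_keys is already in ascending order, as the writer's
    sort property would guarantee.
    """
    return [
        (sorted_keys[row], row)
        for row in range(0, len(sorted_keys), row_group_size)
    ]


def row_range_for(index, total_rows, lower, upper):
    """Return a half-open [start, end) row range that may hold keys in
    [lower, upper]. The range is conservative: it is aligned to row group
    boundaries, so the reader still filters rows inside it.
    """
    first_keys = [key for key, _ in index]
    # The group just before the first group whose first key is >= lower
    # may still contain keys >= lower, so step back by one group.
    lo = max(bisect_left(first_keys, lower) - 1, 0)
    # Groups whose first key is > upper cannot contain matching rows.
    hi = bisect_right(first_keys, upper)
    if lo >= hi:
        return (0, 0)  # nothing can match
    start = index[lo][1]
    end = index[hi][1] if hi < len(index) else total_rows
    return (start, end)


keys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
index = build_clustered_index(keys)
start, end = row_range_for(index, len(keys), lower=10, upper=18)
matching = [k for k in keys[start:end] if 10 <= k <= 18]
```

    With one entry per row group the index stays small, and the reader only 
needs a binary search over the entries to decide which row groups to read.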
    
    I can write up a detailed proposal for this.

