Pavan Lanka created ORC-622:
-------------------------------

             Summary: Refactoring of TreeReader into TypeReader and BatchReader
                 Key: ORC-622
                 URL: https://issues.apache.org/jira/browse/ORC-622
             Project: ORC
          Issue Type: Improvement
          Components: Java, Reader
            Reporter: Pavan Lanka


The org.apache.orc.impl.TreeReaderFactory.TreeReader class currently serves two 
functions:
 # *Type Read*: Activities that deal with reading a particular type and that 
ultimately populate the vector.
 # *Batch Read*: This is invoked on the Type Reader determined by the Root 
Type. Here the activities concern populating the vectors in the 
VectorizedRowBatch, e.g. ignoring the columns that are partition fields, 
setting the batch size, etc.

This request proposes that these functions be separated into distinct 
interfaces. Separating the more generic batch functions from the type-specific 
functions allows enhancements to TypeReaders to focus purely on type 
functions without having to deal with the batch-related functions.
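
As a rough illustration of the proposed split, the sketch below separates the 
two roles into two small interfaces. Only the names TypeReader and BatchReader 
come from this issue; the method names (nextVector, nextBatch), the stand-in 
LongVector and RowBatch types, and the SequenceReader and SimpleBatchReader 
classes are hypothetical and do not reflect the actual ORC API.

```java
import java.util.Arrays;

public class ReaderSplitSketch {

  /** Stand-in for a column vector; NOT the real ColumnVector class. */
  static final class LongVector {
    final long[] values;
    LongVector(int capacity) { values = new long[capacity]; }
  }

  /** Stand-in for VectorizedRowBatch: a set of columns plus a row count. */
  static final class RowBatch {
    final LongVector[] cols;
    int size;
    RowBatch(int numCols, int capacity) {
      cols = new LongVector[numCols];
      for (int i = 0; i < numCols; i++) cols[i] = new LongVector(capacity);
    }
  }

  /** Type-specific work: decode the values of one column into its vector. */
  interface TypeReader {
    void nextVector(LongVector out, int batchSize);
  }

  /** Batch-level work: drive the type readers and fill the row batch. */
  interface BatchReader {
    void nextBatch(RowBatch batch, int batchSize);
  }

  /** Hypothetical type reader that emits a running sequence of longs. */
  static final class SequenceReader implements TypeReader {
    private long next;
    SequenceReader(long start) { next = start; }
    @Override
    public void nextVector(LongVector out, int batchSize) {
      for (int i = 0; i < batchSize; i++) out.values[i] = next++;
    }
  }

  /** Hypothetical batch reader: sets the batch size and skips ignored columns. */
  static final class SimpleBatchReader implements BatchReader {
    private final TypeReader[] children; // a null entry marks a column to ignore
    SimpleBatchReader(TypeReader... children) { this.children = children; }
    @Override
    public void nextBatch(RowBatch batch, int batchSize) {
      batch.size = batchSize;
      for (int i = 0; i < children.length; i++) {
        if (children[i] != null) {
          children[i].nextVector(batch.cols[i], batchSize);
        }
      }
    }
  }

  /** Reads one batch of 3 rows; column 1 is ignored, like a partition field. */
  static RowBatch demo() {
    RowBatch batch = new RowBatch(2, 4);
    BatchReader reader = new SimpleBatchReader(new SequenceReader(10), null);
    reader.nextBatch(batch, 3);
    return batch;
  }

  public static void main(String[] args) {
    RowBatch b = demo();
    System.out.println(b.size + " " + Arrays.toString(b.cols[0].values));
  }
}
```

In this sketch a new type reader only implements nextVector, while batch 
concerns such as the batch size and ignored columns stay in the batch reader.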

In addition, the request proposes that certain methods and classes within the 
impl package be made public, with the understanding that classes within the 
impl package are internal:
* TreeReader.checkEncoding
* TreeReader.startStripe
* TreeReader.skipRows
* StreamInformation
This enables the use of these classes and methods without having to clobber the 
impl package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
