Re: Review Request 17005: Vectorized reader for DECIMAL datatype for ORC format.

Jitendra Pandey Fri, 24 Jan 2014 18:03:50 -0800


> On Jan. 20, 2014, 6:56 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 
> > 1119
> > <https://reviews.apache.org/r/17005/diff/1/?file=425358#file425358line1119>
> >
> >     It seems odd that we're reading from a scaleStream because the scale 
> > should be the same for every value in the column. Is this necessary?
> >     
> >


  The ORC decimal encoding currently supports arbitrary scale: Hive doesn't 
allow variable scales, but the ORC format does. We should add another decimal 
encoding in Hive that is optimized for a specific precision and scale, and 
correspondingly we will need an additional vectorized reader for that 
encoding. 
  Since the reader is part of the ORC code, I think it should also read 
variable scales as the encoding allows. If a scale read from the data doesn't 
match the scale in the schema, then we definitely have a data/schema 
corruption issue.
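
  As a rough illustration of the check described above (not the code in the 
patch), a reader could compare each per-value scale against the schema scale 
and fail on a mismatch. The class and method names here are hypothetical, and 
plain arrays stand in for the ORC value and scale streams:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: arrays stand in for the ORC value and scale streams.
class ScaleCheckSketch {

    // Builds decimals from unscaled values and per-value scales, treating
    // any scale that differs from the schema scale as corruption.
    static BigDecimal[] readDecimals(BigInteger[] unscaled, int[] scales, int schemaScale) {
        if (unscaled.length != scales.length) {
            throw new IllegalArgumentException("value and scale counts differ");
        }
        BigDecimal[] out = new BigDecimal[unscaled.length];
        for (int i = 0; i < unscaled.length; i++) {
            if (scales[i] != schemaScale) {
                throw new IllegalStateException("scale mismatch at row " + i
                    + ": data has " + scales[i] + ", schema expects " + schemaScale);
            }
            out[i] = new BigDecimal(unscaled[i], scales[i]);
        }
        return out;
    }
}
```

  Whether a mismatch should throw, or be rescaled to the schema scale, is a 
design choice this sketch does not settle.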


> On Jan. 20, 2014, 6:56 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 
> > 1123
> > <https://reviews.apache.org/r/17005/diff/1/?file=425358#file425358line1123>
> >
> >     If any scale values are different inside a single DecimalColumnVector, 
> > I think that could cause unpredictable or wrong results. 
> >     
> >     Later operations on DecimalColumnVector take the scale from the 
> > columnvector sometimes, not each individual object.

If the scale in the data is different from the scale assumed in the vectorized 
reader, we would still have erroneous results. 
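
To make the failure mode concrete, here is a minimal sketch using 
java.math.BigDecimal rather than Decimal128. The class name is hypothetical; 
it only shows that the same unscaled digits denote different numbers under 
different scales:

```java
import java.math.BigDecimal;

// Hypothetical sketch: one unscaled value read under two different scales.
class ColumnScaleSketch {

    // Interprets the unscaled digits using the given scale.
    static BigDecimal interpret(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long unscaled = 12345L;
        // Scale stored with the value in the data.
        System.out.println(interpret(unscaled, 2)); // 123.45
        // Scale assumed by the reader for the whole column.
        System.out.println(interpret(unscaled, 3)); // 12.345
    }
}
```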


- Jitendra


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17005/#review32299
-----------------------------------------------------------


On Jan. 24, 2014, 10:28 p.m., Jitendra Pandey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17005/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2014, 10:28 p.m.)
> 
> 
> Review request for hive and Eric Hanson.
> 
> 
> Bugs: HIVE-6178
>     https://issues.apache.org/jira/browse/HIVE-6178
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> vectorized reader for DECIMAL datatype for ORC format.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java 3939511 
>   common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java 
> d71ebb3 
>   common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java 
> fbb2aa0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DecimalColumnVector.java 
> 23564bb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 0df82b9 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 
> 0d5b7ff 
> 
> Diff: https://reviews.apache.org/r/17005/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jitendra Pandey
> 
>
