----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10712/#review19568 -----------------------------------------------------------
I didn't get a chance to go through the whole thing -- here are just some initial comments. ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java <https://reviews.apache.org/r/10712/#comment40418> should be totalLength not totalLenght ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java <https://reviews.apache.org/r/10712/#comment40420> the previous argument and the return value are always of class VectorizedRowBatch so why not declare them to be VectorizedRowBatch instead of Object? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40461> put parens around expression on right side of =, for clarity ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40462> if noNulls is false, that means there are nulls, so you can't set noNulls back to true, can you? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40423> You should never have to create a new column vector here because there should always be one passed down into this code as part of the batch. Consider assuming there will always be a vector. The code will just fail if there isn't one. ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40463> It appears you are replacing one isNull array (the one passed in) with another. We should be writing null information into the same Boolean array over and over. Am I missing something? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40424> this code for IntTreeReader appears identical to the ShortTreeReader case. Can the code be shared, e.g. by inheriting from super-class? Since this is a vector, consider calling it nextVector ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40425> same comment -- can the code be shared for all int types? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40474> for float/double, vector entries are supposed to be set to NaN for null values. ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40478> I think this could incorrectly decide something isRepeating if there are nulls, the way the code is now. But using NaN for nulls should prevent this. I think this isRepeating check is probably worth it for performance but it might not really pay off. It might be worth a performance test sometime in the future to determine this. ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java <https://reviews.apache.org/r/10712/#comment40486> I'd said that NULL string entries should be set to the empty string in the vector. I'm not sure if any code depends on that though. So this is probably okay to not set null entries. ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java <https://reviews.apache.org/r/10712/#comment40427> Please add comment to say what is the purpose of the file created ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java <https://reviews.apache.org/r/10712/#comment40489> Should create the batch up here, not use a null batch. Or at least have a second test that creates the batch outside and passes it down. ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java <https://reviews.apache.org/r/10712/#comment40490> style guidelines say put a blank line in front of all comments ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java <https://reviews.apache.org/r/10712/#comment40445> I think this should take and return VectorizedRowBatch since it would not make sense for it to be another type -- the method name even says "Batch" ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java <https://reviews.apache.org/r/10712/#comment40488> should add another test with some null values - Eric Hanson On April 22, 2013, 10:26 p.m., Sarvesh Sakalanaga wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10712/ > ----------------------------------------------------------- > > (Updated April 22, 2013, 10:26 p.m.) > > > Review request for hive. > > > Description > ------- > > The patch contains changes to ORC reader to return a batch of rows instead of > a row. A new method called nextBatch() is added to ORC reader and tree > readers of ORC. Currently only int,long,short,double,float,string and struct > support batch processing. > > > This addresses bug HIVE-4370. > https://issues.apache.org/jira/browse/HIVE-4370 > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java > 246170d > ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java fc4e53b > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java 05240ce > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java d044cd8 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java > 2825c64 > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/10712/diff/ > > > Testing > ------- > > > Thanks, > > Sarvesh Sakalanaga > >