----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11106/#review20512 -----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42276> please add javadoc comment for purpose of class ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42278> put comment for why this is here ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42280> explain more clearly what this function does ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42281> if you use a negative start index -n, the existing Hive code (non-vectorized) seems to take the tail end total n characters. e.g. substr("foo", -2) is "oo". If you use -n and n is greater than the string length, the output is the empty string. Please handle this case with the same behavior as non vectorized hive. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42282> please run ant checkstyle and follow suggestions, e.g. there is no blank after if before ( ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42283> also set output value to empty string if output is null outV.noNulls needs to get set for every case and doesn't get set here ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42289> len[0] - offset could be negative. Do you need to use len[0] - (start[0] - offset)? Make sure you have unit tests for case where start != 0. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java <https://reviews.apache.org/r/11106/#comment42296> I've heard that if the common case is the first case, things can run faster. You could reverse these and make the test for offset <> -1. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java <https://reviews.apache.org/r/11106/#comment42305> need to handle negative start index case. It appears your code could get array out of bounds in that case. Also, for substrLength <= 0, result should be empty string ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java <https://reviews.apache.org/r/11106/#comment42308> should set isRepeating to false for default case ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java <https://reviews.apache.org/r/11106/#comment42306> need to check noNulls first. If noNulls then you can't look into isNull array or you could see invalid data ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java <https://reviews.apache.org/r/11106/#comment42309> Is this output supposed to be null or empty string? My add-hoc tests seemed to show it was empty string. You should double check. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java <https://reviews.apache.org/r/11106/#comment42310> need to set outV.isNull[I] to true or false always, unless you are setting outV.noNulls to true. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java <https://reviews.apache.org/r/11106/#comment42311> second argument to VEctorizedRowBatch constructor should not be use (it defaults to correct value) ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java <https://reviews.apache.org/r/11106/#comment42314> argument is not needed -- use default constructor with no args ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java <https://reviews.apache.org/r/11106/#comment42312> you need to test for some data with multi-byte characters. There is an example of that someplace else in the tests. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java <https://reviews.apache.org/r/11106/#comment42315> need to try to test the case where data start position is not 0 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java <https://reviews.apache.org/r/11106/#comment42313> need to verify the other rows besides 0 are always set to not null. The isNull entries for them could have been true by chance from a previous use of the batch - Eric Hanson On May 13, 2013, 9:54 p.m., Timothy Chen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11106/ > ----------------------------------------------------------- > > (Updated May 13, 2013, 9:54 p.m.) > > > Review request for hive. > > > Description > ------- > > Add Vectorized Substr > > > This addresses bug HIVE-4495. > https://issues.apache.org/jira/browse/HIVE-4495 > > > Diffs > ----- > > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java > PRE-CREATION > > ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java > 6e26412 > > Diff: https://reviews.apache.org/r/11106/diff/ > > > Testing > ------- > > > Thanks, > > Timothy Chen > >