Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2063:   // Hive returns 0 if substr was not found in str (or on other 
error coditions).
> What happens if you pass other types as the needle or haystack, e.g. int or
The analysis of the query will complain that there is no function with a 
matching signature:

[localhost:21000] > select instr("1", 2, "3");
ERROR: AnalysisException: No matching function with signature: instr(STRING, 
TINYINT, STRING).


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 22: #include <stdint.h>
> are the includes still needed?
Done


PS5, Line 289:   if (str.is_null || substr.is_null) return IntVal::null();
             :   StringValue str_sv = StringValue::FromStringVal(str);
             :   StringValue substr_sv = StringValue::FromStringVal(substr);
             :   StringSearch search(&substr_sv);
             :   // Hive returns positions starting from 1.
             :   return IntVal(search.Search(&str_sv) + 1);
> Will replacing this with a call to the more general version of Instr() come
I would expect a minor, probably insignificant performance impact. (On the 
other, we should be aware that in this case we will replace a well-proven 
implementation with a freshly written one, so if there were any bugs in the new 
implementation, the impact of those would become bigger.)


PS5, Line 299: occurance
> spelling
Done


PS5, Line 316: no
> nit: use 'num' to abbreviate number
Done


PS5, Line 329: allowed match
> what does "allowed match" mean?
When we are searching from the right, there are two "current" positions. One is 
for leftmost char of the match we are looking for, the other for the rightmost 
char of it.

When the user specifies -5 as an input, it means that the leftmost char of the 
pattern must be before the 5th char from the right. So the actual match may be 
partially after the limit.

If you can suggest a better description, please let me know.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   if (len == -1) {
> single line
This is matter of taste, the style guide allows both variant. I like this one 
more, because it is consistent with multi-line conditional block and it is easy 
to add new lines.


PS5, Line 69: Haystack is not full string
> This comment could be clearer, Haystack is a full string. Maybe rephrase "T
I didn't write the test cases, they are almost verbatim copies from 
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/experiments/string-search-test.cc#L98


PS5, Line 70: 5
> What is the difference between this and just specifying "abcab" as the hays
As I noted in the other comment, I don't know why each individual test case was 
added, I just reused existing test-cases for strstr functionality. My guess is 
that by taking a prefix of a StringVal, it tests whether the strstr 
implementation respects the length stored in the StringValue or also reads 
parts of the string which should be excluded.


Line 81:   // Test last bit, this is the interesting edge case
> What does "last bit" mean? Can you add a comment why this is an interesting
I guess last bit means last part. I'm not sure why it's interesting, it wasn't 
added by me.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 204:         }
> Can we advance i by skip_ here, too? Is this what the comment in line 139 r
I don't know what line 139 refers to, I did not write that comment. I don't 
think i can be advanced by skip_ here either. As I have written, I created this 
function in the following way:

Our fast search is based on Python's fast search. Python did not support 
reverse search when it was added to our codebase, but now it does. Therefore I 
compared Python's reverse search to its regular search, then applied the same 
changes to our regular search, thereby creating this function. The python 
implementation is linked to in a comment in this file: 
http://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes

Reply via email to