Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 311: search_start_pos < 0
remove- this is not possible since we know it's not 0 from l302 and we know 
it's not <0 from l308. it might be worthwhile to put a DCHECK above l316 to 
make sure it's always >= 0 though.


PS6, Line 311: {
             :       return IntVal(0);
this will be short enough to return on the same line


PS6, Line 318:       if (match_pos_in_substring == -1) {
             :         return IntVal(0);
             :       }
1 line


PS6, Line 322: search_start_pos
I think you also need to make sure search_start_pos is < haystack.len before 
the next call to Substring (l316), otherwise that will return invalid memory 
(Substring doesn't check the length).

In fact, since you'll have to do the check before calling 
Substring(search_start_pos) on l316, I think you can remove the entire block on 
l311 that checks search_start_pos.


PS6, Line 330:  search_start_pos >= str.len
This isn't possible, str.len = haystack.len, and we know we added something 
less than 0 to it to get search_start_pos


PS6, Line 330:     if (search_start_pos < 0 || search_start_pos >= str.len) {
             :       return IntVal(0);
             :     }
remove


PS6, Line 336: search_start_pos + needle.len
why do you add the needle len?
I don't understand why it's not just search_start_pos.


PS6, Line 338:       if (match_pos == -1) {
             :         return IntVal(0);
             :       }
1 line


http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS6, Line 102:     for (int i = 0; i < mlast; ++i) {
             :       BloomAdd(pattern_->ptr[i]);
             :       if (pattern_->ptr[i] == pattern_->ptr[mlast])
             :         skip_ = mlast - i - 1;
             :     }
             :     BloomAdd(pattern_->ptr[mlast]);
I don't think this will always work for RSearch, see 
https://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h 

l195

I think this would need to be computed in reverse.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes

Reply via email to