Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 4:

> As we discussed in person our internal strings aren't \0
 > terminated. Let's try using StringFunctions::Reverse() and the
 > InStr implementation we already have and see how fast that is.

To recap our conversation in a little more detail, using the std::string 
constructor that takes a length parameter in addition to the char* param would 
solve the problem that StringValues are not NULL-terminated, but you were also 
worried about the copying and the standard search implementation.

The copying could be easily avoided by using std::search with reverse 
iterators, as suggested here: http://stackoverflow.com/a/1634416 and creating 
the reverse iterators directly on the char*, like suggested here: 
http://stackoverflow.com/a/22488428.

In my opinion this would result in the cleanest solution, but you were also 
worried about using the standard C++ search, which is around four times slower 
than our internal search algorithm that unfortunately only supports searching 
forwards.

According to the comments in the source code of our internal search algorithm, 
it is based on Python's search. I checked this latter and found that reverse 
search have been added to it recently. Since our implementations have diverged 
significantly, it is not possible to use the new Python search code, but it can 
be seen that the reverse search functionality is a trivial modification of the 
regular search, so I added a reverse search variant as well. I also added tests 
for it as well as for our regular search, which did not have tests. The name of 
the new test collided with the name of an existing test that was misnamed, so I 
renamed that. Finally I implemented instr() on top of these bloom-filter-based 
search algorithms.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: No

Reply via email to