Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() ......................................................................
Patch Set 4: > As we discussed in person our internal strings aren't \0 > terminated. Let's try using StringFunctions::Reverse() and the > InStr implementation we already have and see how fast that is. To recap our conversation in a little more detail, using the std::string constructor that takes a length parameter in addition to the char* param would solve the problem that StringValues are not NULL-terminated, but you were also worried about the copying and the standard search implementation. The copying could be easily avoided by using std::search with reverse iterators, as suggested here: http://stackoverflow.com/a/1634416 and creating the reverse iterators directly on the char*, like suggested here: http://stackoverflow.com/a/22488428. In my opinion this would result in the cleanest solution, but you were also worried about using the standard C++ search, which is around four times slower than our internal search algorithm that unfortunately only supports searching forwards. According to the comments in the source code of our internal search algorithm, it is based on Python's search. I checked this latter and found that reverse search have been added to it recently. Since our implementations have diverged significantly, it is not possible to use the new Python search code, but it can be seen that the reverse search functionality is a trivial modification of the regular search, so I added a reverse search variant as well. I also added tests for it as well as for our regular search, which did not have tests. The name of the new test collided with the name of an existing test that was misnamed, so I renamed that. Finally I implemented instr() on top of these bloom-filter-based search algorithms. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <[email protected]> Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: No
