[GitHub] [nifi-minifi-cpp] fgerlits commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

GitBox Thu, 12 May 2022 06:00:26 -0700


fgerlits commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r871348747



##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,21 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+
+  Searcher searcher(text_to_search.begin(), text_to_search.end());
 
   std::ifstream ifs{file_path, std::ios::binary};
-  ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
   do {
-    std::swap(left, right);
-    ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
-    if (check_range(0, left.size())) return true;
+    std::copy(buf.end() - text_to_search.size(), buf.end(), buf.begin());
+    ifs.read(buf.data() + text_to_search.size(), buf.size() - text_to_search.size());
+    if (std::search(buf.begin(), buf.end(), searcher) != buf.end()) {
+      return true;
+    }
   } while (ifs);
-  return check_range(left.size(), left.size() + right.size());
+  return std::search(buf.begin(), buf.end(), searcher) != buf.end();

Review Comment:
  After the last chunk is read, if it is shorter than the capacity of the buffer, `buf` will still contain bytes from the previous chunk at its end. Is it possible that, e.g., `text_to_search` is `"abcde"` and it is found incorrectly (a false positive) because the file ends with `"abc"`, which happens to be followed in the buffer by `"de"` left over from the previous chunk?
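A minimal sketch of one possible fix, under stated assumptions: the hunk does not show the definition of `Searcher`, so the sketch assumes it is `std::boyer_moore_searcher` over the pattern's iterators, and the names `contains_in_stream` and `contains_in_file` are hypothetical. The idea is to track how many bytes of the buffer are meaningful (the carried-over tail plus `gcount()` from the latest read) and pass only that prefix to `std::search`. The concern applies to the search inside the loop as well as the one after it: a short final `ifs.read` sets `eofbit` and `failbit`, but the in-loop `std::search` scans the whole buffer before `while (ifs)` is evaluated.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <istream>
#include <sstream>  // std::istringstream, convenient for exercising the stream overload
#include <string_view>

// Hypothetical helper (not from the PR): searches `is` for `needle` chunk by chunk.
// Only the bytes that are actually valid in `buf` are searched, so leftovers from an
// earlier, longer chunk can never complete a match.
template <std::size_t BufSize = 16 * 1024>
bool contains_in_stream(std::istream& is, std::string_view needle) {
  // Mirrors gsl_Expects(text_to_search.size() <= 8_KiB) with a 16 KiB buffer.
  assert(needle.size() <= BufSize / 2);
  if (needle.empty()) return true;

  std::array<char, BufSize> buf{};
  // Assumption: the PR's `Searcher` behaves like std::boyer_moore_searcher.
  const std::boyer_moore_searcher<std::string_view::const_iterator> searcher(needle.begin(), needle.end());
  const std::size_t overlap = needle.size() - 1;  // longest partial match that can end a chunk
  std::size_t valid = 0;                          // number of meaningful bytes at the front of buf

  while (true) {
    // Carry over the tail of the previous chunk so matches straddling two reads are found.
    const std::size_t keep = std::min(valid, overlap);
    if (valid > keep) {
      std::copy(buf.begin() + static_cast<std::ptrdiff_t>(valid - keep),
                buf.begin() + static_cast<std::ptrdiff_t>(valid), buf.begin());
    }
    is.read(buf.data() + keep, static_cast<std::streamsize>(buf.size() - keep));
    // gcount() reports how many bytes this read produced, even when it stopped at EOF.
    valid = keep + static_cast<std::size_t>(is.gcount());

    const auto end = buf.begin() + static_cast<std::ptrdiff_t>(valid);
    if (std::search(buf.begin(), end, searcher) != end) return true;
    if (!is) return false;  // EOF or error: every byte read so far has been searched
  }
}

// Hypothetical file-based wrapper with the same shape as `contains` in the diff.
template <std::size_t BufSize = 16 * 1024>
bool contains_in_file(const std::filesystem::path& file_path, std::string_view needle) {
  std::ifstream ifs{file_path, std::ios::binary};
  return contains_in_stream<BufSize>(ifs, needle);
}
```

Carrying over `needle.size() - 1` bytes is the minimum that still catches a match straddling two reads; carrying `text_to_search.size()` bytes, as the diff does, also works and merely rescans one extra byte. Since nothing past `valid` is ever searched, the zero-initialized buffer contents on the first iteration are not scanned either.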


