On Tue, 14 Oct 2025 21:53:41 GMT, Daniel Hu <[email protected]> wrote:
>> These changes should prevent entire binary files from being loaded into >> memory for build/AbsPathsInImage.java test. I chose a default buffer size of >> 8KB since BufferedInputStream uses that, but open to alternative solutions. >> GHA passes and test passes on linux x64. > > Daniel Hu has updated the pull request incrementally with two additional > commits since the last revision: > > - fix incorrect use of inputstream > - remove extraneous variables/imports test/jdk/build/AbsPathsInImage.java line 110: > 108: } > 109: > 110: List<byte[]> searchPatterns = new ArrayList<>(); Moved `searchPatterns` to a class variable so it's "global" and doesn't need to be passed around (removed it from the various function headers accordingly) test/jdk/build/AbsPathsInImage.java line 187: > 185: } > 186: } > 187: } This is an implementation of the [Knuth–Morris–Pratt algorithm](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm). I waffled between choosing various string search algorithms, and ultimately settled on this one as it's well suited for this application: the algorithm doesn't skip or backtrack the text being searched and only relies looking at one char at a time. Essentially, the algorithm just creates state machines tailored specifically for string searching; upon each inputted char, an update is made to the searched string's DFA. A related algorithm I looked at that's technically better suited is [Aho–Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm)? But (to me) it looked significantly more complex (probably requires custom node objects and might have a higher space requirement due to needing to store references to other nodes per node). Of course, I'm not intimately familiar with these algorithms, so I very might be missing cert ain details. `createPrefixTables` function is the pre-processing done on the string patterns that creates the "state machines". Basically, it just creates a table of all the available prefixes that the string can jump back to at any current index. `getPrefixIndex` is used both in `createPrefixTables` (the preprocessing) and the actual string matching process itself (in `scanBytes`). It's the failure function or it finds the appropriate index/state/prefix to go back to when a mismatch is found (it can be used for the preprocessing since the preprocessing is simply finding an index/prefix to jump back to at each index). test/jdk/build/AbsPathsInImage.java line 281: > 279: break; > 280: } > 281: } Basically, we go through every search pattern per inputstream (file) byte and update the corresponding states accordingly. If we encounter a match, we add it to a list of matches. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26030#discussion_r2452852549 PR Review Comment: https://git.openjdk.org/jdk/pull/26030#discussion_r2452848960 PR Review Comment: https://git.openjdk.org/jdk/pull/26030#discussion_r2452860759
