On Friday, 7 October 2022 at 07:16:19 UTC, Siarhei Siamashka
wrote:
On Friday, 7 October 2022 at 06:34:50 UTC, Siarhei Siamashka
wrote:
Also, are we allowed to artificially construct a needle and
haystack to blow up this test, rather than only benchmarking it
on typical real data?
Such as generating the input data via running:
python -c "print(('a' * 49 + 'b') * 20000)" > test.lst
And then using
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (the
character 'a' replicated 50 times) as the needle to search for.
Much longer needles work even better. On Linux, a single command
line argument is limited to 128K, so there's huge room for
improvement.
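To make the blow-up concrete, here is a rough Python sketch that builds the same kind of input and counts character comparisons for a plain left-to-right scan. The names `make_input` and `naive_find` are made up for illustration, and the naive scan is only a stand-in, not the implementation being benchmarked in this thread.

```python
def make_input(run, blocks):
    """Build the adversarial pair: each block is (run - 1) 'a's plus one 'b'."""
    haystack = ("a" * (run - 1) + "b") * blocks
    needle = "a" * run  # one 'a' longer than any run in the haystack
    return haystack, needle


def naive_find(haystack, needle):
    """Left-to-right scan; returns (index or -1, number of comparisons)."""
    n, m = len(haystack), len(needle)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if haystack[i + j] != needle[j]:
                break  # mismatch, move on to the next start position
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


if __name__ == "__main__":
    # Smaller than the 20000 blocks above so it finishes quickly in Python.
    haystack, needle = make_input(50, 200)
    index, comparisons = naive_find(haystack, needle)
    print(index, comparisons, len(haystack))
```

Almost every start position matches a long run of 'a' before hitting the 'b', so the comparison count grows roughly with haystack length times needle length, which is why longer needles make it worse.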
https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/
"the longer the pattern is, the faster the algorithm goes"