On Friday, 27 May 2016 at 19:17:44 UTC, Chris wrote:
Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong.

I had the same "bug" when I wrote my search function on the project at work. I also found out that using the simplest loop is the generally the best way to go. The processors nowaday have loop-caches, fast level1 caches etc, all the fancy search algorithms like Boyer-Moore etc. can only overcome their memory and initialisation overhead for really huge haystacks.

Here's the C code of my memmem() function that replaced the Boyer-Moore I had before. This implementation outran the BM search for sizes up to several hundreds megaytes on SPARC64 VII+ machine. On x86-64 I'm sure it would even be worse.

void * memmem(const void *haystack, size_t haystack_len, const void *needle, size_t needle_len)
{
  if(!haystack_len || !needle_len || needle_len > haystack_len)
    return NULL;

  uint8_t u8 = *((uint8_t *)needle);
  const uint8_t *p = (const uint8_t *) haystack;
  for(size_t i = haystack_len - needle_len + 1; i; --i) {
    if(*p == u8 && !memcmp(p, needle, needle_len))
      return (void *)p;
    else
      p++;
  }
  return NULL;
}

(sorry for posting C code, as I'm only beginning to learn D).

Reply via email to