In general, strcmp() is not implemented via strcmp.c (although if you do a source code search for strcmp, that's what you'll get). Most of the time it's implemented in assembly (strcmp.s) or simply leverages memcmp(), so you aren't doing a byte-by-byte comparison but a native memory word (32- or 64-bit) comparison. This makes those implementations super fast.
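To make the word-at-a-time idea concrete, here is a minimal sketch in portable C. It is not how any particular libc does it (those are typically hand-tuned assembly); the name words_equal() is made up for illustration, and it only answers "equal or not" for two buffers of known length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of a and b match,
 * 0 otherwise. Illustration only, not taken from any libc. */
static int words_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Compare one native word per iteration. Going through memcpy()
     * keeps the loads safe on platforms that fault on unaligned access;
     * compilers usually turn it into a plain word load. */
    while (n >= sizeof(uintptr_t)) {
        uintptr_t wa, wb;
        memcpy(&wa, pa, sizeof(wa));
        memcpy(&wb, pb, sizeof(wb));
        if (wa != wb)
            return 0;
        pa += sizeof(wa);
        pb += sizeof(wb);
        n -= sizeof(wa);
    }

    /* Finish whatever is left byte by byte. */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

Note that this needs the length up front. A real strcmp() also has to spot the terminating NUL inside each word, which is part of why the fast versions end up in assembly.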
Once we need to worry about case insensitivity, we see a whole gamut of implementations: some use a mapped array, as I did; some go char by char and call tolower() on each one; some do other things, such as testing isupper() first and calling tolower() only if needed. The word-based optimizations seem less viable, as seen in the test results that I ran and that Yann also verified (afaict). In my tests, my impl was faster on OSX and on CentOS 5 and 6. It's a very common function for us, and given my test results it seemed to make sense to provide our own impl, esp if we decided that what we were really concerned about was comparing for equality, which would let us avoid the !strcasecmp() logic leap. If we decide that all of this is moot, that's fine. That's what these types of investigations and discussions are for.
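For anyone who hasn't seen the mapped-array approach, a rough sketch follows. This is not the actual impl under discussion: the names ci_map, ci_map_init(), ci_strcasecmp() and ci_streq() are hypothetical, the table is built at startup rather than written out as a static initializer, and it assumes an ASCII-compatible character set.

```c
#include <stddef.h>

/* Lookup table mapping every byte to its lowercase form (ASCII only).
 * Hypothetical name; must be filled by ci_map_init() before use. */
static unsigned char ci_map[256];

static void ci_map_init(void)
{
    int i;

    /* Assumes 'A'..'Z' are contiguous, which holds for ASCII. */
    for (i = 0; i < 256; i++)
        ci_map[i] = (unsigned char)((i >= 'A' && i <= 'Z') ? i + ('a' - 'A') : i);
}

/* Same contract as strcasecmp(): <0, 0 or >0. One table lookup per
 * byte, no tolower() call and no locale handling. */
static int ci_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (ci_map[*p1] == ci_map[*p2]) {
        if (*p1 == '\0')
            return 0;
        p1++;
        p2++;
    }
    return ci_map[*p1] - ci_map[*p2];
}

/* Equality-only wrapper, so callers can write ci_streq(a, b)
 * instead of !ci_strcasecmp(a, b). */
static int ci_streq(const char *s1, const char *s2)
{
    return ci_strcasecmp(s1, s2) == 0;
}
```

Call ci_map_init() once at startup. After that, ci_streq("Keep-Alive", "keep-alive") reads as what it means, which is the point of the equality-only variant.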
