On 30/03/2019 23.59, Sultan Alsawaf wrote: > How can the memcmps cross a page boundary when memcmp itself will > only read in large buffers of data at word boundaries?
Consider your patch replacing !strcmp(buf, "123") by !memcmp(buf, "123", 4). buf is known to point to a nul-terminated string. But it may point at, say, the second-last byte in a page, with the last byte in that page being a nul byte, and the following page being MMIO or unmapped or all kinds of bad things. On e.g. x86 where unaligned accesses are cheap, and seeing that you're only comparing for equality, gcc is likely to compile the memcmp version into *(u32*)buf == 0x00333231 because you've told the compiler that there's no problem accessing four bytes starting at buf. Boom. Even without unaligned access being cheap this can happen; suppose the length is 8 instead, and gcc somehow knows that buf is four-byte aligned (and in this case it happens to point four bytes before a page boundary), so it could compile the memcmp(,,8) into *(u32*)(buf+4) == secondword && *(u32*)buf == firstword (or do the comparisons in the "natural" order, but it might still do both loads first). > And if there are concerns for some arches but not others, then couldn't this > be > a feasible optimization for those which would work well with it? No. First, these are concerns for all arches. Second, if you can find some particular place where string parsing/matching is in any way performance relevant and not just done once during driver init or whatnot, maybe the maintainers of that file would take a patch hand-optimizing some strcmps to memcmps, or, depending on what the code does, perhaps replacing the whole *cmp logic with a custom hash table. But a patch implicitly and silently touching thousands of lines of code, without an analysis of why none of the above is a problem for any of those lines, for any .config, arch, compiler version? No. Rasmus