I recall seeing C/C++/D code that speeds up the comment- and whitespace-skipping parts of lexers by operating on 2-, 4- or 8-byte chunks instead of single bytes. This applies when the token terminators are expressed as a set of alternative ASCII characters.

For instance, when searching for the end of a line comment, I would like to speed up the while-loop in

    size_t offset;
    string input = "// \n"; // a line-comment string
    import std.algorithm : among;
    // until end-of-line or file terminator
    while (!input[offset].among!('\0', '\n', '\r'))
    {
        ++offset;
    }

by advancing `offset` in steps larger than one.
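
Something along these lines is roughly what I have in mind: a SWAR-style sketch that scans 8 bytes per iteration and falls back to the byte loop near a hit. Untested; the function name `skipLineComment` and the "full 8-byte load stays in bounds" guard are mine:

    size_t skipLineComment(const(char)[] input, size_t offset)
    {
        import std.algorithm : among;

        enum ulong ones  = 0x0101010101010101UL;
        enum ulong highs = 0x8080808080808080UL;

        // true iff some byte of `v` equals `c` (classic SWAR zero-byte test)
        static bool hasByte(ulong v, char c)
        {
            immutable ulong x = v ^ (ones * c); // matching bytes become zero
            return ((x - ones) & ~x & highs) != 0;
        }

        // chunk loop: only while a full 8-byte load stays inside the buffer
        while (offset + 8 <= input.length)
        {
            // unaligned load; fine on x86
            immutable ulong chunk = *cast(const(ulong)*)(input.ptr + offset);
            if (hasByte(chunk, '\0') || hasByte(chunk, '\n') || hasByte(chunk, '\r'))
                break; // a terminator is somewhere in this chunk
            offset += 8;
        }

        // byte-wise tail, identical to the original sentinel-based loop
        while (!input[offset].among!('\0', '\n', '\r'))
            ++offset;
        return offset;
    }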

Note that my file-reading function, which creates the real `input`, appends a '\0' at the end to enable the sentinel-based search shown in the call to `among` above.
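
That is, the reader is essentially just this (helper name made up):

    import std.file : readText;

    // append the sentinel so the byte-wise scan needs no explicit bounds check
    string readSourceWithSentinel(string path)
    {
        return readText(path) ~ '\0';
    }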

I also recall that there are x86_64 intrinsics that can be used here for further speedups.
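
I imagine an SSE2 version would look something like the sketch below. This assumes the third-party `intel-intrinsics` dub package (module `inteli.emmintrin`), which mirrors the C intrinsic names; the function name is again made up and the code is untested:

    import inteli.emmintrin; // dub package "intel-intrinsics" (assumption)
    import core.bitop : bsf;

    size_t skipLineCommentSSE2(const(char)[] input, size_t offset)
    {
        __m128i nl  = _mm_set1_epi8('\n');
        __m128i cr  = _mm_set1_epi8('\r');
        __m128i nul = _mm_set1_epi8('\0');

        // 16 bytes per iteration while a full load stays inside the buffer
        while (offset + 16 <= input.length)
        {
            __m128i chunk = _mm_loadu_si128(cast(const(__m128i)*)(input.ptr + offset));
            __m128i hits  = _mm_or_si128(_mm_cmpeq_epi8(chunk, nl),
                            _mm_or_si128(_mm_cmpeq_epi8(chunk, cr),
                                         _mm_cmpeq_epi8(chunk, nul)));
            immutable int mask = _mm_movemask_epi8(hits);
            if (mask != 0)
                return offset + bsf(mask); // first terminator in this chunk
            offset += 16;
        }

        // sentinel-based byte-wise tail
        import std.algorithm : among;
        while (!input[offset].among!('\0', '\n', '\r'))
            ++offset;
        return offset;
    }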

Refs, anyone?
