I'm writing a package that computes various string distances. Some distances require comparing the sets of n-grams (also called q-grams: contiguous sequences of n characters) of the two strings. For now, my implementation of these distances is quite slow. I've put the function for the Jaccard distance in a gist here <https://goo.gl/S4dkb1>. It is about 10x slower than the R package stringdist <https://github.com/markvanderloo/stringdist> (written in C and based on binary trees rather than hash tables). Profiling shows that most of the time is spent creating the Set of q-grams. Can you think of a way to improve its performance?
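For readers who don't want to open the gist, here is a minimal sketch of the kind of computation I mean, written in Python purely for illustration. It is not the code from the gist; the names `qgrams` and `jaccard_distance` are hypothetical, and returning 0 when neither string has any q-gram is an assumption on my part.

```python
def qgrams(s, q):
    """Return the set of contiguous length-q substrings of s."""
    # range() is empty when len(s) < q, so short strings yield an empty set.
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_distance(s1, s2, q=2):
    """Jaccard distance between the q-gram sets of s1 and s2."""
    a, b = qgrams(s1, q), qgrams(s2, q)
    union = len(a | b)
    if union == 0:
        # Assumption: two strings with no q-grams are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union
```

Building the two sets is the step that dominates the running time in my profile; the intersection and union are cheap by comparison.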