Hello! Thanks for great command line tool.
I have a large regexp, which works very slow. I take a profiler, and it shows, that bottleneck is 'insert function', which is called too often from this code during file matching, and makes O(N^2)
assigmend, O(N) memory reallocatons:
-------------------
for (j = 0; j < d->states[0].elems.nelem; ++j) {
insert(d->states[0].elems.elems[j], &follows); }
-------------------
I have a patch, which makes dfa-states merge in dfa.c faster. Also I rewrite
this code, using priority-queue data-structure:
-------------------
/* Find the union of the follows of the positions of the group. This is a
hideously inefficient loop. Fix it someday. */
for (j = 0; j < grps[i].nelem; ++j)
for (k = 0; k < d->follows[grps[i].elems[j]].nelem; ++k)
insert (d->follows[grps[i].elems[j]].elems[k], &follows);
-------------------
Will it be usefull for you?
How I can properly send/commit it for review?
Currently I have copied the patch to http://pastebin.com/p6TbvM8Y and
https://gist.github.com/dobrokot/6331243
it need little work for proper styling and optimization shortcut for common
cases when merging 2 sets or only one set(i.e. copy).
Thanks for attention, Ivan Yanikov.