Hi, I've been playing around with src/kwset.c, and I've found a way to reduce the memory footprint when matching by a factor of 3 and to improve locality of reference: collect and sort all of the reversed keywords before starting the trie build, which allows the trie layout to be planned in advance and reduces the number of pointers required.
This gives a speedup on some benchmarks with large numbers of keywords. For example, matching 32-byte random hex keywords against random hex input is just over twice as fast in my tests, which use keyword counts between 100k and 1M. The setup time is also reduced by about a factor of 2.

I've yet to find a benchmark in which the modified code is measurably slower than the original at match time. I have found one case where it takes about twice as long at prep time: 8M short keywords, each adding a single trie node, arranged in an order that gives good locality of reference to the original trie-building code but contains no pre-sorted runs of reversed keywords.

Proof-of-concept patches follow. All feedback is welcome, particularly test/benchmark results on hardware other than i386. If some variant of this idea is judged suitable to go into GNU grep, then I expect the first step would be to add more tests for kwset.

Nick
