Package: wnpp
Severity: wishlist
Owner: Mikhail Potemkin <[email protected]>

* Package name    : golang-github-bobusumisu-aho-corasick
  Version         : 1.0.3-1
  Upstream Author : Øyvind Ingvaldsen
* URL             : https://github.com/BobuSumisu/aho-corasick
* License         : MIT
  Programming Lang: Go
  Description     : Aho-Corasick string-searching algorithm in Go

 Aho-Corasick
 .
 Build Status (https://travis-ci.com/BobuSumisu/aho-corasick) [Image: Go
 Version] (https://img.shields.io/github/go-mod/go-version/BobuSumisu/aho-
 corasick) [Image: Latest Tag]
 (https://img.shields.io/github/v/tag/BobuSumisu/aho-corasick)
 .
 Implementation of the Aho-Corasick string-search algorithm in Go.
 .
 Licensed under MIT License.
 .
 Details
 .
 This implementation does not use a Double-Array Trie
 (https://linux.thai.net/~thep/datrie/datrie.html) as in my
 implementation (https://github.com/BobuSumisu/go-ahocorasick) from a
 couple of years back.
 .
 This reduces the build time drastically, but at the cost of higher
 memory consumption.
 .
 The search time is still fast, and comparable to other Go
 implementations I have found on github that claims to be fast (see
 performance).
 .
 Documentation
 .
 Can be found at godoc.org (https://godoc.org/github.com/BobuSumisu/aho-
 corasick).
 .
 Example Usage
 .
 Use a TrieBuilder to build a Trie:
 .
   trie := NewTrieBuilder().
       AddStrings([]string{"or", "amet"}).
       Build()
 .
 Then go and match something interesting:
 .
   matches := trie.MatchString("Lorem ipsum dolor sit amet, consectetur
 adipiscing elit.")
   fmt.Printf("Got %d matches.\n", len(matches))
 .
   // => Got 3 matches.
 .
 What did we match?
 .
   for _, match := range matches {
       fmt.Printf("Matched pattern %d %q at position %d.\n",
 match.Match(),
           match.Pattern(), match.Pos())
   }
 .
   // => Matched pattern 0 "or" at position 1.
   // => Matched pattern 0 "or" at position 15.
   // => Matched patterh 1 "amet" at position 22.
 .
 Building
 .
 You can easily load patterns from file:
 .
   builder := NewTrieBuilder()
   builder.LoadPatterns("patterns.txt")
   builder.LoadStrings("strings.txt")
 .
 Both functions expects a text file with one pattern per line.
 LoadPatterns expects the pattern to be in hexadecimal form.
 .
 Storing
 .
 Use Encode to store a Trie in gzip compressed binary format:
 .
   f, err := os.Create("trie.gz")
   err := Encode(f, trie)
 .
 And Decode to load it from binary format:
 .
   f, err := os.Open("trie.gz")
   trie, err := Decode(f)
 .
 Performance
 .
 Some simple benchmarking on my machine (Intel(R) Core(TM) i7-8665U CPU @
 1.90GHz, 32 GiB RAM).
 .
 Build and search time grows quite linearly with regards to number of
 patterns and input text length.
 .
 Building
 .
   BenchmarkTrieBuild/100-4                    7886            154786
 ns/op
   BenchmarkTrieBuild/1000-4                    739           1647419
 ns/op
   BenchmarkTrieBuild/10000-4                    91          13331713
 ns/op
   BenchmarkTrieBuild/100000-4                    9         123886615
 ns/op
 .
 Searching
 .
   BenchmarkMatchIbsen/100-4                1471089               819
 ns/op
   BenchmarkMatchIbsen/1000-4                202288              5667
 ns/op
   BenchmarkMatchIbsen/10000-4                19957             59680
 ns/op
   BenchmarkMatchIbsen/100000-4                2012            595086
 ns/op
 .
 Compared to Other Implementation
 .
 See aho-corasick-benchmark (https://github.com/Bobusumisu/aho-corasick-
 benchmark).
 .
 Memory Usage
 .
 As mentioned, the memory consumption will be quite high compared to a
 double-array trie implementation. Especially during the build phase
 (which currently contains a lot of object allocations).


Needed by trufflehog

Reply via email to