Hey all,

I finally got around to playing around with a suffix array implementation 
and wanted to share. It's a pure julia port of sais 
<https://sites.google.com/site/yuta256/sais>, by Yuta Mori and while not 
the bleeding edge fastest suffix array sorting algorithm out there, I 
personally think it's the best bang for your buck (in terms of a relatively 
simple implementation and platform independence).

Compared to state of the art, the sais algorithm is consistently within 
1.5-2.5x of the fastest (see detailed benchmarks here 
<https://code.google.com/p/libdivsufsort/wiki/SACA_Benchmarks>), and 
handily beats all other algorithms in terms of memory usage. The native 
julia implementation is around 10x faster than the Java sais implementation 
and only 1.2-1.3x slower than the native C. I still need to spin up a 
direct comparison of this julia version vs. Go's standard library 
implementation, which is based on the qsufsort algorithm (though benchmarks 
show sais always beating it). 

There's still some more work on the interface to do and implementing some 
algorithms that build on the suffix array, so for now the only 
functionality is getting the sorted suffix array for a string/corpus. I'm 
also planning on playing around with some distributed/parallel features for 
the core algorithm to see what other shenanigans are possible.

https://github.com/quinnj/SuffixArrays.jl

-Jacob

Reply via email to