Re: Cello: a library of string algorithms using succinct data structures

andrea Tue, 11 Apr 2017 10:30:10 +0200

@Krux02 As bpr remarked, the library is really about strings, especially
large ones. One key application is searching, both exact and approximate, but I
have also started adding similarity measures for strings. Data structures
such as suffix arrays also have many other applications (for example, the
Burrows-Wheeler transform, which can be derived from a suffix array, is used
in string compression).
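
To make the suffix array remark concrete, here is a minimal sketch in plain
Python. It is not Cello's API (Cello is a Nim library); the names
suffix_array, bwt and find_all are made up for this illustration, and the
construction is the naive sort-based one rather than the succinct structures
the library is about.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform read off the suffix array of text + sentinel."""
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel
    # For each sorted suffix, emit the character just before it
    # (index -1 wraps around to the sentinel for the suffix at position 0).
    return "".join(s[i - 1] for i in suffix_array(s))


def find_all(text, sa, pattern):
    """Exact search: binary search for the block of suffixes starting with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
print(bwt(text))                                  # annb$aa
print(find_all(text, suffix_array(text), "ana"))  # [1, 3]
```

The transform tends to group equal characters into runs, which is why it is
useful as a preprocessing step for compression, and the same sorted order of
suffixes is what makes the binary search work.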


Not everything is there yet, but I hope to cover a few more topics and to make 
it work on data on disk for the cases where the datasets are too big to fit in 
the available memory.

It just happens that bioinformatics is a sector where large strings are 
particularly prevalent, but it is not the only one. The library could just as 
well be used to, well, do actual searching - that is, finding stuff in large 
text corpora - or to find recurrent patterns in time series. For example: the 
latest 100 data points follow a pattern of UP UP DOWN ... has a similar 
pattern ever appeared before?
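
The time series case reduces to string search once the series is encoded as
symbols. A rough sketch in plain Python follows (again not Cello's API; the
function names and the U/D/S alphabet are assumptions made for this example,
and str.find stands in for an indexed search):

```python
def encode_moves(values):
    """Encode consecutive differences as 'U' (up), 'D' (down) or 'S' (same)."""
    out = []
    for prev, cur in zip(values, values[1:]):
        out.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(out)


def earlier_occurrences(values, k):
    """Positions where the pattern formed by the last k moves appeared before."""
    moves = encode_moves(values)
    if k <= 0 or k > len(moves):
        return []
    pattern = moves[-k:]
    last = len(moves) - k  # start of the most recent occurrence itself
    hits = []
    pos = moves.find(pattern)
    while pos != -1 and pos < last:
        hits.append(pos)
        pos = moves.find(pattern, pos + 1)
    return hits


series = [1, 2, 3, 2, 5, 1, 2, 3, 2]
print(encode_moves(series))            # UUDUDUUD
print(earlier_occurrences(series, 3))  # [0]
```

On a long series one would replace the linear scan with an index such as a
suffix array, so that each query does not have to read the whole history.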

About the issue you linked, I am not sure what to comment on. I started writing 
spills because I found it convenient to have some kind of disk-based sequence, 
and it was pretty easy to do by memory mapping files. I chose the name 
because when memory is scarce you can just spill the sequence to disk. The 
topic discussed there does not seem to have anything to do with memory-mapped 
data structures, but maybe I am misunderstanding your remark.
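
For readers unfamiliar with the idea, a disk-based sequence backed by a
memory-mapped file can be sketched like this in plain Python. This only
illustrates the general technique and is not how spills is implemented; the
names spill_to_disk and MappedSeq and the fixed 64-bit little-endian item
layout are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # assumed layout: little-endian signed 64-bit integers


def spill_to_disk(values, path):
    """Write the values to path as fixed-size binary records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


class MappedSeq:
    """Read-only sequence view over a file of fixed-size records."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // ITEM.size

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Only the pages that are actually touched get loaded by the OS.
        return ITEM.unpack_from(self._map, i * ITEM.size)[0]

    def close(self):
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "seq.bin")
spill_to_disk([10, -3, 7], path)
seq = MappedSeq(path)
print(len(seq), seq[1])  # 3 -3
seq.close()
```

The point of the design is that indexing looks the same as for an in-memory
sequence, while the operating system decides which parts of the file are
resident in memory at any time.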
