I have an idea for deduplicating and combining the k-mers of many (extremely
large) FASTA files for a project.  I have a proof of concept working in Perl
that runs fairly quickly because it can cache the open file handles (at most
1024, i.e. 4^5, as I'm using 5-mers as the bins).
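
To make the 1024 bins concrete, here is a minimal sketch, assuming the bin for
a k-mer is its first five bases read as a base-4 number (A=0, C=1, G=2, T=3).
`binIndex` and `baseDigit` are hypothetical names, not taken from the repo, and
k-mers with anything other than ACGT in those first five positions get no bin.

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: map one nucleotide to a base-4 digit.
baseDigit :: Char -> Maybe Int
baseDigit c = case toUpper c of
  'A' -> Just 0
  'C' -> Just 1
  'G' -> Just 2
  'T' -> Just 3
  _   -> Nothing

-- Assumption: a k-mer's bin is its first 5 bases read as a base-4 number,
-- giving 4^5 = 1024 possible bins (0..1023).
binIndex :: String -> Maybe Int
binIndex kmer
  | length prefix < 5 = Nothing
  | otherwise         = foldl step 0 <$> traverse baseDigit prefix
  where
    prefix     = take 5 kmer
    step acc d = acc * 4 + d

main :: IO ()
main = do
  print (binIndex "ACGTACGT")  -- Just 108
  print (binIndex "TTTTT")     -- Just 1023
  print (binIndex "ACN")       -- Nothing (shorter than 5 bases)
```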

I can get a Haskell version to work, but it runs extremely slowly because I
have no idea how to avoid opening and closing a file for every single k-mer.
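
One way to get the Perl behaviour in Haskell, sketched here under assumptions:
keep a Data.Map from bin index to an open Handle, open a bin's file only the
first time that bin is seen, and close everything once at the end. The file
naming (`binPath`) and the list of (bin, k-mer) pairs are hypothetical
stand-ins for whatever the real program produces.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map
import System.IO

-- Hypothetical naming scheme for the bin files.
binPath :: Int -> FilePath
binPath i = "bin_" ++ show i ++ ".txt"

-- Write one k-mer to its bin, opening the bin's file only on first use
-- and reusing the cached Handle for every later k-mer in that bin.
writeKmer :: Map.Map Int Handle -> (Int, String) -> IO (Map.Map Int Handle)
writeKmer cache (bin, kmer) = do
  (h, cache') <- case Map.lookup bin cache of
    Just h  -> pure (h, cache)
    Nothing -> do
      h <- openFile (binPath bin) AppendMode
      pure (h, Map.insert bin h cache)
  hPutStrLn h kmer
  pure cache'

-- Route every (bin, k-mer) pair through the cache, then close all handles once.
binAll :: [(Int, String)] -> IO ()
binAll pairs = do
  cache <- foldM writeKmer Map.empty pairs
  mapM_ hClose (Map.elems cache)

main :: IO ()
main = binAll [(108, "ACGTACGT"), (1023, "TTTTTGCA"), (108, "ACGTAAAA")]
```

Each bin file is opened at most once per run instead of once per k-mer. With
all 1024 bins open at once, plus stdin/stdout/stderr, this can exceed a
default open-file limit of 1024 on many Linux systems, so `ulimit -n` may need
raising. Since the keys are Ints, Data.IntMap would also work as the cache.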

Any suggestions?  All code and docs here:

        https://github.com/kyclark/kmer-binner

Ken
