I want to deduplicate and combine the k-mers of many (extremely large) FASTA
files for a project. I have a proof of concept in Perl that runs fairly
quickly because it can cache the open file handles (at most 1024, i.e. 4^5,
since I'm using 5-mers as the bins).
I can get a Haskell version to work, but it is extremely slow because I don't
know how to avoid opening and closing a file for every single k-mer.
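
To make the goal concrete, here is a minimal sketch of the handle caching I
have in mind, threading a Map from bin name to open Handle through a fold. It
is an illustration only, not the code in the repository. The names
(HandleCache, getHandle, writeKmer, binKmers, kmers), the "bins" output
directory, the k of 8 in main, and the choice to write the whole k-mer into
the file named after its 5-base prefix are all assumptions.

```haskell
import Control.Monad (foldM)
import Data.List (tails)
import qualified Data.Map.Strict as Map
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>))
import System.IO

-- Cache of open handles, keyed by bin name (the first 5 bases of a k-mer).
-- With 5-base bins over A/C/G/T this holds at most 4^5 = 1024 entries.
type HandleCache = Map.Map String Handle

-- Length of the prefix used as the bin name (assumed from the 5-mer bins).
binSize :: Int
binSize = 5

-- Look up the handle for a bin; open the file only the first time it is seen.
getHandle :: FilePath -> HandleCache -> String -> IO (HandleCache, Handle)
getHandle outDir cache bin =
  case Map.lookup bin cache of
    Just h  -> return (cache, h)
    Nothing -> do
      h <- openFile (outDir </> bin) AppendMode
      return (Map.insert bin h cache, h)

-- Write one k-mer to its bin and return the (possibly extended) cache.
writeKmer :: FilePath -> HandleCache -> String -> IO HandleCache
writeKmer outDir cache kmer = do
  (cache', h) <- getHandle outDir cache (take binSize kmer)
  hPutStrLn h kmer
  return cache'

-- Bin every k-mer, then close each cached handle exactly once at the end.
binKmers :: FilePath -> [String] -> IO ()
binKmers outDir ks = do
  cache <- foldM (writeKmer outDir) Map.empty ks
  mapM_ hClose (Map.elems cache)

-- All substrings of length k in a sequence (empty if the sequence is shorter).
kmers :: Int -> String -> [String]
kmers k s = [take k t | t <- take (length s - k + 1) (tails s)]

-- Toy driver: bins the 8-mers of a short made-up sequence into ./bins.
main :: IO ()
main = do
  createDirectoryIfMissing True "bins"
  binKmers "bins" (kmers 8 "ACGTACGTACGTTTGA")
```

Each bin file is opened at most once per run, so the cost per k-mer is a Map
lookup plus a buffered write. Whether this gets close to the Perl version on
real data is something I have not measured.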
Any suggestions? All code and docs here:
https://github.com/kyclark/kmer-binner
Ken