I have an idea for uniquing and combining the k-mers of many (extremely large) FASTA files for a project. I have a proof of concept in Perl that runs fairly quickly because it can cache the open file handles (at most 1,024, i.e. 4^5, since I'm using 5-mers as the bins).
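The handle-caching idea described above (keep one open handle per 5-mer bin instead of reopening a file for each k-mer) might look roughly like this in Haskell. This is a minimal sketch, not the code in the linked repo. The names `binKmers`, `kmersOf` and `binLen`, the `<outDir>/<prefix>.txt` file naming, and the use of plain `String`s rather than parsed FASTA records are all illustrative assumptions. The point is that the `Map` of handles is threaded through a `foldM`, so each bin file is opened at most once per run.

```haskell
import Control.Monad (foldM)
import Data.List (tails)
import qualified Data.Map.Strict as Map
import System.IO (Handle, IOMode (AppendMode), hClose, hPutStrLn, openFile)

-- Hypothetical: number of leading bases used to pick a bin.
-- 5 gives at most 4^5 = 1024 bins.
binLen :: Int
binLen = 5

-- Hypothetical helper: all overlapping substrings of length k
-- (the k-mers) of one sequence.
kmersOf :: Int -> String -> [String]
kmersOf k s = [t | t <- map (take k) (tails s), length t == k]

-- Hypothetical: append each k-mer to the file named after its
-- 5-base prefix. Open handles are cached in a Map keyed by that
-- prefix and reused, so each bin file is opened once, not once per k-mer.
binKmers :: FilePath -> [String] -> IO ()
binKmers outDir kmers = do
  handles <- foldM step Map.empty kmers
  -- Close every cached handle once all k-mers have been written.
  mapM_ hClose (Map.elems handles)
  where
    step hs kmer = do
      let bin = take binLen kmer
      (h, hs') <- case Map.lookup bin hs of
        -- Bin already open: reuse its handle.
        Just h -> return (h, hs)
        -- First k-mer for this bin: open the file and cache the handle.
        Nothing -> do
          h <- openFile (outDir ++ "/" ++ bin ++ ".txt") AppendMode
          return (h, Map.insert bin h hs)
      hPutStrLn h kmer
      return hs'
```

For example, `binKmers "bins" (concatMap (kmersOf 20) sequences)` would write every 20-mer into one of at most 1,024 files under `bins/`. Two caveats with this sketch: handles are not closed if an exception is thrown partway through (wrapping the fold in `bracket` with an `IORef` would fix that), and 1,024 bins plus stdin, stdout and stderr can exceed a default open-file limit (often 1,024 on Linux), so `ulimit -n` may need raising. For very large inputs, `Data.ByteString.Char8` instead of `String` would likely matter as much as the handle caching.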
I can get a Haskell version to work, but it is extremely slow because I have no idea how to avoid opening and closing a file for every single k-mer. Any suggestions?

All code and docs are here: https://github.com/kyclark/kmer-binner

Ken