I would suggest opening all files at the beginning. Generate all k-mers, open a file for each, and make a HashMap from k-mer to file handle. When you're done, run through the HashMap to close all file handles.
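
A minimal sketch of this approach, not taken from the kmer-binner repository. It assumes the bytestring, unordered-containers and filepath packages, a DNA alphabet of ACGT, one output file per k-mer named after that k-mer, and an output directory that already exists. The names `Bins`, `allKmers`, `openBins`, `closeBins`, `withBins` and `writeToBin` are hypothetical.

```haskell
import           Control.Exception     (bracket)
import           Control.Monad         (forM, replicateM)
import qualified Data.ByteString.Char8 as BC
import qualified Data.HashMap.Strict   as HM
import           System.FilePath       ((</>))
import           System.IO             (Handle, IOMode (AppendMode), hClose, openFile)

-- Hypothetical name: map from a bin's k-mer (kept as a ByteString) to its open handle.
type Bins = HM.HashMap BC.ByteString Handle

-- All 4^k k-mers over the assumed alphabet ACGT.
allKmers :: Int -> [BC.ByteString]
allKmers k = map BC.pack (replicateM k "ACGT")

-- Open one file per k-mer up front. The only ByteString -> String
-- conversion happens here, to build the file name.
openBins :: FilePath -> Int -> IO Bins
openBins dir k = do
  pairs <- forM (allKmers k) $ \kmer -> do
    h <- openFile (dir </> BC.unpack kmer) AppendMode
    return (kmer, h)
  return (HM.fromList pairs)

-- Run through the map and close every handle.
closeBins :: Bins -> IO ()
closeBins bins = mapM_ hClose (HM.elems bins)

-- Open all bins, run the action, and close the bins even if the action throws.
-- (Handles opened before a failure inside openBins itself are not cleaned up here.)
withBins :: FilePath -> Int -> (Bins -> IO a) -> IO a
withBins dir k = bracket (openBins dir k) closeBins

-- Append a sequence to the bin named by its first k characters.
-- Sequences whose prefix is not a known bin (too short, or containing
-- a character such as N) are silently skipped in this sketch.
writeToBin :: Bins -> Int -> BC.ByteString -> IO ()
writeToBin bins k s =
  case HM.lookup (BC.take k s) bins of
    Just h  -> BC.hPutStrLn h s
    Nothing -> return ()
```

With 5-mer bins this would be called along the lines of `withBins "bins" 5 $ \bins -> mapM_ (writeToBin bins 5) kmers`, where `kmers` is whatever list of ByteString k-mers the reader produces. Holding 4^5 = 1024 files open at once may run into the per-process open-file limit on some systems, so that limit may need raising.
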
Also, this lets you keep your sequences/k-mers/hash keys as byte strings for iterating reads, and convert only during the initial file opening pass.

--
Nick
n...@ingolia.org

On Sat, Jul 25, 2015, at 09:18 AM, Youens-Clark, Charles Kenneth - (kyclark) wrote:
> I have an idea to unique and combine the k-mers of many (extremely large)
> FASTA files for a project. I have a proof of concept working in Perl
> that works fairly quickly because of the ability to cache the open file
> handles (at most 1024 [4^5 as I'm using 5-mers as the bins]).
>
> I can get a Haskell version to work, but extremely slowly as I have no
> idea how to work around having to open and close a file for every single
> k-mer.
>
> Any suggestions? All code and docs here:
>
> https://github.com/kyclark/kmer-binner
>
> Ken