I would suggest opening all files at the beginning. Generate all k-mers,
open a file for each, and make a HashMap from k-mer to file handle. When
you're done, run through the HashMap to close all file handles. 

Also, this lets you keep your sequences, k-mers, and hash keys as byte
strings while iterating over reads, and convert to a String only once per
k-mer, when building file names during the initial file-opening pass.
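
A minimal sketch of that approach, under some assumptions of mine: I use
Data.Map.Strict from containers in place of a HashMap so that everything
ships with GHC (Data.HashMap.Strict from unordered-containers offers
fromList and lookup under the same names). The names allKmers, openBins,
closeBins and binSeq, the "bins" directory, and routing each sequence by
its first k bytes are all hypothetical, not taken from kmer-binner.

```haskell
import Control.Exception (bracket)
import Control.Monad (replicateM)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>))
import System.IO (Handle, IOMode (WriteMode), hClose, openFile)

-- All 4^k k-mers over the DNA alphabet, as strict ByteStrings.
allKmers :: Int -> [B.ByteString]
allKmers k = map B.pack (replicateM k "ACGT")

-- Open one output file per k-mer, up front. This is the only place
-- where a key is converted to a String (to build the file name).
openBins :: FilePath -> Int -> IO (M.Map B.ByteString Handle)
openBins dir k = do
  createDirectoryIfMissing True dir
  M.fromList <$> mapM open (allKmers k)
  where
    open kmer = do
      h <- openFile (dir </> B.unpack kmer) WriteMode
      return (kmer, h)

-- Run through the map and close every handle.
closeBins :: M.Map B.ByteString Handle -> IO ()
closeBins = mapM_ hClose

-- Hypothetical routing rule: write a sequence to the bin named by its
-- first k bytes. Sequences whose prefix is not a key (too short, or
-- containing a base outside ACGT such as N) are silently skipped.
binSeq :: Int -> M.Map B.ByteString Handle -> B.ByteString -> IO ()
binSeq k bins s =
  case M.lookup (B.take k s) bins of
    Just h  -> B.hPutStrLn h s
    Nothing -> return ()

main :: IO ()
main =
  -- bracket closes the handles even if binning throws an exception.
  bracket (openBins "bins" 3) closeBins $ \bins ->
    mapM_ (binSeq 3 bins . B.pack) ["ACGTACGTAC", "TTTTTGGGGG", "NNNACGTACG"]
```

The demo uses 3-mers (64 files) to stay well under typical per-process
open-file limits. With 5-mers you need 1024 handles at once, which on
some systems may mean raising that limit first (e.g. with ulimit -n).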

-- Nick
  n...@ingolia.org

On Sat, Jul 25, 2015, at 09:18 AM, Youens-Clark, Charles Kenneth -
(kyclark) wrote:
> I have an idea to unique and combine the k-mers of many (extremely large)
> FASTA files for a project.  I have a proof of concept working in Perl
> that works fairly quickly because of the ability to cache the open file
> handles (at most 1024 [4^5 as I'm using 5-mers as the bins]).  
> 
> I can get a Haskell version to work, but extremely slowly as I have no
> idea how to work around having to open and close a file for every single
> k-mer.  
> 
> Any suggestions?  All code and docs here:
> 
>       https://github.com/kyclark/kmer-binner
> 
> Ken
