Hello Mic,

That is an interesting benchmark, and you could probably squeeze a bit
more performance out of fqextract.java by tweaking the data structures
(e.g. provide expected size to the HashMap constructor, use
ImmutableMap from Guava, etc.).
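
As a rough sketch of the first suggestion: if you know roughly how many
IDs you will look up, you can size the HashMap up front so it never has
to rehash while it fills. I have not checked how fqextract.java builds
its map, so the class name, helper names, and the one-million figure
below are only placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapSketch {

    // HashMap resizes once its size exceeds capacity * loadFactor (0.75 by
    // default), so request a capacity large enough that expectedSize entries
    // stay below that threshold.
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    // Hypothetical helper: an empty map that can take expectedSize entries
    // without an intermediate resize.
    static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }

    public static void main(String[] args) {
        int expectedIds = 1_000_000; // assumption: approximate number of read IDs
        Map<String, Boolean> wanted = newPresizedMap(expectedIds);
        wanted.put("read_1", Boolean.TRUE);
        System.out.println(wanted.containsKey("read_1"));
    }
}
```

Whether this helps measurably depends on how much of the run time is
spent building the map rather than reading the file.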

Using bioperl, biopython, bioruby, or biojava for this task will be
much slower than simply copying lines from a file, since each of them
validates the FASTQ format against the specification.
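
For comparison, here is a minimal sketch of the non-validating approach.
It assumes every record is exactly four lines (no wrapped sequences) and
that the read ID is the header text between '@' and the first whitespace.
It is not the code from the benchmark, and the class and method names
are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Collections;
import java.util.Set;

public class FastqLineFilterSketch {

    // Copies the records whose ID is in wantedIds from in to out and returns
    // the number of records written. No validation is performed.
    static int extract(BufferedReader in, Set<String> wantedIds, Writer out) {
        int written = 0;
        try {
            String header;
            while ((header = in.readLine()) != null) {
                String seq = in.readLine();
                String plus = in.readLine();
                String qual = in.readLine();
                if (qual == null) {
                    break; // truncated final record: stop quietly
                }
                // ID = header without the leading '@', up to the first whitespace
                String id = header.isEmpty() ? "" : header.substring(1).split("\\s", 2)[0];
                if (wantedIds.contains(id)) {
                    out.write(header + "\n" + seq + "\n" + plus + "\n" + qual + "\n");
                    written++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // Tiny in-memory input so the sketch runs without a file.
        String fastq = "@r1 first\nACGT\n+\nIIII\n@r2\nGGCC\n+\nJJJJ\n";
        StringWriter out = new StringWriter();
        int n = extract(new BufferedReader(new StringReader(fastq)),
                        Collections.singleton("r2"), out);
        System.out.println(n + " record(s) written");
        System.out.print(out);
    }
}
```

The price of the speed is that a malformed file passes through
unnoticed, which is exactly what the Bio* parsers guard against.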

   michael


On Tue, Jan 24, 2012 at 6:08 AM, Scooter Willis <[email protected]> wrote:
> You can try a FASTA version of the file to measure performance gain.
>
> File file = new File("filename");
> boolean lazySequenceLoad = true;
>
> LinkedHashMap<String, DNASequence> sequences =
>     FastaReaderHelper.readFastaDNASequence(file, lazySequenceLoad);
>
> This will go through the file and index the accession IDs without loading
> any sequence data, which avoids the memory allocation and is therefore
> faster. You can then reference a DNASequence by name, and when you need
> its sequence data the file index is used to load the data for that
> specific sequence from the file. The same approach can be applied to
> FASTQ files.
>
> Scooter
>
> On 1/24/12 3:37 AM, "Mic" <[email protected]> wrote:
>
>>Hello,
>>I have found the following benchmark (
>>http://biostar.stackexchange.com/questions/10376/how-to-efficiently-parse-a-huge-fastq-file/11279#11279
>>)
>>and I just wonder whether it is possible to make the Java example even faster?
>>
>>Thank you in advance.

_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l