"Paul Leo" <[EMAIL PROTECTED]> writes:

[snip]

> Also if anyone with real world experience can comment on the typical
> size of the alignment file (for paired end reads on a good day), that
> is the s_N_export.txt file generated by ELAND and the s*_sequences.txt
> file generated by GERALD that would be helpful too (I have the

Here are file sizes from a typical middling-quality recent run, in MB;
not too bad by this point. These are NOT paired-end reads.

> library(ShortRead)
> sp <- SolexaPath("/path/to/run")
> seqs <- list.files(analysisPath(sp), "s_[1-8]_sequence.txt", full=TRUE)
> exps <- list.files(analysisPath(sp), "s_[1-8]_export.txt", full=TRUE)
> file.info(seqs)$size/(1024^2)
[1] 421.8892 461.4935 362.4373 426.9607 628.7526 353.9122 441.2186 475.7593
> file.info(exps)$size/(1024^2) # lane 5 not mapped so no export
[1] 603.0924 646.3515 466.8608 570.8070 445.1177 602.2756 691.8376
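
Because the export vector above has only seven entries, it is easy to lose track of which size belongs to which lane. A minimal sketch of one way to label sizes by lane, using base R only; `lane_sizes_mb` is a hypothetical helper (not part of ShortRead), and the demonstration runs on small throwaway files rather than real pipeline output:

```r
# Hypothetical helper: file sizes in MB, named by the lane number
# parsed from file names of the form s_<lane>_*.txt
lane_sizes_mb <- function(files) {
    lane <- sub("^s_([1-8])_.*$", "\\1", basename(files))
    sizes <- file.info(files)$size / 1024^2
    names(sizes) <- paste0("lane", lane)
    sizes
}

# Demonstration on temporary files; lane 3 is deliberately absent
demo_dir <- tempfile("run")
dir.create(demo_dir)
for (lane in c(1L, 2L, 4L)) {
    writeLines(rep("ACGT", 100 * lane),
               file.path(demo_dir, sprintf("s_%d_export.txt", lane)))
}
exps_demo <- list.files(demo_dir, "^s_[1-8]_export\\.txt$",
                        full.names = TRUE)
demo_sizes <- lane_sizes_mb(exps_demo)
print(names(demo_sizes))   # lane labels make the missing lane obvious
```

With real data, the same helper could be applied to the `seqs` and `exps` vectors from the session above.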

> standard product info). Are there other files generated by the
> pipeline that you have found particularly useful in downstream

I've sometimes found the image intensity (_int), unfiltered sequence
(_seq.txt) and base call probability (_prb) files useful, and also
'RunBrowser' files created during the run. The intensity and _prb
files are large (5-10 MB per tile x 300 tiles per lane). These and
other intermediate files are likely to be essential in any critical
assessment of the technology or methods (as opposed to downstream
application).
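
To put the per-tile figure in context, here is the back-of-envelope arithmetic as a small sketch. It assumes the 5-10 MB figure applies to each file type separately and that a run has 8 lanes (as the s_[1-8] file names suggest); the variable names are mine, for illustration only:

```r
# Figures quoted above: 5-10 MB per tile, 300 tiles per lane
mb_per_tile <- c(low = 5, high = 10)
tiles_per_lane <- 300
lanes_per_run <- 8   # assumption: all 8 lanes of the flow cell are kept

mb_per_lane <- mb_per_tile * tiles_per_lane   # MB per lane, per file type
mb_per_run <- mb_per_lane * lanes_per_run     # MB per run, per file type

print(mb_per_lane)
print(mb_per_run)
```

So each of these file types comes to roughly 1.5-3 GB per lane, or on the order of 12-24 GB per run, under those assumptions.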

Sean mentioned that multi-core processors mean requirements for
appropriate memory per core. Other than PDict, I've found that
manipulating objects on either a per-tile or per-lane basis uses on
the order of 4-5 GB. Effectively using an 8-core processor (one such
task per core, so 8 x 4-5 GB) therefore means that 32 GB is a kind of
hard 'minimum'.
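
The memory argument can be turned around to ask how many cores a given machine can actually keep busy. A rough sketch, assuming each task needs a fixed amount of memory and ignoring what the operating system and other processes use; `workers_that_fit` is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper: number of simultaneous tasks that fit in memory,
# capped at the number of cores available
workers_that_fit <- function(ram_gb, gb_per_worker, cores) {
    min(cores, floor(ram_gb / gb_per_worker))
}

full_use  <- workers_that_fit(32, 4, 8)   # 8: all cores usable at 4 GB each
tight_use <- workers_that_fit(32, 5, 8)   # 6: at 5 GB each, two cores idle
half_use  <- workers_that_fit(16, 4, 8)   # 4: half the cores sit idle
print(c(full_use, tight_use, half_use))
```

At the upper end of the 4-5 GB range even 32 GB leaves cores idle, which is why I'd treat it as a floor rather than a comfortable amount.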

Martin

> analysis or that are useful in other 3rd party applications that you
> have tried?

> Thanks in advance Paul
>
> _______________________________________________ Bioc-sig-sequencing
> mailing list [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
