Jacob Berv noted:

> I noticed today that the compression ratio for an interleaved phylip file
> (zip compressed) was about 84:1 (390 MB uncompressed -> 4.6 MB compressed),
> whereas the compression ratio for the same data non-interleaved was a much
> worse 3.4:1 (390 MB uncompressed -> 113.9 MB). Not knowing much about how zip
> compression actually works - I thought this might be an interesting
> observation for the group…
Interleaved files are written in blocks of (say) 50 bases per line, so successive lines within a block cover the same alignment columns in different sequences and may repeat one another exactly or nearly so. I wonder whether that is what makes the interleaved format easier to compress.

If so, I would guess that interleaved files compress best when the sequences are closely related, since each block then consists of nearly identical 50-base lines. With less closely related sequences the compressibility should be much lower.

Joe
----
Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA
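
The guess above can be checked with a small simulation. What follows is only a sketch and is not from the original thread. It assumes the zip file was compressed with DEFLATE (the usual zip method), which can only point back at repeats within the previous 32 KiB, and it uses Python's zlib as a stand-in for zip. The function names, taxon names, alignment size, and mutation rate are all made up for illustration, and the file layouts are simplified PHYLIP-like text.

```python
import random
import zlib


def make_alignment(n_taxa=10, n_sites=40_000, mut_rate=0.01, seed=1):
    """Simulate closely related sequences (hypothetical data).

    Each taxon is a copy of one random ancestor in which every site is
    redrawn at random with probability mut_rate (a redraw may pick the
    same base again, so the realized difference is a bit lower).
    """
    rng = random.Random(seed)
    ancestor = [rng.choice("ACGT") for _ in range(n_sites)]
    seqs = []
    for _ in range(n_taxa):
        seq = [rng.choice("ACGT") if rng.random() < mut_rate else base
               for base in ancestor]
        seqs.append("".join(seq))
    return seqs


def sequential(seqs):
    """Simplified sequential layout: one taxon per line, whole sequence."""
    lines = [f" {len(seqs)} {len(seqs[0])}"]
    for i, seq in enumerate(seqs):
        lines.append(f"taxon{i:<5}" + seq)  # 10-character name field
    return "\n".join(lines) + "\n"


def interleaved(seqs, block=50):
    """Simplified interleaved layout: blocks of `block` bases per taxon."""
    lines = [f" {len(seqs)} {len(seqs[0])}"]
    for start in range(0, len(seqs[0]), block):
        for i, seq in enumerate(seqs):
            # Names appear only in the first block.
            name = f"taxon{i:<5}" if start == 0 else ""
            lines.append(name + seq[start:start + block])
        lines.append("")  # blank line between blocks
    return "\n".join(lines) + "\n"


def compressed_size(text):
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(text.encode("ascii"), 9))


if __name__ == "__main__":
    seqs = make_alignment()
    for label, text in (("sequential", sequential(seqs)),
                        ("interleaved", interleaved(seqs))):
        raw, packed = len(text), compressed_size(text)
        print(f"{label:12s} {raw:8d} -> {packed:8d} bytes "
              f"({raw / packed:.1f}:1)")
```

In the sequential layout each simulated sequence is longer than 32 KiB, so the matching stretch of the previous taxon lies outside the compressor's window. In the interleaved layout it sits on the previous line. Raising `mut_rate` should show how quickly the advantage shrinks for less closely related sequences.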