Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

Emmanuel Paradis Thu, 12 Mar 2020 19:19:29 -0700

Hi Jarrett,

read.FASTA() always returns a list. So you may do something (quite general) 
like:


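# add more file extensions to the pattern if needed
fls <- dir(pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE)
# read each file, then concatenate everything into a single DNAbin list
X <- do.call(c, lapply(fls, read.FASTA))
seqlen <- lengths(X) # length of each sequence
# convert to a matrix only if all sequences have the same length
if (length(unique(seqlen)) == 1) X <- as.matrix(X)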

If the sequences are not of the same length, you can use the vector 'seqlen' 
for further processing, for instance to remove the shortest ones (if this makes 
sense):

X[seqlen > 100]
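
To make the idea concrete, here is a minimal, self-contained sketch of the whole workflow on toy data. The file names, the toy sequences and the threshold 'min_len' are made up for illustration, and pooling the files with c() before computing the lengths is an assumption about the intended result (one DNAbin list holding all sequences from all files):

```r
library(ape)

# Hypothetical toy data: two small FASTA files in a temporary directory.
tmp <- tempfile("fasta_demo")
dir.create(tmp)
writeLines(c(">a", "ACGTACGT", ">b", "ACGTAC"), file.path(tmp, "one.fas"))
writeLines(c(">c", "ACGTACGTAA"), file.path(tmp, "two.fasta"))

# List the files, read each one, and pool the sequences into one DNAbin list.
fls <- dir(tmp, pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE,
           full.names = TRUE)
X <- do.call(c, lapply(fls, read.FASTA))

# One length per sequence, named after the sequence labels.
seqlen <- lengths(X)

if (length(unique(seqlen)) == 1) {
  # All sequences have the same length: a matrix is possible.
  X <- as.matrix(X)
} else {
  # Otherwise keep the list and drop the shortest sequences.
  min_len <- 8 # arbitrary threshold, for illustration only
  X <- X[seqlen >= min_len]
}
print(X)
```

With these toy files the sequences differ in length, so X stays a list and only the sequences with at least 'min_len' bases are kept.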

Also, I have found the function fasta.index() (in Biostrings on Bioconductor) 
very useful for this kind of task: it scans a set of FASTA files (possibly 
in different directories) and returns a data frame in which each row describes 
one sequence (length, label, path, ...).
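
For readers without Biostrings installed, the kind of table described above can be approximated in base R. This is only a rough sketch: 'index_fasta' is a hypothetical helper, not the Biostrings implementation, its column names are chosen for illustration, and it assumes small, plain-text FASTA files that fit in memory:

```r
# Hypothetical helper (not part of Biostrings or ape): build a data frame
# with one row per sequence found in a set of FASTA files.
index_fasta <- function(paths) {
  rows <- lapply(paths, function(p) {
    lines <- readLines(p)
    lines <- lines[nzchar(lines)]        # drop empty lines
    is_header <- startsWith(lines, ">")
    if (!any(is_header)) return(NULL)    # no records in this file
    rec <- cumsum(is_header)             # record number of each line
    # Sum the characters of all sequence lines belonging to each record.
    seqlength <- vapply(seq_len(sum(is_header)), function(k) {
      sum(nchar(lines[rec == k & !is_header]))
    }, integer(1))
    data.frame(label = sub("^>", "", lines[is_header]),
               seqlength = seqlength,
               filepath = p,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy usage with made-up files in a temporary directory.
tmp <- tempfile("fasta_index_demo")
dir.create(tmp)
writeLines(c(">a", "ACGT", "ACGT", ">b", "ACGTAC"), file.path(tmp, "one.fas"))
writeLines(c(">c", "ACGTACGTAA"), file.path(tmp, "two.fasta"))
idx <- index_fasta(dir(tmp, full.names = TRUE))
print(idx)
```

A table like this makes it easy to decide which sequences to read or discard before loading anything with read.FASTA().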

HTH

Best,

Emmanuel

----- On 12 Mar 20, at 22:18, Jarrett Phillips phillipsjarre...@gmail.com 
wrote:
> Hi All,
> 
> I have a folder with multiple FASTA files which need to be read into R.
> 
> To avoid file overwriting, I use ape::rbind.DNAbin() as follows:
> 
> file.names <- list.files(path = envr$filepath, pattern = ".fas")
>          tmp <- matrix()
>          for (i in 1:length(file.names)) {
>            seqs <- read.dna(file = file.names[i], format = "fasta")
>            seqs <- rbind.DNAbin(tmp, seqs)
>          }
> 
> When run however, I get an error saying that the files do not have the same
> number of columns (i.e., alignments are all not of the same length).
> 
> How can I avoid this error? I feel that it's a basic fix, but one that is
> not immediately obvious to me.
> 
> Thanks!
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
