Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

BRET R LARGET Thu, 12 Mar 2020 11:44:30 -0700

As I understand it, it will be more efficient to pre-specify the length of
the list. Depending on the size of the data, this may or may not matter.
I have not tested this code either.


## replace DESIRED_LENGTH with the appropriate number
seqs <- vector(mode = "list", length = DESIRED_LENGTH)
class(seqs) <- "DNAbin"
for (i in seq_along(seqs)) { ## your for loop
        seqs[[i]] <- read.dna(file = ..., format = "fasta", as.matrix = FALSE)
}
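
A slightly fuller sketch of the same pre-allocation idea, in case it helps.
One caveat, as far as I can tell: read.dna() with as.matrix = FALSE returns a
whole "DNAbin" list per file, so storing it in a single slot nests it. The
sketch therefore collects the per-file results in a plain list and combines
them with c() at the end. The temporary directory, the file names a.fas and
b.fas, and the variable per.file are made up for illustration only.

```r
library(ape)

## Hypothetical setup: two small FASTA files with different sequence lengths
demo.dir <- tempfile("fasta_demo")
dir.create(demo.dir)
writeLines(c(">a1", "ACGT", ">a2", "ACGA"), file.path(demo.dir, "a.fas"))
writeLines(c(">b1", "ACGTAC"), file.path(demo.dir, "b.fas"))

file.names <- list.files(path = demo.dir, pattern = "\\.fas$",
                         full.names = TRUE)

## Pre-allocate a plain list with one slot per file
per.file <- vector(mode = "list", length = length(file.names))
for (i in seq_along(file.names)) {
  ## as.matrix = FALSE keeps each file as a "DNAbin" list
  per.file[[i]] <- read.dna(file = file.names[i], format = "fasta",
                            as.matrix = FALSE)
}

## Combine the per-file "DNAbin" lists into one "DNAbin" list
seqs <- do.call(c, per.file)
```

Because the combined object is a list rather than a matrix, the sequences do
not need to share a common length.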

- Bret

________________________________
From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> on behalf of Liam J. 
Revell <liam.rev...@umb.edu>
Sent: Thursday, March 12, 2020 10:26 AM
To: Jarrett Phillips <phillipsjarre...@gmail.com>; r-sig-phylo@r-project.org 
<r-sig-phylo@r-project.org>
Subject: Re: [R-sig-phylo] Iterating though multiple FASTA files via 
rbind.DNAbin

Dear Jarrett.

I haven't checked to see if this works, but "DNAbin" objects can either
be a matrix (the default if all the sequences have the same length) or a
list.

For your code to work as a list you might change it to be something like:

seqs <- list()
class(seqs) <- "DNAbin"
for(...) { ## your for loop
        seqs <- c(seqs,
                read.dna(file = ..., format="fasta", as.matrix=FALSE))
}

(Here, replace each ... with your original code.)

In theory, I believe that this (or something like this) should do what
you want.
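
To make the matrix versus list distinction concrete, here is a small sketch
on toy data. The objects x and y and the sequence names s1 to s3 are invented
for illustration, and I have only reasoned through this rather than run it on
real alignments.

```r
library(ape)

## Hypothetical toy data: two alignments of different widths (4 and 6 sites)
x <- as.DNAbin(matrix(c("a", "c", "g", "t",
                        "a", "c", "g", "a"),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("s1", "s2"), NULL)))
y <- as.DNAbin(matrix(c("a", "c", "g", "t", "a", "c"),
                      nrow = 1,
                      dimnames = list("s3", NULL)))

## Both objects are stored in matrix form
is.matrix(x)
is.matrix(y)

## rbind() needs equal numbers of columns, so capture the error message
err <- tryCatch(rbind(x, y), error = function(e) conditionMessage(e))

## Converting to list form first lets c() combine them regardless of length
combined <- c(as.list(x), as.list(y))
```

The list form gives up the alignment structure, so functions that expect a
matrix will still need sequences of equal length.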

All the best, Liam

Liam J. Revell
Associate Professor, University of Massachusetts Boston
Profesor Asistente, Universidad Católica de la Ssma Concepción
web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org

Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

On 3/12/2020 11:18 AM, Jarrett Phillips wrote:
> Hi All,
>
> I have a folder with multiple FASTA files which need to be read into R.
>
> To avoid file overwriting, I use ape::rbind.DNAbin() as follows:
>
> file.names <- list.files(path = envr$filepath, pattern = ".fas")
>            tmp <- matrix()
>            for (i in 1:length(file.names)) {
>              seqs <- read.dna(file = file.names[i], format = "fasta")
>              seqs <- rbind.DNAbin(tmp, seqs)
>            }
>
> When run, however, I get an error saying that the files do not have the same
> number of columns (i.e., the alignments are not all of the same length).
>
> How can I avoid this error? I feel that it's a basic fix, but one that is
> not immediately obvious to me.
>
> Thanks!
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
