Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

2020-03-13 Thread Gustavo

Hi Jarrett,

This has been working for me using the package 'apex':

x <- read.multiFASTA(files) # creates a multiDNA object
genes <- x@dna[] # creates a list with your loci.

I hope this helps.

Best
Gustavo

On Thu, Mar 12, 2020 at 11:18, Jarrett Phillips <
phillipsjarre...@gmail.com> wrote:

> Hi All,
>
> I have a folder with multiple FASTA files which need to be read into R.
>
> To avoid file overwriting, I use ape::rbind.DNAbin() as follows:
>
> file.names <- list.files(path = envr$filepath, pattern = ".fas")
> tmp <- matrix()
> for (i in 1:length(file.names)) {
>   seqs <- read.dna(file = file.names[i], format = "fasta")
>   seqs <- rbind.DNAbin(tmp, seqs)
> }
>
> When run, however, I get an error saying that the files do not have the same
> number of columns (i.e., the alignments are not all of the same length).
>
> How can I avoid this error? I feel that it's a basic fix, but one that is
> not immediately obvious to me.
>
> Thanks!
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 
*Gustavo Silva de Miranda*
Peter Buck Postdoctoral Fellow - GGI 
Department of Entomology
National Museum of Natural History
Smithsonian Institution

Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

2020-03-12 Thread Emmanuel Paradis

Hi Jarrett,

read.FASTA() always returns a list. So you may do something (quite general) 
like:

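fls <- dir(pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE) # add more file extensions if needed
X <- lapply(fls, read.FASTA) # one "DNAbin" list per file
X <- do.call(c, X) # pool all sequences into a single "DNAbin" list
seqlen <- lengths(X) # length of each sequence
if (length(unique(seqlen)) == 1) X <- as.matrix(X)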

If the sequences are not of the same length, you can use the vector 'seqlen' 
for further processing, for instance to remove the shortest ones (if this makes 
sense):

X[seqlen > 100]
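
The steps above can be tried end to end with a small self-contained sketch. The two demo files, their names, and the length cutoff of 4 are made up for illustration; pooling the per-file results with c() is my assumption about how the files are meant to be combined, so that 'seqlen' holds one length per sequence.

```r
library(ape)

# Hypothetical demo data: two small FASTA files with sequences of unequal length.
dir_demo <- tempfile("fasta_demo")
dir.create(dir_demo)
writeLines(c(">a1", "ACGTACGT", ">a2", "ACGTACGA"), file.path(dir_demo, "locusA.fas"))
writeLines(c(">b1", "ACGT"), file.path(dir_demo, "locusB.fasta"))

# full.names = TRUE so the files can be read from any working directory.
fls <- dir(dir_demo, pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE,
           full.names = TRUE)

# Read each file, then pool all sequences into one "DNAbin" list.
X <- do.call(c, lapply(fls, read.FASTA))

# Number of bases in each sequence.
seqlen <- lengths(X)

if (length(unique(seqlen)) == 1) {
  X <- as.matrix(X)        # equal lengths: a matrix representation is possible
} else {
  X_long <- X[seqlen > 4]  # otherwise keep, for example, only the longer sequences
}
```

With these demo files the lengths differ, so the list form is kept and X_long holds the two sequences from the first file.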

Also, I found the function fasta.index (in Biostrings on Bioconductor) to be 
very useful for this kind of task: it scans a set of FASTA files (possibly 
in different directories) and returns a data frame in which each row describes 
one sequence (length, label, path, ...).
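
A minimal sketch of that indexing step, assuming Biostrings is installed from Bioconductor. The demo files are hypothetical, and the column name 'seqlength' is as I recall it from the Biostrings documentation, so it is worth checking names(idx) on your own installation.

```r
library(Biostrings)

# Hypothetical demo data: two small FASTA files.
dir_demo <- tempfile("fasta_demo")
dir.create(dir_demo)
writeLines(c(">a1", "ACGTACGT", ">a2", "ACGTACGA"), file.path(dir_demo, "locusA.fas"))
writeLines(c(">b1", "ACGT"), file.path(dir_demo, "locusB.fasta"))
fls <- dir(dir_demo, pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE,
           full.names = TRUE)

# One row per sequence across all files, without loading the sequences themselves.
idx <- fasta.index(fls, seqtype = "DNA")
names(idx)  # inspect the available columns

# Rows describing sequences shorter than 8 bases (assumes a 'seqlength' column).
short <- idx[idx$seqlength < 8, ]
```

An index like this makes it easy to see which files hold the odd-length sequences before deciding how to read them.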

HTH

Best,

Emmanuel


Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

2020-03-12 Thread BRET R LARGET

It will be more efficient if you pre-specify the length of the list, I 
understand. Depending on the size of the data, this may or may not be 
important. I have not tested this code either.

seqs <- vector(mode = "list", length = DESIRED_LENGTH) ## replace DESIRED_LENGTH with the appropriate number
class(seqs) <- "DNAbin"
for (i in seq_along(seqs)) { ## your for loop
  seqs[[i]] <- read.dna(file = ..., format = "fasta", as.matrix = FALSE)
}
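
A filled-in variation on the pre-allocation idea, as a sketch only. It differs from the code above in one respect that is my own choice: the pre-allocated container is a plain list holding one "DNAbin" object per file, and the pieces are pooled with c() at the end, because a file may contain several sequences. The demo files are hypothetical. Note also the escaped pattern and full.names = TRUE in list.files(), which let the loop find the files from any working directory.

```r
library(ape)

# Hypothetical demo data: two small FASTA files of different alignment lengths.
dir_demo <- tempfile("fasta_demo")
dir.create(dir_demo)
writeLines(c(">a1", "ACGTACGT", ">a2", "ACGTACGA"), file.path(dir_demo, "locusA.fas"))
writeLines(c(">b1", "ACGT"), file.path(dir_demo, "locusB.fas"))

file.names <- list.files(path = dir_demo, pattern = "\\.fas$", full.names = TRUE)

# Pre-allocate one slot per file instead of growing an object inside the loop.
per_file <- vector(mode = "list", length = length(file.names))
for (i in seq_along(file.names)) {
  per_file[[i]] <- read.dna(file = file.names[i], format = "fasta",
                            as.matrix = FALSE)
}

# Pool the per-file "DNAbin" lists into a single "DNAbin" list.
seqs <- do.call(c, per_file)
```

Because the result stays in list form, sequences of different lengths can coexist without the column-count error.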

- Bret


From: R-sig-phylo on behalf of Liam J. Revell
Sent: Thursday, March 12, 2020 10:26 AM
To: Jarrett Phillips; r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

Dear Jarrett.

I haven't checked to see if this works, but "DNAbin" objects can either
be a matrix (the default if all the sequences have the same length) or a
list.

For your code to work as a list you might change it to be something like:

seqs <- list()
class(seqs) <- "DNAbin"
for (...) { ## your for loop
  seqs <- c(seqs,
    read.dna(file = ..., format = "fasta", as.matrix = FALSE))
}

(Here, replace each ... with the corresponding part of your original code.)

In theory, I believe that this (or something like this) should do what
you want.
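
For concreteness, here is the same grow-with-c() pattern with the placeholders filled in, as a sketch under assumptions: the demo files are hypothetical, and it relies on the c() method for "DNAbin" objects accepting the empty starting object, which I expect but have not checked against every ape version.

```r
library(ape)

# Hypothetical demo data: two small FASTA files of different alignment lengths.
dir_demo <- tempfile("fasta_demo")
dir.create(dir_demo)
writeLines(c(">a1", "ACGTACGT", ">a2", "ACGTACGA"), file.path(dir_demo, "locusA.fas"))
writeLines(c(">b1", "ACGT"), file.path(dir_demo, "locusB.fas"))

file.names <- list.files(path = dir_demo, pattern = "\\.fas$", full.names = TRUE)

# Start from an empty "DNAbin" list and append the sequences of each file.
seqs <- list()
class(seqs) <- "DNAbin"
for (f in file.names) {
  seqs <- c(seqs, read.dna(file = f, format = "fasta", as.matrix = FALSE))
}

# Sequence labels collected across all files.
labels(seqs)
```

Each pass through the loop appends to the accumulated object rather than overwriting it, which is what the original loop was missing.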

All the best, Liam

Liam J. Revell
Associate Professor, University of Massachusetts Boston
Profesor Asistente, Universidad Católica de la Ssma Concepción
web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org

Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

