Hi Dan,
You don't need to convert to character to manipulate DNAbin objects: in
fact, woodmouse is just a matrix like any other.
R> dim(woodmouse)
[1] 15 965
R> is.matrix(woodmouse)
[1] TRUE
R> dimnames(woodmouse)
[[1]]
[1] "No305" "No304" "No306" "No0906S" "No0908S" "No0909S" "No0910S"
[8] "No0912S" "No0913S" "No1103S" "No1007S" "No1114S" "No1202S" "No1206S"
[15] "No1208S"
[[2]]
NULL
R> X <- woodmouse[c('No305', 'No304', 'No306'), ]
R> identical(subsetDNAb, X)
[1] TRUE
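Since the DNAbin matrix can be indexed directly, the same idea should extend to selecting rows by a pattern in their labels. A minimal sketch, assuming the woodmouse data and using "No09" purely as an illustrative pattern (it is not from the original question):

```r
library(ape)
data(woodmouse)

# Row labels of the DNAbin matrix (same as dimnames(woodmouse)[[1]])
labs <- rownames(woodmouse)

# grep() returns the integer positions of the matching labels;
# "No09" is an illustrative pattern only
idx <- grep("No09", labs, fixed = TRUE)

# Use those positions as a row index; the result is still a DNAbin matrix
sub <- woodmouse[idx, ]
class(sub)
rownames(sub)

# Pairwise distances on the subset, using dist.dna's default model
d <- dist.dna(sub)
d
```

The positions returned by grep() are only an intermediate step: passing them back as the row index is what yields the sequences themselves.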
The (quite crucial) difference is that "DNAbin" requires far less memory
(about 7 times less in this example):
R> object.size(woodmouse)
15944 bytes
R> object.size(mouseMat)
117296 bytes
So with (very) big data sets, this makes a difference.
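The comparison above can be reproduced in a fresh session with something like the sketch below. The exact byte counts and ratio will likely vary with the R version and platform, so no output is shown here. The explanation in the comments (one byte per base versus one string pointer per base) is my reading of how R stores raw and character matrices.

```r
library(ape)
data(woodmouse)

# Character version of the same alignment: one R string per base
mouseMat <- as.character(woodmouse)

# Underlying storage types of the two matrices
typeof(woodmouse)   # "raw": one byte per base
typeof(mouseMat)    # "character": one string pointer per base

# Same dimensions, different memory footprint
dim(woodmouse)
dim(mouseMat)
as.numeric(object.size(mouseMat)) / as.numeric(object.size(woodmouse))
```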
As a side note, the next version of ape will be able to handle long
vectors for DNAbin objects with more than 2.1 billion bases, and
read.dna will be able to read files larger than 2.1 GB.
Best,
Emmanuel
On 05/09/2016 at 20:13, dga...@huskers.unl.edu wrote:
Hi Kirston,
I generally convert DNAbin objects into general R objects like matrices, lists,
and vectors for my subsetting so I don't have to write DNAbin-specific
functions. I typically use as.character(), which converts DNAbin to a character
matrix, then as.DNAbin(), which converts the matrix back to DNAbin.
example:
library(ape)
data(woodmouse)
mouseMat <- as.character(woodmouse)
dim(mouseMat)
[1] 15 965
# then do your normal subsetting
subsetMouse <- mouseMat[c('No305', 'No304', 'No306'), ]
dim(subsetMouse)
[1] 3 965
# then convert it back to DNAbin
subsetDNAb <- as.DNAbin(subsetMouse)
subsetDNAb
3 DNA sequences in binary format stored in a matrix.
All sequences of same length: 965
Labels: No305 No304 No306
Base composition:
a c g t
0.306 0.260 0.126 0.307
Cheers
-Dan
________________________________
From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> on behalf of Kirston Barton
<kirston.bar...@sydney.edu.au>
Sent: Monday, September 5, 2016 1:07:25 AM
To: r-sig-phylo@r-project.org
Subject: [R-sig-phylo] suset DNAbin
Hi,
I have my data in a fasta file and am importing it into R using read.dna, which
creates a DNAbin matrix object. I would like to subset my file depending on the
sequence name so that I can generate the nucleotide pairwise distance using
dist.dna. I have attempted to do this using grep, but all I get is a list of
the numbers of the sequences with the correct name and no sequences or sequence
names. Does anyone have any suggestions for an easy way to do this?
For example, my DNAbin object has the following row names:
[1] "01011-DNA1.Contig1" "01011-DNA11.Contig1" "01011-DNA12.Contig1"
"01011-DNA13.Contig1" "01011-DNA14.Contig1"
[6] "01011-DNA16.Contig1" "01011-DNA17.Contig1" "01011-DNA18.Contig1"
"01011-DNA19.Contig1" "01011-DNA2.Contig1"
[11] "01011-DNA20.Contig1" "01011-DNA21.Contig1" "01011-DNA22.Contig1"
"01011-DNA23.Contig1" "01011-DNA24.Contig1"
[16] "01011-DNA25.Contig1" "01011-DNA26.Contig1" "0103-PRNA2.Contig1"
"01011-DNA3.Contig1" "01011-DNA33.Contig1"
[21] "01011-DNA4.Contig1" "01011-DNA5.Contig1" "01011-DNA6.Contig1"
"01011-DNA7.Contig1" "01011-DNA8.Contig1"
[26] "01011-DNA9.Contig1" "01011-RNA10.Contig1" "01011-RNA13.Contig1"
"01011-RNA14.Contig1" "01011-RNA17.Contig1"
[31] "01011-RNA18.Contig1" "01011-RNA19.Contig1" "01011-RNA21.Contig1"
"01011-RNA23.Contig1" "01011-RNA24.Contig1"
[36] "01011-RNA26.Contig1" "01011-RNA28.Contig1" "01011-RNA29.Contig1"
"01011-RNA30.Contig1" "01011-RNA31.Contig1"
[41] "01011-RNA32.Contig1" "01011-RNA33.Contig1" "01011-RNA35.Contig1"
"01011-RNA38.Contig1" "01011-RNA4.Contig1"
[46] "01011-RNA5.Contig1" "01011-RNA6.Contig1" "01011-RNA8.Contig1"
"01011-RNA9.Contig1" "0102A-CRNA103.Contig1"
[51] "0102A-CRNA105.Contig1" "0102A-CRNA110.Contig1" "0102A-CRNA113.Contig1"
"0102A-CRNA115.Contig1" "0102A-CRNA118.Contig1"
[56] "0102a-DNA10.Contig1"
I would like a new DNAbin object with sequences that have 1011 anywhere in
their row name.
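To make the matching step concrete, here is a small base-R sketch using a handful of the row names listed above as stand-in data. The variable names (nms, keep) are only illustrative, and x in the last comment is a hypothetical name for the object returned by read.dna.

```r
# A few of the row names quoted above, used as stand-in data
nms <- c("01011-DNA1.Contig1", "01011-DNA26.Contig1",
         "0103-PRNA2.Contig1", "01011-RNA10.Contig1",
         "0102A-CRNA103.Contig1")

# By default grep() returns the positions of the matches, not the names
grep("1011", nms, fixed = TRUE)
# [1] 1 2 4

# value = TRUE returns the matching names instead
grep("1011", nms, fixed = TRUE, value = TRUE)

# grepl() returns a logical vector, one element per name
keep <- grepl("1011", nms, fixed = TRUE)
keep
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Either the positions or the logical vector can serve as a row index,
# e.g. x[keep, ] for a matrix x whose row names are nms (x is hypothetical)
```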
Please let me know if I have left out any pertinent information. Thank you in
advance for any suggestions or help with this matter.
Kind regards,
Kirston
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/