Re: [R] removed data is still there!
On Thu, 23 Sep 2010, Peter Ehlers wrote:

> On 2010-09-21 5:51, Nikhil Kaza wrote:
>> example(factor)
>>
>> iris1$Species <- factor(iris1$Species, drop=T)
>>
>> will get you what you need.
>
> Hmm, doesn't work for me. ?factor does not list a 'drop=' argument.

I suspect

iris1$Species <- [iris1$Species, drop=TRUE]

was meant. See ?`[.factor` .

> -Peter Ehlers

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] removed data is still there!
On Fri, 24 Sep 2010, Prof Brian Ripley wrote:

> On Thu, 23 Sep 2010, Peter Ehlers wrote:
>> On 2010-09-21 5:51, Nikhil Kaza wrote:
>>> iris1$Species <- factor(iris1$Species, drop=T)
>>>
>>> will get you what you need.
>>
>> Hmm, doesn't work for me. ?factor does not list a 'drop=' argument.
>
> I suspect
>
> iris1$Species <- [iris1$Species, drop=TRUE]

I don't know what happened there:

iris1$Species <- iris1$Species[, drop=TRUE]

is what I saw before sending.

> was meant. See ?`[.factor` .

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
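For readers following along, here is a minimal sketch of the corrected idiom, assuming only the built-in iris data. Indexing a factor with an empty index and drop=TRUE keeps every element but discards the levels that no longer occur.

```r
# Recreate the subset from the original post.
iris1 <- iris[iris$Species == "setosa", ]

# Empty index: keep all 50 elements. drop = TRUE: discard unused levels.
iris1$Species <- iris1$Species[, drop = TRUE]

levels(iris1$Species)
summary(iris1$Species)
```

The same call should work on any factor, not just a data frame column.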
Re: [R] removed data is still there!
On 2010-09-21 5:51, Nikhil Kaza wrote:

> example(factor)
>
> iris1$Species <- factor(iris1$Species, drop=T)
>
> will get you what you need.

Hmm, doesn't work for me. ?factor does not list a 'drop=' argument.

-Peter Ehlers
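One way to check this without reading the help page is to inspect the formal arguments directly. This is only a sketch; the variable name factor_args is made up for the illustration.

```r
# Formal arguments of factor(): 'drop' is not among them.
factor_args <- names(formals(factor))
print(factor_args)
"drop" %in% factor_args

# The subsetting method for factors is where 'drop' is defined.
"drop" %in% names(formals(`[.factor`))
```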
Re: [R] removed data is still there!
Hi,

I agree with you that levels should not be automatically dropped after subsetting. However, I think there could be an extra argument to make dropping possible (the default being no dropping). I have no example in mind, but I guess that sometimes one wants to show only some levels. Would that be a bad approach?

Anyway, it is not that complicated to use factor() again.

Ivan

On 9/21/2010 23:22, Greg Snow wrote:

> This comes up every now and then. The fact is that the behavior of R in not throwing away information unless explicitly told to is a feature, and one that I don't want to see go away.
>
> Now, some propose that all factors should have levels dropped after subsetting. This is worse than useless [...]

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
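The opt-in behaviour suggested above can be approximated with a small wrapper. The sketch below is hypothetical: subset_rows and its drop_levels argument are invented for illustration and are not part of base R.

```r
# Hypothetical helper (not a base R function): subset rows and, only when
# asked, rebuild each factor column so that unused levels are discarded.
subset_rows <- function(df, keep, drop_levels = FALSE) {
  out <- df[keep, , drop = FALSE]
  if (drop_levels) {
    for (nm in names(out)) {
      if (is.factor(out[[nm]])) out[[nm]] <- factor(out[[nm]])
    }
  }
  out
}

kept    <- subset_rows(iris, iris$Species == "setosa")
dropped <- subset_rows(iris, iris$Species == "setosa", drop_levels = TRUE)

nlevels(kept$Species)     # default: all original levels are retained
nlevels(dropped$Species)  # opt-in: only the levels still in use
```

Keeping the default at FALSE mirrors the view expressed in the thread that information should not be discarded unless the user asks for it.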
[R] removed data is still there!
I'm confused, hope someone can point out what is not obvious to me.

I thought I was creating a new data frame by 'deleting' rows from an existing data frame - I've tried 2 methods. But this new data frame seems to remember values from its parent - even though there are no occurrences.

Where does it get the values versicolor and virginica from, and why does it give them a count of 0?

What am I missing?

Thanks in advance.

> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50
> nrow(iris)
[1] 150

> iris1 <- iris[iris$Species == 'setosa',]
> nrow(iris1)
[1] 50
> summary(iris1$Species)
    setosa versicolor  virginica
        50          0          0
> boxplot(Petal.Width ~ Species, data = iris1, plot=1)

> iris2 <- subset(iris, Species == 'setosa')
> nrow(iris2)
[1] 50
> summary(iris2$Species)
    setosa versicolor  virginica
        50          0          0
> boxplot(Petal.Width ~ Species, data = iris2, plot=1)

-- 
View this message in context: http://r.789695.n4.nabble.com/removed-data-is-still-there-tp2548440p2548440.html
Sent from the R help mailing list archive at Nabble.com.
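A short sketch of where the zero counts come from, assuming only the built-in iris data. A factor stores integer codes plus a levels attribute; row subsetting filters the codes but copies the levels attribute unchanged.

```r
# Subset the built-in iris data as in the post above.
iris1 <- iris[iris$Species == "setosa", ]

# The levels attribute is carried over from the parent factor.
levels(iris1$Species)

# Only one distinct value is actually present in the data.
unique(as.character(iris1$Species))

# table() and summary() count every level, so absent ones appear with 0.
table(iris1$Species)
```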
Re: [R] removed data is still there!
example(factor)

iris1$Species <- factor(iris1$Species, drop=T)

will get you what you need.

Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com

On Sep 21, 2010, at 7:41 AM, pdb wrote:

> But this new data frame seems to remember values from its parent [...]
>
> What am I missing?
Re: [R] removed data is still there!
Removing elements from a factor does not change the levels of the factor.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pdb
Sent: Tuesday, 21 September 2010 13:42
To: r-help@r-project.org
Subject: [R] removed data is still there!

> But this new data frame seems to remember values from its parent [...]
>
> What am I missing?
Re: [R] removed data is still there!
Thanks, but that was what I just discovered myself the hard way. What I really wanted to know was how to solve this issue.

-- 
View this message in context: http://r.789695.n4.nabble.com/removed-data-is-still-there-tp2548440p2548527.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] removed data is still there!
On Sep 21, 2010, at 8:39 AM, pdb wrote:

> Thanks, but that was what I just discovered myself the hard way. What I really wanted to know was how to solve this issue.

Although that was _not_ what you requested in your first post. 2 options:

?table
?factor

iris1$Species <- factor(iris$Species) # removes extraneous levels

-- 
David Winsemius, MD
West Hartford, CT
Re: [R] removed data is still there!
On Sep 21, 2010, at 9:04 AM, David Winsemius wrote:

> On Sep 21, 2010, at 8:39 AM, pdb wrote:
>> Thanks, but that was what I just discovered myself the hard way. What I really wanted to know was how to solve this issue.
>
> Although that was _not_ what you requested in your first post. 2 options:
>
> ?table
> ?factor
>
> iris1$Species <- factor(iris$Species) # removes extraneous levels

And that was not what I meant to type. I meant for factor() to be applied to the second data frame:

iris1$Species <- factor(iris1$Species) # removes extraneous levels

-- 
David Winsemius, MD
West Hartford, CT
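A before-and-after sketch of the corrected line, assuming the built-in iris data; the names before and after are only for the illustration.

```r
iris1 <- iris[iris$Species == "setosa", ]
before <- summary(iris1$Species)   # three levels, two of them with count 0

# Rebuilding the factor from its own values keeps only the levels in use.
iris1$Species <- factor(iris1$Species)
after <- summary(iris1$Species)

print(before)
print(after)
```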
Re: [R] removed data is still there!
Hi,

I knew about that way already, with factor(). Isn't there another possibility, directly at the subsetting step? That would be of great help.

iris1 <- iris[iris$Species == 'setosa',] ## I mean here

Ivan

On 9/21/2010 15:14, David Winsemius wrote:

> iris1$Species <- factor(iris1$Species) # removes extraneous levels

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] removed data is still there!
Ivan Calandra <ivan.calandra at uni-hamburg.de> writes:

> Hi,
> I knew about that way already, with factor(). Isn't there another possibility, directly at the subsetting step? That would be of great help
>
> iris1 <- iris[iris$Species == 'setosa',] ## I mean here

Not as far as I know. See gdata::drop.levels, or droplevels() in R 2.12.0.
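As a sketch of the second suggestion, assuming R 2.12.0 or later and the built-in iris data: droplevels() can wrap the subsetting call, which comes close to doing it in one step. It accepts a data frame or a single factor.

```r
# Data frame method: drops unused levels from every factor column.
iris2 <- droplevels(subset(iris, Species == "setosa"))
levels(iris2$Species)

# Factor method: works on a single factor as well.
f <- droplevels(iris$Species[iris$Species != "virginica"])
levels(f)
```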
Re: [R] removed data is still there!
This comes up every now and then. The fact is that the behavior of R in not throwing away information unless explicitly told to is a feature, and one that I don't want to see go away. Yes, in your example doing a table or plot based on iris1$Species gives meaningless results, but anything you do with that column is now meaningless. Why do you care if there is extra information in a column that you should not be doing anything further with anyway? Does it really make sense to use that column for anything now? It is a bit like a teacher bemoaning the fact that half of his/her students scored below the class median.

Now, some propose that all factors should have levels dropped after subsetting. This is worse than useless; consider the following made-up example:

# made-up survey: 80 male responses followed by 80 female responses
tmp1 <- rep( c(1:5,1:5), c(10,20,30,20,0,0,10,20,30,20) )
result <- factor(tmp1, levels=1:5, labels=c('Strongly Disagree', 'Disagree', 'No Opinion', 'Agree', 'Strongly Agree') )
my.df <- data.frame( result=result, sex = rep( c('M','F'), each=80 ) )

# *.1 will have unused levels dropped; *.2 keeps all five levels
df.m.2 <- df.m.1 <- my.df[ my.df$sex=='M', ]
df.f.2 <- df.f.1 <- my.df[ my.df$sex=='F', ]

df.m.1[] <- lapply( df.m.1, factor )
df.f.1[] <- lapply( df.f.1, factor )

# first pair of plots: levels dropped
dev.new()
par(mfrow=c(2,1))
barplot(table(df.m.1$result), main='Males')
barplot(table(df.f.1$result), main='Females')

# second pair of plots: levels kept
dev.new()
par(mfrow=c(2,1))
barplot(table(df.m.2$result), main='Males')
barplot(table(df.f.2$result), main='Females')

Which pair of plots is more meaningful? Easier to read? Not misleading?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Tuesday, September 21, 2010 7:23 AM
To: r-help@r-project.org
Subject: Re: [R] removed data is still there!

> I knew about that way already, with factor(). Isn't there another possibility, directly at the subsetting step? That would be of great help
>
> iris1 <- iris[iris$Species == 'setosa',] ## I mean here
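The same point can be checked numerically rather than by eye. The sketch below rebuilds the made-up survey data and compares the tables directly; the names males, females and counts are only for the illustration.

```r
# Rebuild the made-up survey data from the example.
tmp1 <- rep(c(1:5, 1:5), c(10, 20, 30, 20, 0, 0, 10, 20, 30, 20))
result <- factor(tmp1, levels = 1:5,
                 labels = c("Strongly Disagree", "Disagree", "No Opinion",
                            "Agree", "Strongly Agree"))
my.df <- data.frame(result = result, sex = rep(c("M", "F"), each = 80))

males   <- my.df[my.df$sex == "M", ]
females <- my.df[my.df$sex == "F", ]

# With levels kept, both tables share the same five categories in the
# same order, so they line up column by column.
counts <- rbind(Males = table(males$result), Females = table(females$result))
print(counts)

# With levels dropped, each group has four categories, and not the same four.
levels(factor(males$result))
levels(factor(females$result))
```

With the levels retained, an empty category shows up as an explicit zero instead of silently shifting the remaining bars.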