Hi, just a note on the difference between a matrix and a data.frame:
> str(data.frame(mat))
`data.frame':   4 obs. of  5 variables:
 $ X1: num  -0.1940 -0.7629  0.0446 -0.5408
 $ X2: num  -1.092 -0.040  1.070  0.868
 $ X3: num  0.634 0.823 0.693 1.152
 $ X4: num   0.0258 -1.6507  1.2052  0.9714
 $ X5: num   0.673  0.380 -1.531 -0.426
> str((mat))
 num [1:4, 1:5] -0.1940 -0.7629  0.0446 -0.5408 -1.0925 ...

A matrix is a numeric vector with dim attributes; a data frame is a
matrix-like structure which can hold different types of variables
(columns).

sd is a function based on var:

> sd
function (x, na.rm = FALSE)
{
    if (is.matrix(x))
        apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x))
        sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x))
        sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
<environment: namespace:stats>

and therefore it behaves in a similar manner for data.frames and
matrices. But mean accepts only data.frames, numeric vectors and dates:

Arguments:
       x: An R object.  Currently there are methods for numeric data
          frames, numeric vectors and dates.  A complex vector is
          allowed for 'trim = 0', only.

So a matrix is treated as a single numeric vector by mean but as a set
of column vectors by sd. I don't know why. I believe it is because with
var(matrix) you expect the output to be a variance matrix. Maybe
somebody can explain it better.

If you want mean to behave for matrices the way sd does, you can try

mymean <- function(x, na.rm = FALSE) {
    # column means for a matrix, the ordinary mean for anything else
    if (is.matrix(x)) colMeans(x, na.rm = na.rm) else mean(x, na.rm = na.rm)
}

> mymean(mat)
[1] -0.3632682  0.2013843  0.8251625  0.1379205 -0.2259909

HTH
Petr

On 6 Jan 2006 at 16:18, Stefan Eichenberger wrote:

From:       "Stefan Eichenberger" <[EMAIL PROTECTED]>
To:         <r-help@stat.math.ethz.ch>
Date sent:  Fri, 6 Jan 2006 16:18:16 +0100
Subject:    [R] A comment about R:

> ~~~~~~~~~~~~~~~
> ... blame me for not having sent below message initially in
> plain text format. Sorry!
> ~~~~~~~~~~~~~~~
>
> I just got into R for most of the Xmas vacations and was about to ask
> for pointers on how to get a hold of R when I came across this
> thread. I've read through most of it and would like to comment from a
> novice user's point of view. I've a strong programming background but
> limited statistical experience and no knowledge of competing
> packages. I'm working as a senior engineer in electronics.
>
> Yes, the learning curve is steep. Most of the documentation is
> extremely terse. Learning is mostly from examples (a wiki was proposed
> in another mail...), and the documentation uses no graphical elements
> at all. So, when it comes to things like xyplot in lattice: where
> would I get the concepts behind panels, superpanels, and the like?
>
> OK, this is steep and terse, but after a while I'll get over it...
> That's life. The general concept is great, things can be expressed
> very densely: the potential is here.... I quickly had 200 lines of my
> own code together, doing what it should - or so I believed.
>
> Next I did:
>   matrix<-matrix(1:100, 10, 10)
>   image(matrix)
>   locator()
> Great: I can interactively work with my graphs... But then:
>   filled.contour(matrix)
>   locator()
> Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't
> realize that filled.contour() has a color bar to the right and scales
> x wrongly...
>
> Here is what really shocked me:
>
> > str(bar)
> `data.frame':   206858 obs. of  12 variables:
>  ...
> > str(mean(bar[,6:12]))
>  Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
>  ...
> > str(sd(bar[,6:12]))
>  Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
>  ...
> > prcomp(bar[,6:12])->foo
> > str(foo$x)
>  num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ...
>  ...
> > str(mean(foo$x))
>  num -1.07e-13
> > str(sd(foo$x))
>  Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
>  ...
>
> So, sd returns a vector regardless of whether the argument is a
> matrix or a data.frame, but mean reacts differently and returns a
> vector only for a data.frame?
>
> The problem here is not that this is difficult to learn - the problem
> is the complete absence of a concept. Is a data.frame an 'extended'
> matrix with columns of different types or something different? Since
> the numeric mean (I expected a vector) is recycled nicely when used
> in a vector context, this makes debugging code close to impossible.
> Since sd returns a vector, things like mean + 4*sd vary sufficiently
> across the data elements that I assume the code is working... I don't
> get any warning signal that something is wrong here.
>
> A case in point is the behavior of locator() on a filled.contour()
> object: things apparently have been programmed and debugged from
> example rather than concept.
>
> Now, in another posting I read that all this is a feature to
> discourage inexperienced users from statistics and force you to think
> before you do things. Whilst I support this concept of thinking: did I
> miss something in statistics? I was under the belief that mean and sd
> were relatively close to each other conceptually... (here, they are
> even in different packages...)
>
> I will continue using R for the time being. But whether I can
> recommend it to my work colleagues remains to be seen: how could I
> ever trust the results returned?
>
> I'm still impressed by some of the efficiency, but my trust is deeply
> shaken...
> -----------------------------------------------------------------------
> Stefan Eichenberger mailto:[EMAIL PROTECTED]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
[EMAIL PROTECTED]
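
A minimal sketch of column-wise summaries that do not rely on how mean() and sd() happen to treat a matrix versus a data frame. The objects `m` and `df` are made-up illustration data (not the `mat` or `bar` from the thread), and the way mean() and sd() handle whole matrices and data frames as described in this thread may differ in other versions of R, so asking for per-column results explicitly is probably the safer route:

```r
# Hypothetical example data: a small numeric matrix and the matching data frame.
m  <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4)
df <- data.frame(m)   # columns are named X1 and X2

# Column-wise summaries of a matrix, requested explicitly.
m_means <- colMeans(m)        # 2.5 25.0
m_sds   <- apply(m, 2, sd)    # one standard deviation per column

# Column-wise summaries of a data frame, one column at a time.
df_means <- sapply(df, mean)  # named vector with elements X1 and X2
df_sds   <- sapply(df, sd)

# mean() on the matrix itself averages all elements as one vector.
overall <- mean(m)            # 13.75

print(m_means)
print(m_sds)
print(df_means)
print(df_sds)
print(overall)
```

Written this way, the matrix and data frame versions return the same per-column numbers, and an expression like mean + 4*sd combines vectors of equal length instead of silently recycling a single number.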