Re: [R] specify the number of decimal numbers
On 14-May-09 12:03:43, lehe wrote:
> Thanks! In my case, I need to deal with a lot of such results,
> e.g. elements in a matrix. If using sprintf, does it mean I have
> to apply it to each result individually? Is it possible to do it
> in a single command?

Yes, in various ways depending on how you want the output to be:

  M <- pi*matrix(c(1,2,3,4), nrow=2)
  M
  #          [,1]      [,2]
  # [1,] 3.141593  9.424778
  # [2,] 6.283185 12.566371

  sprintf("%.3f", M)
  # [1] "3.142"  "6.283"  "9.425"  "12.566"

  sprintf(c("%.1f","%.2f","%.3f","%.4f"), M)
  # [1] "3.1"     "6.28"    "9.425"   "12.5664"

In this usage, the output is a vector of character strings. If you
wanted to keep the output character strings in matrix layout:

  matrix(sprintf("%.3f", M), nrow=2)
  #      [,1]    [,2]
  # [1,] "3.142" "9.425"
  # [2,] "6.283" "12.566"

While one may think of using print() with the 'digits' argument for
this kind of thing, digits=3 (for instance) prints to a minimum of 3
_significant_ digits, not decimal places. So:

  print(10*M, 3)
  #      [,1]  [,2]
  # [1,] 31.4  94.2
  # [2,] 62.8 125.7

See in ?print.default:

  "The same number of decimal places is used throughout a vector.
   This means that 'digits' specifies the minimum number of
   significant digits to be used, and that at least one entry will
   be encoded with that minimum number."

As far as I can tell, there is no print option such as 'decimals',
e.g. decimals=3, which would print all numbers with 3 digits after
the decimal point (as in sprintf("%.3f", ...)); if there were such an
option, one would expect to find it listed in ?options. If that is
really true, it is a pity! Short of using sprintf() (and then
converting back to numeric if you need the results as numbers), the
only way you could use 'digits' to get a desired number (e.g. 3) of
decimal places throughout would be to first study the range of
magnitudes of the numbers and then choose digits=... accordingly.
For example, to print 10*M to 3 d.p., you need to note that the
smallest number is 31.4..., so you need digits=5 to get 3 d.p.:

  print(10*M, 5)
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

On the other hand, you could do it straight off with sprintf:

  matrix(as.numeric(sprintf("%.3f", 10*M)), nrow=2)   ## [***]
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

regardless of the magnitudes of the numbers -- though you could still
fail if there are zeros after the decimal place:

  sprintf("%.3f", c(1.0,2.0,3.0,4.0))
  # [1] "1.000" "2.000" "3.000" "4.000"
  as.numeric(sprintf("%.3f", c(1.0,2.0,3.0,4.0)))
  # [1] 1 2 3 4

Maybe there really is a print method which will output a result like
[***] without the quotes; but I don't know how to find it!
Ted.

jholtman wrote:
> Depending on what you want to do, use 'sprintf':
>   x <- 1.23456789
>   x
>   # [1] 1.234568
>   as.character(x)
>   # [1] "1.23456789"
>   sprintf("%.1f %.3f %.5f", x, x, x)
>   # [1] "1.2 1.235 1.23457"
>
> On Thu, May 14, 2009 at 7:40 AM, lehe <timlee...@yahoo.com> wrote:
>> Hi,
>> I was wondering how to specify the number of decimal places in my
>> computation using R? I have too many decimal places in my results;
>> when I convert them to strings with as.character, the strings will
>> be too long.
>> Thanks and regards!
>> --
>> View this message in context:
>> http://www.nabble.com/specify-the-number-of-decimal-numbers-tp23538852p23538852.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> What is the problem that you are trying to solve?
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 14:01:14
-- XFMail --
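[Editor's aside on the point left open above -- printing a numeric matrix to a fixed number of decimal places, in matrix layout, without quotes: one possibility (a sketch, not claiming it is the built-in option Ted was looking for) is to combine round(), format() and noquote(); format() preserves the matrix's dim attribute, and noquote() prints the character matrix without quote marks.]

```r
# Sketch: fixed decimal places, matrix layout, printed without quotes.
M <- pi * matrix(c(1, 2, 3, 4), nrow = 2)
fm <- format(round(M, 3), nsmall = 3)  # character matrix, 3 d.p. each
noquote(fm)                            # matrix layout, no quotes
```

This still gives character data, so (as with sprintf) you would need as.numeric() to compute with the results afterwards.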
Re: [R] specify the number of decimal numbers
On 14-May-09 12:27:40, Wacek Kusnierczyk wrote:
> jim holtman wrote:
>> Depending on what you want to do, use 'sprintf':
>>   x <- 1.23456789
>>   x
>>   # [1] 1.234568
>>   as.character(x)
>>   # [1] "1.23456789"
>>   sprintf("%.1f %.3f %.5f", x, x, x)
>>   # [1] "1.2 1.235 1.23457"
>
> ... but remember that sprintf introduces excel bugs into r (i.e.,
> rounding is not done according to the IEC 60559 standard, see
> ?round):
>
>   ns = c(0.05, 0.15)
>   round(ns, 1)
>   # 0.0 0.2
>   as.numeric(sprintf('%.1f', ns))
>   # 0.1 0.1
>
> vQ

True! And thanks for importing that point into the discussion.
And, by the way, it goes some way to solving an issue I raised
earlier, in that

  M
  #          [,1]      [,2]
  # [1,] 3.141593  9.424778
  # [2,] 6.283185 12.566371
  round(10*M, 3)
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

though, of course, still

  round(1.0, 3)
  # [1] 1    ### not 1.000

It would still be good to be able to simply have print(X, decimals=3)
(with proper rounding, of course).
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 14:47:30
-- XFMail --
Re: [R] specify the number of decimal numbers
On 14-May-09 15:15:16, James W. MacDonald wrote:
> Wacek Kusnierczyk wrote:
>> (Ted Harding) wrote:
>>> On 14-May-09 12:27:40, Wacek Kusnierczyk wrote:
>>>> ... but remember that sprintf introduces excel bugs into r
>>>> (i.e., rounding is not done according to the IEC 60559 standard,
>>>> see ?round):
>>>>   ns = c(0.05, 0.15)
>>>>   round(ns, 1)
>>>>   # 0.0 0.2
>>>>   as.numeric(sprintf('%.1f', ns))
>>>>   # 0.1 0.1
>>>> vQ
>>> True! And thanks for importing that point into the discussion.
>> sad but true, true but sad. i have already raised the issue on this
>> list earlier, but to no response. apparently, this sort of excel
>> bug in r is an intentional feature, so you may not get it improved
>> anytime soon. unless you submit a patch and get it accepted, that
>> is.
> Have you brought this 'issue' up with the Perl people as well?
>   perl -e 'print sprintf("%.1f", 0.05) . "\n";'
>   0.1
>   perl -e 'print sprintf("%.1f", 0.15) . "\n";'
>   0.1

This happens also when you use C's fprintf and sprintf (at any rate
in my gcc):

  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
    fprintf(stdout, "%.1f\n", 0.15);
    fprintf(stdout, "%.1f\n", 0.05);
    fprintf(stdout, "%.2f\n", 0.15);
    fprintf(stdout, "%.2f\n", 0.05);
    return 0;
  }

  cc -o testprint3 testprint3.c
  ./testprint3
  0.1
  0.1
  0.15
  0.05

(with similar output when printing a string formatted by sprintf).
So, in so far as R relies on the compiler's implementation of the
*printf functions, this can hardly be put right without re-writing
[g]cc!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 17:16:40
-- XFMail --
Re: [R] where does the null come from?
On 13-May-09 14:43:17, Gabor Grothendieck wrote:
> out <- apply(m, 1, cat, '\n')
> 1 3
> 2 4
> out
> NULL

Or, more explicitly, from ?cat:

  Value: None (invisible 'NULL').

Ted.

> On Wed, May 13, 2009 at 5:23 AM, Wacek Kusnierczyk
> <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>>   m = matrix(1:4, 2)
>>   apply(m, 1, cat, '\n')
>>   # 1 3
>>   # 2 4
>>   # NULL
>> why the null?
>> vQ

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 15:56:04
-- XFMail --
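[Editor's aside: if the printed NULL is the only annoyance, it can be suppressed by wrapping the call in invisible(), since the NULL is just apply()'s return value being auto-printed at top level. A minimal sketch:]

```r
# The cat() calls print the rows as a side effect; invisible() stops
# apply()'s NULL return value from being auto-printed at top level.
m <- matrix(1:4, 2)
invisible(apply(m, 1, cat, '\n'))
# 1 3
# 2 4
```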
Re: [R] Problems with randomly generating samples
On 13-May-09 15:18:05, Debbie Zhang wrote:
> Dear R users,
> Can anyone please tell me how to generate a large number of samples
> in R, given a certain distribution and size?
> For example, if I want to generate 1000 samples of size n=100 with a
> N(0,1) distribution, how should I proceed?
> (Since I don't want to run rnorm(100,0,1) in R 1000 times)
> Thanks for help
> Debbie

One possibility is:

  nsamples <- 1000
  sampsize <- 100
  Samples  <- matrix(rnorm(nsamples*sampsize, 0, 1), nrow=nsamples)

Then each row of the matrix Samples will be a sample of size
'sampsize'; the i-th can be accessed as Samples[i,], and there are
'nsamples' rows to choose from.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 16:46:05
-- XFMail --
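[Editor's aside: with the samples laid out in rows as above, per-sample statistics come from row-wise operations. A sketch, using the sampling distribution of the mean as the example:]

```r
set.seed(1)                     # for reproducibility of the sketch
nsamples <- 1000
sampsize <- 100
Samples <- matrix(rnorm(nsamples*sampsize, 0, 1), nrow = nsamples)
samp.means <- rowMeans(Samples) # one mean per sample of size 100
sd(samp.means)                  # should be near 1/sqrt(100) = 0.1
```

apply(Samples, 1, FUN) works the same way for any other per-sample statistic.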
Re: [R] Input to variables - help please
On 13-May-09 19:49:35, Steve Sidney wrote:
> Dear list
> I have managed to write a short program to evaluate data which is
> input from a csv file, using the following:
>   x = read.csv("wms_results.csv", header=TRUE)
> All works fine, but since I have a number of similar data files
> which have different names, I would like the program to allow me to
> input the file name, rather than having to edit the program.
> From the documentation I am unclear how to replace the
> "filename.csv" with a string variable so that I can input the name
> from the keyboard. Also, which command can I use to output some text
> to the monitor as a prompt?
> Any help would be great, and apologies for such a simple question.
> Thanks
> Regards
> Steve

Try:

  NextFile <- readline("Enter the CSV filename:\n")
  x <- read.csv(NextFile, header=TRUE)

You will then get a prompt "Enter the CSV filename:", so enter it.
Then it will read it. Example (from the terminal):

  > NextFile <- readline("Enter the CSV filename:\n")
  Enter the CSV filename:
  ACBDEFG.csv
  > NextFile
  [1] "ACBDEFG.csv"

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 21:12:19
-- XFMail --
Re: [R] matching period with perl regular expression
On 13-May-09 23:47:41, Henrique Dallazuanna wrote:
> Try this:
>   gsub("^(\\w*).*$", "\\1", x)

Even simpler:

  x
  # [1] "wa.w"
  gsub("\\..*", "", x, perl=TRUE)
  # [1] "wa"
  x <- "abcde.fghij.klmno"
  gsub("\\..*", "", x, perl=TRUE)
  # [1] "abcde"

(and it doesn't matter whether 'perl' is TRUE or FALSE)
Ted.

> On Wed, May 13, 2009 at 8:41 PM, Stephen J. Barr
> <stephenjb...@gmail.com> wrote:
>> Hello,
>> I have several strings where I am trying to eliminate the period
>> and everything after the period, using a regular expression.
>> However, I am having trouble getting this to work:
>>   x = "wa.w"
>>   gsub(x, "\..*", "", perl=TRUE)
>>   [1] ""
>>   Warning messages:
>>   1: '\.' is an unrecognized escape in a character string
>>   2: unrecognized escape removed from "\..*"
>> In perl, you can match a single period with \. Is this not so even
>> with perl=TRUE? I would like for x to be equal to
>>   x = "wa"
>> What am I missing here?
>> -stephen
>> ==
>> Stephen J. Barr
>> University of Washington
>> WEB: www.econsteve.com
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 00:58:07
-- XFMail --
Re: [R] Power function for ratio of lognormal means: two equally
On 12-May-09 12:00:50, Steve Candy wrote:
> Hi All
> This is a general stats problem that I am dealing with using R, so
> any help is greatly appreciated.
> I have two lognormal distributions with means M1 and M2. If we have:
>   H0: log(M1/M2) = 0
>   H1: log(M1/M2) != 0, equivalent to log(M1/M2) = log(1+P),
>       where P > 0 or P < 0.
> If we calculate the power for a value of P=0.1 or P=-0.1 (i.e. a 10%
> difference), and say assume SE{log(M1/M2)} = 0.05 and confidence
> level = 100(1-alpha), alpha = 0.05, then how is the power function
> calculated? As far as I can see we can calculate the power in the
> two ways given below, and if there is no assumed direction to the
> difference between M1 and M2, are not both calculations valid?
>
>   # P = -0.1
>   qn <- qnorm(p=0.05, mean=0, sd=0.05, lower.tail=TRUE)
>   Power.1 <- pnorm(q=qn, mean=log(0.9), sd=0.05, lower.tail=TRUE)
>   # P = 0.1
>   qn <- qnorm(p=0.95, mean=0, sd=0.05, lower.tail=TRUE)
>   Power.2 <- pnorm(q=qn, mean=log(1.1), sd=0.05, lower.tail=FALSE)
>   print(c(Power.1, Power.2))
>   [1] 0.6780872 0.6030887
>
> So which value should I use? Or would the average of the two values
> be appropriate? Or is there a fault in my logic? After a quick
> literature search I contacted a colleague who has written two stats
> text books and taught at University level for many years; he has not
> come across this problem and suggested averaging the values. This
> post is to ask if anyone has come across this pretty basic problem
> and has a suggested approach.
> thanks
> Steve

The power function is the probability of rejecting H0 for a
particular instance of H1 (defined by a specific non-null value of a
parameter), considered as a function of that parameter. So the power
for H1: +10% is not the same as the power for H1: -10%. You then face
the problem of what to do about that when you want to consider "What
is the power when H1 is +/-10%?". The answer, in general terms, is:
whatever best suits the context in which the problem arises.
One general solution which makes a certain kind of sense is: adopt
the smaller of the two results (0.6780872 or 0.6030887). Then you
know that the power is at least 0.6030887 if H1 (either +10% or -10%)
is true. If your objective is to know at least what power can be
achieved in the circumstances, then that answers the question. If
0.6030887 is adequate for your needs, then your investigation is in a
satisfactory state.

On the other hand, if in the context of the problem you are concerned
that there may not be enough power, then you may want to know at most
what power can be achieved. In that case, you can achieve at most
0.6780872. If that is inadequate for your needs, then you need to
re-consider the investigation as a whole, with a view to increasing
the power.

You might also be considering a situation in which you are prepared
to think as though +/-10% may arise at random with probabilities (P+)
and (P-), and are interested in the average power as being
representative of what you may expect to achieve overall (even though
this will never be true for any particular case). In that case, you
may reasonably adopt 0.6780872*(P-) + 0.6030887*(P+). Your
colleague's suggestion is equivalent to (P-) = (P+) = 0.5.

Additionally, in such a context, you may have different utilities for
successfully rejecting H0 when H1: -10% is true, versus successfully
rejecting H0 when H1: +10% is true. In that case, you would be
thinking of computing the expected utility (the preceding case is
tantamount to having equal utilities).

Summary: apply the results in the way that best fits what you know
about the situation you are investigating, and best meets the
objectives you have in that situation. There is not a universal
answer to your question!

Hoping this helps,
Ted.
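[Editor's aside: the possibilities discussed above fit in a few lines of R. A sketch; the weights p.minus and p.plus are whatever you are prepared to assume:]

```r
se <- 0.05; alpha <- 0.05
# power against P = -0.1 (one-sided rejection region at level alpha)
qn.lo <- qnorm(alpha, mean = 0, sd = se)
pow.minus <- pnorm(qn.lo, mean = log(0.9), sd = se)
# power against P = +0.1
qn.hi <- qnorm(1 - alpha, mean = 0, sd = se)
pow.plus <- pnorm(qn.hi, mean = log(1.1), sd = se, lower.tail = FALSE)
c(pow.minus, pow.plus)        # 0.6780872 0.6030887, as above
min(pow.minus, pow.plus)      # guaranteed ("at least") power
p.minus <- 0.5; p.plus <- 0.5 # assumed weights for the average
p.minus*pow.minus + p.plus*pow.plus   # weighted-average power
```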
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 12-May-09  Time: 14:19:41
-- XFMail --
Re: [R] anyone know how to calculate chi square value from P val
On 11-May-09 19:36:00, Anyuan Guo wrote:
> Dear all,
> I have the P values of a list of markers. Now I need the chi square
> value with 2 degrees of freedom. I noticed there are several
> chi-square functions (dchisq, pchisq, qchisq, and rchisq), but it
> seems none is for my purpose. In Microsoft Excel, there is a
> function CHINV to do this: CHINV(0.184, 2) is 3.386, meaning that
> the chi square value for P value 0.184 with 2 degrees of freedom is
> 3.386. Does R have some function to do this?
> Thanks
> Anyuan

Yes, and you already looked at it but apparently did not recognise
it! Either:

  qchisq(1-0.184, 2)  ## (note 1-0.184, since 0.184 is the upper tail)
  # [1] 3.385639

Or:

  qchisq(0.184, 2, lower.tail=FALSE)  ## Default for lower.tail is TRUE
  # [1] 3.385639

Enter ?qchisq for more information on this and related functions.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 11-May-09  Time: 18:15:34
-- XFMail --
Re: [R] Histogram frequencies with a normal pdf curve overlay
On 09-May-09 16:10:42, Jacques Wagnor wrote:
> Dear List,
> When I plot a histogram with 'freq=FALSE' and overlay the histogram
> with a normal pdf curve, everything looks as expected, as follows:
>
>   x <- rnorm(1000)
>   hist(x, freq=FALSE)
>   curve(dnorm(x), add=TRUE, col="blue")
>
> What do I need to do if I want to show the frequencies (freq=TRUE)
> with the same normal pdf overlay, so that the plot would still look
> the same?
> Regards, Jacques

Think first about how you would convert the histogram densities
(heights of the bars on the density scale) into histogram
frequencies:

  density * (bin width) * N = frequency

where N = total number in sample. Then all you need to do is multiply
the normal density by the same factor. To find out the bin width,
take the difference between successive values of the 'breaks'
component of the histogram. One way to do all this is:

  N <- 1000
  x <- rnorm(N)
  H <- hist(x, freq=TRUE)  ## This will plot the histogram as well
  dx <- min(diff(H$breaks))
  curve(N*dx*dnorm(x), add=TRUE, col="blue")

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-May-09  Time: 17:31:03
-- XFMail --
Re: [R] Division ?
On 02-May-09 17:34:15, bogdanno wrote:
> It seems division with numbers bigger than 10 000 000 doesn't work
>   20000000/21
>   [1] 952381
>   55555555/23
>   [1] 2415459
> Thank you

I think you are confusing what is displayed with what is computed:

  20000000/21
  # [1] 952381
  print(20000000/21, 17)
  # [1] 952380.9523809524
  55555555/23
  # [1] 2415459
  print(55555555/23, 17)
  # [1] 2415458.913043478
  (20000000/21)*21
  # [1] 2e+07
  print((20000000/21)*21, 17)
  # [1] 2e+07
  (55555555/23)*23
  # [1] 55555555

Your "numbers bigger than 10 000 000" observation corresponds to the
default display of results to 7 significant digits. If (as in the
above print() statements) you increase this, you get more
reasonable-looking results.
Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 03-May-09  Time: 10:11:02
-- XFMail --
Re: [R] adding zeros to dataframe
On 01-May-09 17:20:08, Collins, Cathy wrote:
> Greetings,
> I am new to R and am hoping to get some tips from experienced
> R-programmers. I have a dataset that I've read into R as a
> dataframe. There are 5 columns: plot location, species name, a
> species number code (unique to each species name), abundance, and
> treatment. There are 272 plots in each treatment, but only the plots
> in which the species was recorded have an abundance value. For all
> species in the dataset, I would like to add zeros to the abundance
> column for any plots in which the species was not recorded, so that
> each species has 272 rows. The data are sorted by species and then
> abundance, so all of the zeros can presumably just be tacked on to
> the last (272 - occupied plots) rows for each species.
> My programming skills are still somewhat rudimentary (and biased
> toward VBA-style looping... which seems to be leading me astray).
> Though I have searched, I have not yet seen this particular problem
> addressed in the help files.
> Many thanks for any suggestions,
> Cathy
> mailto:ccoll...@ku.edu

Suppose we call your dataframe abun.df. Then its columns will be
something like abun.df$location, abun.df$name, abun.df$abundance,
abun.df$trtmt (depending on what you called them in the first place).
From your description, I am presuming that abundance values where a
species was not recorded have no value entered. In that case,
presumably they have gone into abun.df$abundance as NA. You can check
this with a command like:

  sum(is.na(abun.df$abundance))

If you get a positive result, then that is likely to be the case.
As a cross-check:

  sum(abun.df$abundance > 0, na.rm=TRUE)

should give another number which, together with the first, should add
up to the total number of rows in the dataframe.
Assuming, then, that this is the case, the simplest method to set the
non-recorded values to 0 is on the lines of:

  ix <- is.na(abun.df$abundance)
  abun.df$abundance[ix] <- 0

Then you can run the check:

  sum(abun.df$abundance == 0)

and you should get the same number as you got from
sum(is.na(abun.df$abundance)).

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-May-09  Time: 18:39:58
-- XFMail --
Re: [R] postscript printer breaking up long strings
On 30-Apr-09 18:50:38, tommers wrote:
> For a long string in an axis title or main title, the postscript
> device breaks apart the long strings into smaller strings. For
> example:
>
>   postscript('linebreaktest.eps')
>   plot(1, xlab='aReallyLongStringToSeeHowItBreaks',
>           ylab='aReallyLongStringToSeeHowItBreaks')
>   for(i in c(.6,1,1.4)) text(i,i,'aReallyLongStringToSeeHowItBreaks')
>   dev.off()
>
> produces the attached output.
> http://www.nabble.com/file/p23322197/linebreaktest.eps

In the graph shown in your URL above, the xlab and the ylab appear in
their entirety, unbroken. So does the one plotted in the middle of
the graph. I get the same when I run your code. The texts you plotted
at the first and third of the positions (i,i) are also (I think)
unbroken -- they are simply not entirely visible. The one at bottom
left is missing its start "aReallyLongSt", and the one at top right
is missing its end "eHowItBreaks". These apparent truncations arise
because the beginning of the first, and the end of the second, are
outside the plotting area, and so when the postscript() driver puts a
BoundingBox at the border of the plotting area, this has the effect
of clipping these strings when they are displayed.

The apparently broken strings in the PostScript code you quote below
are perfectly normal in PostScript. When dealing with text in a
variable-width font, a PS driver will typically split continuous text
into chunks, to allow for effects like kerning (or other adjustments
of spacing) between the chunks. You will note that all five instances
of the string are split in the same places, despite the fact that at
least three of them come out perfectly (according to the graph on
your URL). So nothing is wrong. You will have to adjust the extent of
your plotting area, or adjust the 'pos' or 'adj' of your (i,i)
labels, to be able to see them in their entirety.

Hoping this helps,
Ted.
> The most important lines from the eps source are at the end of the
> file:
>
>   309.70 36.72 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   30.96 212.50 (aReallyLongStr) 90 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   77.04 91.44 743.76 534.96 cl
>   /Font1 findfont 12 s
>   0 setgray
>   1.03 104.76 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   309.70 310.10 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   618.36 515.43 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>
> Each string is broken into 4 chunks. Has anyone else had this
> problem? Thanks!

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Apr-09  Time: 21:39:59
-- XFMail --
Re: [R] Newbie R question
On 28-Apr-09 20:42:45, Tena Sakai wrote:
> Hi,
> I am a newbie with R. My environment is linux, and I have a file I
> call barebone.R, which has one line:
>   cat ('Hello World!\n')
> I execute this file as:
>   R --no-save < barebone.R
> And it does what I expect. What I get is 20+/- lines of text, one of
> which is 'Hello World!'. How would I go about getting rid of all but
> the line I am after, 'Hello World!'?
> Regards,
> Tena Sakai
> tsa...@gallo.ucsf.edu

An unusual request! I had to browse 'man R' a bit before getting a
hint.

  R --slave --no-save < barebone.R

should do it!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Apr-09  Time: 22:11:23
-- XFMail --
Re: [R] Terminology: Multiple Regression versus multivariate Reg
On 24-Apr-09 08:14:34, str...@freenet.de wrote:
> Hello R-Members,
> can anyone explain the difference between multiple and multivariate
> regression to me, in terms of the terminology and eventually its
> relation to the mathematical foundation? Is multiple regression
> perhaps more related to GLM, and multivariate regression rather
> applied if there are no explicit numeric factor levels?
> Thanks for elucidations on that topic.
> Many thanks and best regards
> Bjoern

This is indeed a question of terminology and usage, and there is a
degree of variability in it. As far as I am concerned (and others,
though not all), "multivariate regression" refers to regression where
the dependent (outcome) variable is multivariate:

  Y ~ X

where each instance of Y is a multivariate observation. For example,
suppose G (growth) consists of two pieces of data, height and weight,
and A is Age. Then a multivariate regression model would look like

  G ~ A    or    (Ht,Wt) ~ Age

(two variables on the left, one variable on the right). This allows
for correlation between Ht and Wt to be part of the model.

What is generally meant by "multiple regression" is regression of a
single variable (on the left) on more than one variable (on the
right), for example:

  Wt ~ Ht + Age

If you must make a distinction, there is the term "simple regression"
(nowadays rarely used) for when there is only one variable on the
right:

  Wt ~ Age

Whether this is a linear model (use lm()) or a generalised linear
model (use glm()) has nothing to do with the terminology.

There is an unfortunate (in my view) tendency for people to use
"multivariate regression" when talking about what I call "multiple
regression" above (i.e. more than 1 independent variable). I think
the term should be reserved for regression where the left-hand side
is multivariate. But maybe I'm in a minority ...
Ted.
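[Editor's aside: in R's model formulas the distinction above looks like this. A sketch with simulated data; Ht, Wt and Age are just the example variables from the text:]

```r
set.seed(1)
Age <- runif(30, 5, 15)
Ht  <- 80 + 6*Age + rnorm(30, sd = 5)
Wt  <- 10 + 3*Age + rnorm(30, sd = 3)
lm(cbind(Ht, Wt) ~ Age)  # multivariate regression: bivariate response
lm(Wt ~ Ht + Age)        # multiple regression: one response, two predictors
lm(Wt ~ Age)             # simple regression
```

A matrix response on the left-hand side gives an object of class "mlm", with one column of coefficients per response variable.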
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Apr-09  Time: 10:00:23
-- XFMail --
[R] Is latest R faster?
Hi Folks,
I just upgraded R to the latest R version 2.9.0 (2009-04-17) (from
the Debian Etch repository). It seems to me that it at least starts
up much faster than it used to and, though I don't have comparative
timing data for long jobs, also that it does its work faster. Is that
just an impression, or has something been done to speed things up?
With thanks,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-09  Time: 20:29:44
-- XFMail --
Re: [R] geometric mean to handle large number and negative value
On 15-Apr-09 09:26:55, richard.cot...@hsl.gov.uk wrote:
>> I have created two functions to compute geometric means. Method 1
>> can handle an even number of negative values but not large numbers;
>> vice versa for method 2. How can I merge both functions so that
>> both large numbers and negative values can be handled?
>>
>>   geometric.mean1 <- function(x) prod(x)^(1/length(x))
>>   geometric.mean2 <- function(x) exp(mean(log(x)))
>>   geometric.mean1(1:10000000)
>>   [1] Inf
>>   geometric.mean2(1:10000000)
>>   [1] 3678798
>>   geometric.mean1(c(-5,-4,4,5))
>>   [1] 4.472136
>>   geometric.mean2(c(-5,-4,4,5))
>>   [1] NaN
>>   Warning message:
>>   In log(x) : NaNs produced
>
> The geometric mean is usually restricted to positive inputs, because
> otherwise the answer can have an imaginary component. If you really
> want the geometric mean of negative inputs, use the second method,
> but convert the input to a complex number first:
>
>   comp.x <- as.complex(c(-5,-4,4,5))
>   geometric.mean2(comp.x)
>   # [1] 0+4.472136i
>
> Regards,
> Richie.
> Mathematical Sciences Unit
> HSL

Since it appears that you were content with the result of your
product method when there was an even number of negative values --
and this is equivalent to the result you would get if all the
negative numbers were positive -- why not simply convert all numbers
to positive by using abs(), and then apply your second method (which
can cope with large numbers)? I.e.:

  geometric.mean3 <- function(x) exp(mean(log(abs(x))))

However, do think carefully about whether the results will make the
sort of sense that you intend. For instance, on that basis,

  geometric.mean3(c(-1,1)) = 1, not 0
  geometric.mean3(c(-4,-1)) = 2

so in the second case the resulting geometric mean is outside the
range of the original numbers (yet it is what your first method would
have given).
On the other hand, Richie's suggestion gives results which you may
consider to make more sense:

  comp.x <- as.complex(c(-1,1))
  geometric.mean2(comp.x)
  # [1] 0+1i
  comp.x <- as.complex(c(-4,-1))
  geometric.mean2(comp.x)
  # [1] -2+0i

But then, in the original example:

  comp.x <- as.complex(c(-5,-4,4,5))
  geometric.mean2(comp.x)
  # [1] 0+4.472136i

what do you want to do with the resulting 4.472136i? So you need to
think about what you intend to do with the result, in general, and
about why you want to compute a geometric mean.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-Apr-09  Time: 11:14:27
-- XFMail --
Re: [R] MLE for bimodal distribution
On 08-Apr-09 23:39:36, Ted Harding wrote: On 08-Apr-09 22:10:26, Ravi Varadhan wrote: The EM algorithm is a better approach for maximum likelihood estimation of finite-mixture models than direct maximization of the mixture log-likelihood. Due to its ascent properties, it is guaranteed to converge to a local maximum. By theoretical construction, it also automatically ensures that all the parameter constraints are satisfied. [snip] Be warned that EM convergence can be excruciatingly slow. Acceleration methods can be of help in large simulation studies or for bootstrapping. Best, Ravi. [snip] As to acceleration: agreed, EM can be slow. Aitken acceleration can be dramatically faster. Several outlines of the Aitken procedure can be found by googling on "aitken acceleration". I recently wrote a short note, describing its general principle and outlining its application to the EM algorithm, using the Probit model as illustration (with R code). For fitting the location parameter alone, raw EM took 59 iterations, Aitken-accelerated EM took 3. For fitting the location and scale parameters, raw EM took 108, and Aitken took 4. If anyone would like a copy (PS or PDF) of this, drop me a line. I have now placed a PDF copy of this, if anyone is interested (it was intended as a brief expository note), at: http://www.zen89632.zen.co.uk/R/EM_Aitken/em_aitken.pdf Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Apr-09 Time: 17:15:06 -- XFMail --
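For readers who don't want to fetch the PDF, the general idea of Aitken acceleration can be sketched generically. This is a minimal illustration on a scalar fixed-point iteration, not the EM/Probit code from Ted's note (the function name `aitken_step` is mine):

```r
# One Aitken delta-squared (Steffensen) step for a scalar fixed-point
# iteration x <- g(x): extrapolate from x, g(x), g(g(x)).
aitken_step <- function(g, x) {
  x1 <- g(x); x2 <- g(x1)
  d  <- (x2 - x1) - (x1 - x)
  if (abs(d) < .Machine$double.eps) return(x2)   # already converged
  x - (x1 - x)^2 / d
}

# Example: x <- cos(x) converges slowly on its own; a handful of
# Aitken steps reach the fixed point ~0.7390851 to high accuracy.
x <- 1
for (i in 1:5) x <- aitken_step(cos, x)
x
```

The same extrapolation applied to the slowly converging parameter sequence of an EM iteration is what produces the dramatic iteration counts quoted above.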
Re: [R] puzzling lm.fit errors
On 09-Apr-09 22:53:51, Brendan Morse wrote: Hi everyone, I am running a Monte Carlo simulation and am getting an error that I haven't the slightest clue how to begin figuring out. The error is as follows: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases In addition: Warning message: In ltm.fit(X, betas, constraint, formula, con) : Hessian matrix at convergence is not positive definite; unstable solution. The puzzling aspect though is that it successfully runs a number of complete iterations before this occurs. I am not sure what is going on or where. Any suggestions? Thanks very much (in advance) - Brendan. This is the sort of thing that can happen if the covariates in your linear model lead to a model matrix which is singular (or nearly so). In other words, at least one of your covariate vectors is a linear combination of the others (or nearly so). The facts that (a) you are doing a Monte Carlo simulation and (b) you get a number of satisfactory runs and then it fails suggest that you are randomly creating covariate (independent-variable) values as well as dependent-variable values. If that is so, then the structure of your model may be such that it is relatively likely for such singularity to arise. While not particularly likely for continuous covariates, near-singularity can nevertheless happen. It can be much more likely if you are generating random levels of factors, since only a finite number of combinations are possible, and sooner or later you will definitely hit exact collinearity. And if you are using a fairly large number of covariates, and not a large number of cases, then it can become really likely! I'm only guessing, of course, and you can either agree with or deny the above suggestions. Either way, it would help in clarifying the problem to know the details of how your Monte Carlo simulation is supposed to work. Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Apr-09 Time: 00:23:04 -- XFMail --
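The specific "0 (non-NA) cases" error is easy to reproduce and to guard against in a simulation loop. A hypothetical minimal set-up (not Brendan's actual code): if one simulated response vector ends up entirely NA, `na.omit` drops every row and `lm()` aborts; `tryCatch()` lets the Monte Carlo loop record the failure and continue:

```r
set.seed(1)
x <- rnorm(10)
bad_y <- rep(NA_real_, 10)   # a pathological simulated response

# Catch the error instead of letting it kill the whole simulation.
res <- tryCatch(lm(bad_y ~ x), error = function(e) conditionMessage(e))
res   # the error message: "0 (non-NA) cases"
```

In a real study one would also log the random seed or iteration number of each failed draw, so the offending case can be regenerated and inspected.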
Re: [R] MLE for bimodal distribution
On 08-Apr-09 22:10:26, Ravi Varadhan wrote: The EM algorithm is a better approach for maximum likelihood estimation of finite-mixture models than direct maximization of the mixture log-likelihood. Due to its ascent properties, it is guaranteed to converge to a local maximum. By theoretical construction, it also automatically ensures that all the parameter constraints are satisfied. The problem with mixture estimation, however, is that the local maximum may not be a good solution. Even the global maximum may not be a good solution. For example, it is well known that in a Gaussian mixture you can get a log-likelihood of +infinity by letting mu = one of the data points, and sigma = 0 for one of the mixture components. You can detect trouble in MLE if you observe one of the following: (1) a degenerate solution, i.e. one of the mixing proportions is close to 0, (2) one of the sigmas is too small. EM convergence is sensitive to starting values. Although there are several starting strategies (see McLachlan and Basford's book on finite mixtures), there is no universally sound strategy for ensuring a good estimate (e.g. the global maximum, when it makes sense). It is always good practice to run EM with multiple starting values. Be warned that EM convergence can be excruciatingly slow. Acceleration methods can be of help in large simulation studies or for bootstrapping. Best, Ravi. Well put! I've done a fair bit of mixture-fitting in my time, and one approach which can be worth trying (and for which there is often a reasonably objective justification) is to do a series of fits (assuming a given number of components, e.g. 2 for the present purpose) with the sigmas in a constant ratio in each one: sigma2 = lambda*sigma1. Then you won't get those singularities where it tries to climb up a single data-point. If you do this for a range of values of lambda, and keep track of the log-likelihood, then you get in effect a profile likelihood for lambda.
Once you see what that looks like, you can then set about penalising ranges you don't want to see. The reason I say "there is often a reasonably objective justification" is that often you can be pretty sure, if there is a mixture present, that lambda does not go outside say 1/10 - 10 (or whatever), since you expect that the latent groups are fairly similar, and unlikely to have markedly different sigmas. As to acceleration: agreed, EM can be slow. Aitken acceleration can be dramatically faster. Several outlines of the Aitken procedure can be found by googling on "aitken acceleration". I recently wrote a short note, describing its general principle and outlining its application to the EM algorithm, using the Probit model as illustration (with R code). For fitting the location parameter alone, raw EM took 59 iterations, Aitken-accelerated EM took 3. For fitting the location and scale parameters, raw EM took 108, and Aitken took 4. If anyone would like a copy (PS or PDF) of this, drop me a line. Or is there some repository for R-help (like the Files area in Google Groups) where one can place files for others to download? [And, if not, do people think it might be a good idea?] Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-Apr-09 Time: 00:33:02 -- XFMail --
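The constrained-ratio idea above can be sketched directly: for each fixed lambda = sigma2/sigma1, maximise the two-component normal mixture likelihood, then profile over lambda. The data, parameterisation, and function names here are illustrative (direct `optim` maximisation rather than EM, for brevity); the point is that fixing the sigma ratio rules out the single-data-point singularities:

```r
# Negative log-likelihood of a 2-component normal mixture with the
# constraint sigma2 = lambda * sigma1 (p via logit, s1 via log, to keep
# the optimisation unconstrained).
mix_negll <- function(par, x, lambda) {
  p <- plogis(par[1]); mu1 <- par[2]; mu2 <- par[3]; s1 <- exp(par[4])
  -sum(log(p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, lambda * s1)))
}

set.seed(42)
x <- c(rnorm(100, 0, 1), rnorm(100, 4, 2))   # true lambda = 2

# Profile log-likelihood over a small grid of lambda values.
lambdas <- c(0.5, 1, 2, 4)
prof <- sapply(lambdas, function(lam)
  -optim(c(0, 0, 4, 0), mix_negll, x = x, lambda = lam)$value)
lambdas[which.max(prof)]   # lambda preferred by the profiled likelihood
```

In practice one would use a finer lambda grid and several starting values per fit, exactly as recommended above.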
Re: [R] extract the p value of F statistics from the lm class
On 05-Apr-09 08:18:27, tedzzx wrote: Dear R users, I have run a regression and want to extract the p value of the F statistic, but I can't find a way to do that.

x <- summary(lm(log(RV2)~log(IV.m),data=b))
Call: lm(formula = log(RV2) ~ log(IV.m), data = b[[11]])
Residuals:
     Min       1Q   Median       3Q      Max
-0.26511 -0.09718 -0.01326  0.11095  0.29777
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.3059     0.1917  -1.595    0.121
log(IV.m)     0.9038     0.1065   8.488 1.38e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1435 on 31 degrees of freedom
Multiple R-squared: 0.6991, Adjusted R-squared: 0.6894
F-statistic: 72.04 on 1 and 31 DF, p-value: 1.379e-09

names(x)
#  [1] "call"          "terms"         "residuals"
#  [4] "coefficients"  "aliased"       "sigma"
#  [7] "df"            "r.squared"     "adj.r.squared"
# [10] "fstatistic"    "cov.unscaled"

x$fstatistic
#    value numdf dendf
# 72.04064   1.0  31.0

But I can not find the p value of the F statistic. Thanks, Ted

Maybe you were looking in the wrong place. A few lines above the output from x$fstatistic you will find "F-statistic: 72.04 on 1 and 31 DF, p-value: 1.379e-09", and therefore will find the P-value. However, maybe that is not the question you really wanted to ask. If it is what I think it may be, you could:
1: Observe that x$fstatistic is a vector with 3 values: the value of F, the numerator df, and the denominator df.
2: Note (from ?pf): pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
3: Therefore do
pf(x$fstatistic[1],x$fstatistic[2],x$fstatistic[3],lower.tail=FALSE)
# [1] 1.378626e-09
Note that the P-value is not in the list of values returned, although $fstatistic is one of them. The computation of the P-value in the displayed output from summary.lm() is done by the 'print' method for summary.lm() (just as in [3] above). Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 05-Apr-09 Time: 13:12:45 -- XFMail --
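The pf() recipe above packages naturally into a small helper, shown here on a built-in dataset so it is self-contained (the function name `lm_f_pvalue` is mine, not a base R function):

```r
# Extract the overall F-test p-value from a fitted lm, reproducing what
# print.summary.lm computes on the fly.
lm_f_pvalue <- function(fit) {
  fs <- summary(fit)$fstatistic          # c(value, numdf, dendf)
  unname(pf(fs[1], fs[2], fs[3], lower.tail = FALSE))
}

fit <- lm(dist ~ speed, data = cars)
lm_f_pvalue(fit)   # matches the p-value line printed by summary(fit)
```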
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 01-Apr-09 09:37:49, Gad Abraham wrote: Rolf Turner wrote: I get to the video screen OK --- there's a large greenish sideways triangle waiting to be clicked on. I do so; there's a message that says it's downloading, with a little progress bar. That seems to complete quite rapidly. Then nothing for a while. Then an error message on the video screen saying ``Fatal error --- video source not ready.'' Then that error message goes away. Long wait. Then I get audio, but never any video. Give up. I'm using Firefox on an iMac; the ``About Mozilla Firefox'' button on the Firefox dropdown menu says I've got Mozilla 5.0, Firefox 2.0.0.2 --- whatever that means. Bottom line --- I can't watch the video. Firefox 3.0.8 on OS X doesn't work either. But you can go directly to http://www.lecturemaker.com/wp-content/uploads/2009/02/lmpremovie.swf and it will work. -- Gad Abraham. AND it works beautifully using Flash Player v.9 on Linux too! This was the version I already had installed when I first failed to locate the video -- instead seeing the little panel for installing Flash Player, which I ignored, since I already had it, not realising that it was telling me I should install Flash Player 10. It is now clear that this advice was erroneous, since version 9 in fact works fine! Furthermore, when (as I previously reported) I did install version 10, that didn't work either. I later found that version 10 was looking for versions of Linux libraries which I didn't have, and also that Flash Player no longer worked (e.g.) on the BBC website. So I re-installed version 9, and everything again worked on the BBC and for videos in press reports, etc. But of course still the same failure for the SFBA video. Now, using the direct URL for the same video as kindly revealed by Gad, it works and could have worked all along!
It is, therefore, clear that the HTML at the original URL is doing the wrong thing, by somehow detecting whether version 10 of Flash Player is installed and refusing to cooperate otherwise, when this was erroneous. I (and Jim Porzak and Mike Driscoll) received a private email from Ron Fredericks, Video director and Multimedia expressionist, http://www.lecturemaker.com, Ron @ Lecturemaker r...@lecturemaker.com (whom I'm adding to the address-list for this mail), which stated: The video may be hard to find, admittedly, if you do not have version 10 of flash player installed. If you do not have a recent version of Flash in your web browser, I present a smallish Adobe image with the words "you do not have version 10 of flash installed, click to install". Attached is a screenshot of what you should be seeing at this link http://www.lecturemaker.com/2009/02/r-kickoff-video/ I'll take your feedback from the R User Group list server as a suggestion to make the need to update your flash player a lot bigger on the screen. Ron Fredericks. So, message to Ron Fredericks: the video works with version 9. If (for some reason) it is essential for the original URL to detect when version 10 is not installed, then there should be a pointer on the following lines: "You appear not to have version 10 of Flash Player installed. If you are using an earlier version, please click on the following link ." which would take people to the URL provided by Gad, which does work on earlier versions (well, version 9 at least). Then it would no longer be the case that "The video may be hard to find, admittedly, if you do not have version 10 of flash player installed"! My thanks to everyone who has helped to delve into this tangle, and especially to Gad who found where the solution was lurking! Best wishes to all, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-Apr-09 Time: 11:17:02 -- XFMail --
Re: [R] A query about na.omit
On 01-Apr-09 15:49:40, Jose Iparraguirre D'Elia wrote: Dear all, Say I have the following dataset:

DF
   x  y  z
1  1  1  1
2  2  2  2
3  3  3 NA
4  4 NA  4
5 NA  5  5

And I want to omit all the rows which have NA, but only in columns x and y, so that I get:

  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA

Roll up your sleeves, and spell out in detail the condition you need:

DF <- data.frame(x=c(1,2,3,4,NA),y=c(1,2,3,NA,5),z=c(1,2,NA,4,5))
DF
#    x  y  z
# 1  1  1  1
# 2  2  2  2
# 3  3  3 NA
# 4  4 NA  4
# 5 NA  5  5
DF[!(is.na(rowSums(DF[,(1:2)]))),]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA

Hoping this helps, Ted.

If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus

  x y z
1 1 1 1
2 2 2 2

But this is not what I want, of course. If I use na.omit(DF[,1:2]), then I obtain

  x y
1 1 1
2 2 2
3 3 3

which is OK for the x and y columns, but I wouldn't get the corresponding values for z (i.e. 1 2 NA). Any suggestions about how to obtain the desired results efficiently (the actual dataset has millions of records and almost 50 columns, and I would apply the procedure on 12 of these columns)? Sincerely, Jose Luis. Jose Luis Iparraguirre, Senior Research Economist, Economic Research Institute of Northern Ireland. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-Apr-09 Time: 18:00:53 -- XFMail --
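An equivalent, arguably clearer idiom for the same task is `complete.cases()` restricted to the columns that matter. A sketch using the thread's data (scale the column selection up to the 12 relevant columns in the real dataset; `complete.cases` is also happy with millions of rows):

```r
DF <- data.frame(x = c(1,2,3,4,NA), y = c(1,2,3,NA,5), z = c(1,2,NA,4,5))

# Keep rows where columns x and y (columns 1:2) are both non-NA,
# while retaining all columns -- including z's NA in row 3.
DF[complete.cases(DF[, 1:2]), ]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA
```

Unlike the rowSums() trick, this also works unchanged for non-numeric (factor/character) columns.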
[R] digits in round()
Hi Folks, Compare print(1234567890,digits=4) # [1] 1.235e+09 print(1234567890,digits=5) # [1] 1234567890 Granted that digits: a non-null value for 'digits' specifies the minimum number of significant digits to be printed in values, how does R decide to switch from 1.235e+09 (rounded to 4 digits, i.e. the minimum, in e notation) to 1234567890 (the complete raw notation, 10 digits) when 'digits' goes from 4 to 5? Thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 13:59:37 -- XFMail --
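A plausible answer to Ted's question, from R's formatting rule: R prints whichever of fixed and scientific notation is narrower (scientific carrying a penalty set by `options(scipen=)`, 0 by default). At digits=4 the fixed form still needs all 10 digits ("1234567890") while "1.235e+09" is only 9 characters, so scientific wins; at digits=5 the scientific form "1.2346e+09" is 10 characters, no narrower than fixed, so fixed wins. A quick demonstration (the width comparison is my reading of the documented behaviour):

```r
print(1234567890, digits = 4)   # 1.235e+09  : sci form is narrower
print(1234567890, digits = 5)   # 1234567890 : tie in width, fixed wins
options(scipen = 10)            # penalise scientific notation
print(1234567890, digits = 4)   # now fixed even at digits = 4
options(scipen = 0)
sprintf("%.3f", pi)             # "3.142": fixed decimal places, if that's the goal
```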
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 30-Mar-09 22:13:04, Jim Porzak wrote: Next week Wednesday evening, April 8th, Mike Driscoll will be talking about Building Web Dashboards using R see: http://www.meetup.com/R-Users/calendar/9718968/ for details to RSVP. Also of interest, our member Ron Fredericks has just posted a well edited video of the February kickoff panel discussion at Predictive Analytics World The R and Science of Predictive Analytics: Four Case Studies in R with * Bo Cowgill, Google * Itamar Rosenn, Facebook * David Smith, Revolution Computing * Jim Porzak, The Generations Network and chaired by Michael Driscoll, Dataspora LLC see: http://www.lecturemaker.com/2009/02/r-kickoff-video/ Best, Jim Porzak It could be very interesting to watch that video! However, I have had a close look at the web page you cite: http://www.lecturemaker.com/2009/02/r-kickoff-video/ and cannot find a link to a video. Lots of links to non-video things, but none that I could see to a video. There is a link on that page at: How Google and Facebook are using R by Michael E. Driscoll | February 19, 2009 http://dataspora.com/blog/predictive-analytics-using-r/ Following that link leads to a page, on which the first link, in: (March 26th Update: Video now available) Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled The R and Science of Predictive Analytics, co-located with the Predictive Analytics World conference here in SF. leads you back to where you came from, and likewise the link at the bottom of the page: A video of the event is now available courtesy of Ron Fredericks and LectureMaker. Could you help by describing where on that web page it can be found? With thanks, Ted. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 23:55:07 -- XFMail --
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 30-Mar-09 23:04:40, Jim Porzak wrote: Since Sundar beat me to it w/ the Firefox 3 test, I checked with IE 7.0 - works fine for me there also. -Jim. Interesting! I'm using Iceweasel 2.0.0.19 (Firefox under another name) on Linux. I'll have to check out what blocks it has activated! I put on a fragrantly scented facemask and started IE up on Windows. Going to the same URL, I now find a big video screen just below the line: The R and Science of Predictive Analytics: Four Case Studies in R -- The Video. And it duly plays. But at the same place in my Firefox, I only see a little button inviting me to Get Adobe Flash Player. But I already have that installed for Iceweasel! Well, maybe it needs updating. Let me try that ... It says Adobe Flash Player version 10.0.22.87, and I have flashplayer_9 already there, so ... (some time later) I now have flashplayer_10 installed, but I still get the same result. Hmmm. Well, thanks for helping to locate what the problem might be! Ted. On Mon, Mar 30, 2009 at 4:00 PM, Sundar Dorai-Raj sdorai...@gmail.com wrote: Could be that you have some sort of ad filter in your browser that's blocking the video? It appears just fine for me in Firefox 3. On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding ted.hard...@manchester.ac.uk wrote: On 30-Mar-09 22:13:04, Jim Porzak wrote: Next week Wednesday evening, April 8th, Mike Driscoll will be talking about Building Web Dashboards using R; see: http://www.meetup.com/R-Users/calendar/9718968/ for details to RSVP.
Also of interest, our member Ron Fredericks has just posted a well edited video of the February kickoff panel discussion at Predictive Analytics World, The R and Science of Predictive Analytics: Four Case Studies in R, with * Bo Cowgill, Google * Itamar Rosenn, Facebook * David Smith, Revolution Computing * Jim Porzak, The Generations Network, chaired by Michael Driscoll, Dataspora LLC; see: http://www.lecturemaker.com/2009/02/r-kickoff-video/ Best, Jim Porzak. It could be very interesting to watch that video! However, I have had a close look at the web page you cite: http://www.lecturemaker.com/2009/02/r-kickoff-video/ and cannot find a link to a video. Lots of links to non-video things, but none that I could see to a video. There is a link on that page at: How Google and Facebook are using R, by Michael E. Driscoll | February 19, 2009, http://dataspora.com/blog/predictive-analytics-using-r/ Following that link leads to a page on which the first link, in: (March 26th Update: Video now available) Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled The R and Science of Predictive Analytics, co-located with the Predictive Analytics World conference here in SF. leads you back to where you came from, and likewise the link at the bottom of the page: A video of the event is now available courtesy of Ron Fredericks and LectureMaker. Could you help by describing where on that web page it can be found? With thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 23:55:07 -- XFMail --
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 00:49:08 -- XFMail --
Re: [R] advice for alternative to barchart
are proportional to the total cars owned by owner i. The (A) answers (a), and (B) answers (b). As to how best to achieve this in R, I'm not sure. In any case, I don't draw sophisticated graphics like this directly in R, since I would want totally precise control over sizes, shapes and positions in the result. I in fact use the 'pic' component of the 'groff' package (though these days it may be possible to achieve the same sort of result in TeX). However (staying on topic) I would certainly get R to generate the data array from which the graphic would be drawn by 'pic'. Other graphics software may also allow this to be easily and precisely done (I would not like to recommend Excel ...). If you were to post the data, or a suitable set of similar artificial data (or mail it to me privately), I would be happy to try my hand at producing the sort of thing described above. Then you could judge whether you liked it! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 01:58:41 -- XFMail --
Re: [R] Subscription request
On 29-Mar-09 11:48:12, Michael Larsson wrote: Hello, I would like to subscribe to the mailing list. I already receive the daily digest, but for some reason I am not subscribed to the list, meaning any posts I make by replying to the e-mail digest have to be placed on the list by a moderator - incurring significant delay. Thanks, Michael. This almost certainly means that some other email address of yours is subscribed to the list (via which you receive the digests), whereas the email address from which you try to post to the list is not subscribed. You could check this by looking into the full headers of a digest you have received, to check what address it has been sent to. For example, in one message to me from R-help I find (about halfway down the list of headers): Received: from hypatia.math.ethz.ch ([129.132.145.15]) by deimos.mcc.ac.uk with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69 (FreeBSD)) (envelope-from r-help-boun...@r-project.org) id 1LndwT-000A37-TR for ted.hard...@manchester.ac.uk; Sat, 28 Mar 2009 19:11:42 +0000 showing (in the "for ..." clause) that the R-help list server (hypatia.math.ethz.ch) addressed it to ted.hard...@manchester.ac.uk. If you find that out, then you could work round the problem by using that address to post to R-help. Alternatively, you can simply visit the R-help web page https://stat.ethz.ch/mailman/listinfo/r-help and there subscribe your other email address (the one from which you wish to post). This will then mean that you will receive the messages from R-help at both addresses, unless you either use the above web-page to disable sending of mail to one of the addresses, or use it to unsubscribe the one you do not wish to post from. Though a moderator, I cannot check what addresses are subscribed (only the list owner can do that), so that is as far as I can go to help. And I hope it helps! Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 15:52:39 -- XFMail --
Re: [R] Burt table from word frequency list
On 29-Mar-09 16:32:11, Joan-Josep Vallbé wrote: OK, thank you. And is there any function to get the table directly from the original corpus? Best, Joan-Josep Vallbé. You will have to think about what you are doing. As Duncan said, you need counts of pairs of words or, more precisely, of co-occurrence. But co-occurrence within what? Adjacent? Within the same sentence? Within the same paragraph? Within the same chapter? Within the same document (if your corpus incorporates several documents)? Within documents by the same author? If so, then is there an additional classification by individual document? Etc., etc. In short, what is the structure of your corpus, and how do you wish this to be represented in the Burt table? Hoping this helps to move you forward, Ted. On Mar 29, 2009, at 2:00 PM, Duncan Murdoch wrote: On 29/03/2009 7:02 AM, Joan-Josep Vallbé wrote: Dear all, I have a word frequency list from a corpus (say, in .csv), where the first column is a word and the second is the occurrence frequency of that word in the corpus. Is it possible to obtain a Burt table (a table crossing all words with each other, i.e., where rows and columns are the words) from that frequency list with R? I'm exploring the ca package but I'm not able to solve this detail. No, because you don't have any information on that. You only have marginal counts. You need counts of pairs of words (from the original corpus, or already summarized). Duncan Murdoch
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 18:46:40 -- XFMail --
[R] R's validity on MedStats
Hi Folks, Some of you (since you have contributed) are aware of a quite vigorous discussion currently in progress on the MedStats Google Group. Others, who possibly could contribute usefully, may not be. For the moment it is at the top of the Discussions list at http://groups.google.com/group/MedStats with the title {MEDSTATS} Re: The number of requests for R help. If you click on that title, you will be taken to the posting (by Martin Holt) which initiated the discussion. Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Mar-09 Time: 19:47:47 -- XFMail --
Re: [R] Confusion about ecdf
On 25-Mar-09 21:01:49, Mandro Litt wrote: Hi, I'm a bit confused about ecdf (I read the help files but am still not sure about this). I have an analytical expression for the pdf, but want to get the empirical cdf. How do I use this analytical expression with ecdf? If this helps make it concrete, the pdf is: f(u) = \sum_{t = 1}^T 1/n_t \sum_{i = 1}^{n_t} 1/w K((u - u_{it})/w) where K = kernel density estimator, w = weights, and u_{it} = data. Thank you! ML. Possibly you first need to be clear about why you need the ECDF, and at what values of t you need it. The ECDF is defined for a series of values of t. These might be simply the values in your data, in which case it would be ecdf(u) where u is the vector of data. It is intended as an estimate, from the data, of the CDF of the distribution that the data came from. It basically counts the number of data less than or equal to a given value, and divides by the total number of data. Using ecdf() on the values of your estimated density function would not make a lot of sense. However, you can base an estimate of the CDF on your kernel estimate of the density by evaluating it at a chosen (finely spaced) set of u-values, say u1 <- 0.01*(-500:500), and then ECDF <- cumsum(f(u1))/sum(f(u1)) where f(u) is an implementation, as an R function, of your expression above. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Mar-09 Time: 21:20:26 -- XFMail --
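The cumsum recipe above can be made concrete with `density()` standing in for the analytical kernel expression (illustrative data and default bandwidth; in the original problem one would evaluate the given f(u) on the grid instead):

```r
set.seed(1)
u <- rnorm(200)               # stand-in for the u_{it} data

d   <- density(u)             # kernel density estimate: d$x grid, d$y = f(u)
cdf <- cumsum(d$y) / sum(d$y) # normalised cumulative sum ~ estimated CDF

# Read the estimated CDF at any point, e.g. P(U <= 0) -- near 0.5 here:
approx(d$x, cdf, xout = 0)$y
```

Note this trapezoid-free cumsum is only a rough quadrature; for a finer grid it converges to the integral of the density estimate.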
Re: [R] confidence interval or error of x intercept of a linear
On 24-Mar-09 03:31:32, Kevin J Emerson wrote: Hello all, This is something that I am sure has a really suave solution in R, but I can't quite figure out the best (or even a basic) way to do it. I have a simple linear regression fit with lm for which I would like to estimate the x intercept with some measure of error around it (confidence interval). In biology, there is the concept of a developmental zero - a temperature below which development will not happen. This is often estimated by extrapolation of a curve of developmental rate as a function of temperature. This developmental zero is generally reported without error. I intend to change this! There has to be some way to assign error to this term, I just have yet to figure it out. Now, it is simple enough to calculate the x-intercept itself ( - intercept / slope ), but it is a whole separate process to generate its confidence interval. I can't figure out how to propagate the error of the slope and intercept into the ratio of the two. The other option I have tried is to use the predict function to look for where the confidence intervals cross the axis, but this hasn't been too fruitful either. I would greatly appreciate any insight you may be able to share. Cheers, Kevin. Here is a small representative sample of some of my data where Dev.Rate ~ Temperature. The difficulty with the situation you describe is that you are up against what is known as the Calibration Problem in statistics. Essentially, the difficulty is that in the theory of the linear regression there is sufficient probability for the slope to be very close to zero -- and hence that the X-intercept may be very far out to the left or the right -- that the whole concept of "standard error" ceases to exist. Even the expected position of the intercept does not exist. And this is true regardless of the data (the one exception would be where it is absolutely certain that the data will lie exactly on a straight line).
This has been addressed a number of times, and controversially, in the statistical literature. One approach which I like (having thought of it myself :)) is the use of likelihood-based confidence intervals. Given a linear regression Y = a + b*X + E (where the error E has N(0, sig^2) distribution), suppose you want to estimate the value X0 of X for which Y = Y0, a given value (in your case Y0 = 0). For simplicity, measure X and Y from their means sum(X)/N and sum(Y)/N. The approach is based on comparing the likelihoods [A] for unrestricted ML estimation of a, b and sig -- a.hat, b.hat, sig.hat (where a.hat = 0 because of the origins of X and Y) and [B] for ML estimation assuming a particular value X1 for X0 -- a.tilde, b.tilde, sig.tilde, where

[A] a.hat = 0 (as above)
    b.hat = sum(X*Y)/sum(X^2)
    X0.hat = Y0/b.hat  [ = -mean(Y)/b.hat in your case ]
    sig.hat^2 = sum((Y - b.hat*X)^2)/(N+1)

[B] a.tilde = (Y0*sum(X^2) - X1*sum(X*Y))/D
    b.tilde = ((N+1)*sum(X*Y) + N*X1*Y0)/D
    sig.tilde^2 = (sum((Y - a.tilde - b.tilde*X)^2)
                   + (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)
    where D = (N+1)*sum(X^2) + N*X1^2

Then let R(X1) = (sig.hat^2)/(sig.tilde^2). A confidence interval for X0 is the set of values of X1 such that R(X1) >= R0, say, where Prob(R(X0) >= R0) = P, the desired confidence level, when X0 is the true value. It can be established that (N-2)*(1 - R(X0))/R(X0) has the F distribution with 1 and (N-1) degrees of freedom. Since this is independent of X0, the resulting set of values of X1 constitutes a confidence interval. This was described in Harding, E.F. (1986). Modelling: the classical approach. The Statistician vol. 35, pp. 115-134 (see pages 123-126) and has later been referred to by P.J. Brown (though I don't have the references to hand just now). When I have time for it (not today) I'll see if I can implement this neatly in R. It's basically a question of solving (N-2)*(1 - R(X0))/R(X0) = qf(P,1,(N-1)) for X0 (two solutions, or maybe one, if any exist). 
However, it is quite possible that no solution exists (depending on P), so that the confidence interval would then be (-inf,+inf); or, when only one solution exists, it will be either (-inf,X0) or (X0,+inf). These possibilities of infinite intervals are directly related to the possibility that, at significance level alpha = (1-P), the estimated slope may not differ significantly from 0. Maybe more later! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-Mar-09 Time: 10:04:30 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
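[Editor's sketch] The test-inversion idea above can be tried numerically with a generic F comparison of constrained vs unconstrained fits (not Ted's exact formulas, and on simulated data purely for illustration): scan candidate values X1, refit the line forced through (X1, 0), and keep the values the F test does not reject.

```r
## Confidence set for the x-intercept by test inversion (sketch).
set.seed(1)
X <- 1:10
Y <- -2 + 0.5*X + rnorm(10, sd = 0.5)   # toy data; true x-intercept is 4
fit  <- lm(Y ~ X)
RSS0 <- sum(resid(fit)^2)
N    <- length(Y)
in.ci <- function(X1, P = 0.95) {
  ## constrained fit through (X1, 0): Y = b*(X - X1), no intercept
  RSS1 <- sum(resid(lm(Y ~ 0 + I(X - X1)))^2)
  (N - 2)*(RSS1 - RSS0)/RSS0 < qf(P, 1, N - 2)
}
grid <- seq(-50, 50, by = 0.01)
range(grid[sapply(grid, in.ci)])  # approximate interval, if it is finite
```

If the slope is poorly determined, the accepted set runs off the ends of any finite grid, which is exactly the infinite-interval phenomenon described in the post.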
Re: [R] variance/mean
On 22-Mar-09 08:17:29, rkevinbur...@charter.net wrote: At the risk of appearing ignorant, why is the following true?

o <- cbind(rep(1,3),rep(2,3),rep(3,3))
var(o)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    0    0
# [3,]    0    0    0

and

mean(o)
# [1] 2

How do I get mean to return an array similar to var? I would expect in the above example a vector of length 3 {1,2,3}. Thank you for your help. Kevin

This is a consequence of (understandable) confusion about how var() and mean() operate! It is not explicit, in ?var, that if you apply var() to a matrix, as in your var(o), you get the covariance matrix between the columns of 'o' -- except where it says (almost as an aside) that 'var' is just another interface to 'cov'. Hence in your example var(o) is equivalent to cov(o). Looked at in this way, it is now straightforward to expect what you got. This is, of course, different from what you would expect if you apply var() to a vector, namely the variance of that series of numbers (a single value). On the other hand, mean() works differently. According to ?mean: Arguments: x: An R object. Currently there are methods for numeric data frames, numeric vectors and dates. [...] Value: For a data frame, a named vector with the appropriate method being applied column by column. which may have been what you expected. But a matrix is not a data frame. Instead, it is an array, which (in effect) is a vector with an attached dimensions attribute which tells R how to chop it up into columns etc. -- whereas a data frame has its by-column structure built in to it. Now: ?mean says nothing about matrices. Nothing whatever. So you have to find out the hard way that mean(o) treats the array 'o' as a vector, ignoring its dimensions attribute. Hence you get a single number, which is the mean of all the values in the matrix. 
In order to get what you are apparently looking for (the means of the columns of 'o'), you could: a) (the smooth way) use the apply() function, causing mean() to be applied to the second dimension (columns) of 'o': apply(o,2,mean) # [1] 1 2 3 b) (the heavy way) take a hint from ?mean and feed it a data frame: mean(as.data.frame(o)) # V1 V2 V3 # 1 2 3 Hoping this helps to clarify things! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-Mar-09 Time: 09:01:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
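[Editor's sketch] Base R also has colMeans(), which gives the column means directly, and diag() extracts the per-column variances from the covariance matrix returned by var():

```r
o <- cbind(rep(1,3), rep(2,3), rep(3,3))
colMeans(o)        # 1 2 3 -- same result as apply(o, 2, mean)
apply(o, 2, mean)  # 1 2 3
diag(var(o))       # 0 0 0 -- per-column variances from the cov matrix
```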
Re: [R] formula question
On 17-Mar-09 23:04:25, Erin Hodgess wrote: Dear R People: Here is a small data frame and two particular formulas:

test.df
            y  x
1  -0.9261650  1
2   1.5702700  2
3   0.1673920  3
4   0.7893085  4
5   0.3576875  5
6  -1.4620915  6
7  -0.5506215  7
8  -0.3480292  8
9  -1.2344036  9
10  0.8502660 10

summary(lm(exp(y)~x))

Call:
lm(formula = exp(y) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6360 -0.6435 -0.4722  0.4215  2.9127

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1689     0.9782   2.217   0.0574 .
x            -0.1368     0.1577  -0.868   0.4108
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.432 on 8 degrees of freedom
Multiple R-squared: 0.08604, Adjusted R-squared: -0.0282
F-statistic: 0.7532 on 1 and 8 DF, p-value: 0.4108

summary(lm(I(y^2)~x))

Call:
lm(formula = I(y^2) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

Residual standard error: 0.9138 on 8 degrees of freedom
Multiple R-squared: 0.01764, Adjusted R-squared: -0.1052
F-statistic: 0.1436 on 1 and 8 DF, p-value: 0.7146

These both work just fine. My question is: when do you know to use I() and just the function of the variable, please? thanks in advance, Erin PS Happy St Pat's Day!

In the case of your formula you will find it works just as well without I():

summary(lm(y^2 ~ x))

Call:
lm(formula = y^2 ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

The point of I() is that it forces numerical evaluation in an expression which could be interpreted as a symbolic model formula. 
Thus if X1 and X2 were numeric, and you want to regress Y on the numerical values of X1*X2, then you should use I(X1*X2), since in Y ~ X1*X2 this would be interpreted as (essentially) fitting both linear terms and their interaction (equivalent to product here), namely corresponding to Y = a + b1*X1 + b2*X2 + b12*X1*X2 In order to force the fitted equation to be Y = a + b*X1*X2 you would use Y ~ I(X1*X2). This issue does not arise when a product is on the left-hand side of the model formula, so you could simply use X1*X2 ~ Y Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Mar-09 Time: 23:31:21 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
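[Editor's sketch] A small simulated illustration of the distinction (data invented for the example):

```r
set.seed(42)
X1 <- runif(20); X2 <- runif(20)
Y  <- 1 + 2*X1*X2 + rnorm(20, sd = 0.1)
coef(lm(Y ~ X1*X2))     # four coefficients: a, b1, b2 and the interaction b12
coef(lm(Y ~ I(X1*X2)))  # two coefficients: a and b, regressor is the product
```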
Re: [R] R multiline expression grief
On 13-Mar-09 12:55:35, Paul Suckling wrote: Dear all. After much grief I have finally found the source of some weird discrepancies in results generated using R. It turns out that this is due to the way R handles multi-line expressions. Here is an example with R version 2.8.1:

# R-script...
r_parse_error <- function () {
  a <- 1;
  b <- 1;
  c <- 1;
  d <- a + b + c;
  e <- a +
       b +
       c;
  f <- a
     + b
     + c;
  cat('a',a,'\n');
  cat('b',b,'\n');
  cat('c',c,'\n');
  cat('d',d,'\n');
  cat('e',e,'\n');
  cat('f',f,'\n');
}
r_parse_error();
a 1
b 1
c 1
d 3
e 3
f 1

As far as I am concerned f should have the value 3. This is causing me endless problems since case f is our house style for breaking up expressions for readability. All our code will need to be rechecked as a result. Is this behaviour a bug? If not, is it possible to get R to generate a warning that several lines of an expression are potentially being ignored, perhaps by turning on a strict mode which requires the semi-colons? Thank you, Paul

The lines are not being ignored! In

  e <- a +
       b +
       c;

each line (until the last) is syntactically incomplete, so the R parser continues on to the next line until the expression is complete; and the ';' is irrelevant for this purpose. Unlike C, but like (say) 'awk', the ';' in R serves to terminate an expression when this is followed on the same line by another one, so it is basically a separator. In

  f <- a
     + b
     + c;

however, 'f <- a' is complete, so the value of 'a' is assigned to f. The line '+ b' would have sent the value of 'b' (the '+' being the unary operator '+', which does not change anything) to the console if it did not occur inside a function definition. As it is, although '+ b' is evaluated, because it is inside the function no output is produced. 
Similarly for '+ c;' (and, once again, the ';' is irrelevant, since a ';' at the end of a line does nothing -- unless the line was syntactically incomplete at that point, in which case the ';', as an expression terminator, would trigger a syntax error since an incomplete expression was being terminated). So

  f <- a
     + b
     + c;

is not a multiline expression. It is three expressions on three separate lines. The only suggestion I can make is that you have to change your house style -- it is at odds with the way the R parser works, and is bound to cause much grief. Best wishes, and good luck! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Mar-09 Time: 14:16:08 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
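[Editor's sketch] The safe variant of the broken-across-lines style is to leave a trailing operator (or an open parenthesis), so that each intermediate line is syntactically incomplete and the parser keeps reading:

```r
a <- 1; b <- 1; c <- 1
f <- a +      # the trailing '+' keeps the expression open ...
     b +
     c        # ... so all three lines parse as one assignment
f             # 3, as intended
```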
Re: [R] Pseudo-random numbers between two numbers
Because of a mistake I made in copying code into email, there has been confusion (expressed in some private emails). Hence the corrected version is appended at the end. On 11-Mar-09 00:05:26, Ted Harding wrote: I have modified my example to make it more convincing! See at end. On 10-Mar-09 23:39:17, Ted Harding wrote: On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted.

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Ted.

Corrected version:
--
truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
H <- hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Apologies for the error. Ted. 
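[Editor's sketch] A quick sanity check of the inverse-CDF construction above (sample size chosen arbitrarily): every draw must fall strictly inside the truncation bounds, since runif() never returns its endpoints.

```r
truncnorm <- function(HowMany, Min, Max, mean = 0, sd = 1) {
  qnorm(runif(HowMany, pnorm(Min, mean, sd), pnorm(Max, mean, sd)), mean, sd)
}
s <- truncnorm(10000, -1, 2.5)
all(s > -1 & s < 2.5)  # TRUE: the sample respects the truncation
```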
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Mar-09 Time: 19:48:12 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Mar-09 Time: 23:39:13 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
I have modified my example to make it more convincing! See at end. On 10-Mar-09 23:39:17, Ted Harding wrote: On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted.

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Mar-09 Time: 00:05:23 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
On 10-Mar-09 01:07:54, David Duffy wrote: Given we are talking about statistical software, one bibliometric measure of relative package popularity is scientific citations. Web of Science is not too useful where the citation has been to a website or computer package, but Google Scholar for lme4: Linear mixed-effects models using S4 classes gives us 108 journal citations; mgcv: GAMs and generalized ridge regression for R 80, etc. Cheers, David Duffy. A good point. But such numbers must be considered in the context of the prevalence of the kind of study for which the respective methods would be used. A great number of epidemiological studies would be suitable for application of glm(). Fewer would involve GAMs. Popularity of a package by citation frequency would (other things being equal) be proportional to the frequency of the kind of study for which it could be used. So one should either evaluate the proportion of studies, among those in which an R package *could* be used, in which it *was* used; or compare the number of citations of an R package with the number of citations of an equivalent package/module/proc in other software. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Mar-09 Time: 02:03:22 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
On 08-Mar-09 15:14:03, Duncan Murdoch wrote: On 08/03/2009 10:49 AM, hadley wickham wrote: More seriously: I don't think relative numbers of package downloads can be interpreted in any reasonable way, because reasons for package download have a very wide range from curiosity (what's this?), fun (think fortunes...), to vital need (think lme4, if/when a consensus on denominator DFs can be reached :-) ...). What can you infer in good faith from such a mess? So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;) I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served to be told that, than to be given an answer that is not actually valid. They should also be told how to design a study that actually does address their question. You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions. Duncan Murdoch The population of R users (which we would need to sample in order to obtain good data) is probably more elusive than a fish population in the ocean -- only partially visible at best, and with an unknown proportion invisible. At least in Fisheries research, there are long established capture techniques (from trawling to netting to electro-fishing to ...) which can be deployed, for research purposes, in such a way as to potentially reach all members of a target population, with at least a moderately good approximation to random sampling. What have we for R? Come to think of it, electro-fishing, ... Suppose R were released with 2 types of cookie embedded in base R. Each type is randomly configured, when R is first run, to be Active or Inactive (probability of activation to be decided at the design stage ...). 
Type 1, if active, on a certain date generates an event which brings it to the notice of R-Core (e.g. by clandestine email or by inducing a bug report). Type 2 acts similarly on a later date. If Type 2 acts, it carries with it information as to whether there was a Type 1 action along with whether, apparently, the Type 1 action succeeded. We then have, in effect, an analogue of the Mark-Recapture technique of population estimation (along with the usual questions about equal catchability and so forth). However, since this sort of thing (which I am not proposing seriously, only for the sake of argument) is undoubtedly unethical (and would do R's reputation no good if it came to light), I tentatively conclude that the population of R users is likely to remain as elusive as ever. Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Mar-09 Time: 16:11:44 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
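[Editor's sketch] For readers unfamiliar with it, the Mark-Recapture (Lincoln-Petersen) estimator alluded to above is simple; the counts below are invented purely for illustration:

```r
## Lincoln-Petersen: N.hat = n1*n2/m2, where n1 users are "marked" in the
## first capture, n2 are seen in the second, and m2 of those are recaptures.
n1 <- 500; n2 <- 400; m2 <- 40
N.hat <- n1 * n2 / m2
N.hat   # 5000: estimated size of the (hypothetical) user population
```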
Re: [R] R on netbooks et al?
On 08-Mar-09 17:44:18, Douglas Bates wrote: On Sun, Mar 8, 2009 at 7:08 AM, Michael Dewey i...@aghmed.fsnet.co.uk wrote: At 08:47 05/03/2009, herrdittm...@yahoo.co.uk wrote: Dear useRs, With the rise of netbooks and 'lifestyle laptops' I am tempted to get one of these to mainly run R on it. Processor power and hard disk space seem to be ok. What I wonder is the handling and feel with respect to R. Has anyone here installed or is running R on one of these, and if so, what is your experience? Would it be more of a nice looking gadget than a feasible platform to do some stats on? One issue is whether you wish to use Linux or Windows. If you do use Linux I would advise picking a netbook with one of the standard distributions. The early EEE PC had Xandros and dire warnings about using the Debian repositories. In fact I had no problem despite a total lack of experience, although I am not sure what will happen with the recent move to lenny. Because I have used Debian Linux and Debian-based distributions like Ubuntu for many years, I installed an eee-specific version of Ubuntu within a day or two of getting an ASUS eee pc1000. There are currently at least two versions of Ubuntu, easy peasy and eeebuntu, that are specific to the eee pc models. I started with easy peasy at the time it was called something else (Ubuntu eee?) and later switched to eeebuntu. In both cases packages for the latest versions of R from the Ubuntu package repository on CRAN worked flawlessly. I find the netbook to be very convenient. Having a 5 hour battery life and a weight of less than 3 pounds is wonderful. I teach all of my classes with it and even use it at home (attached to a monitor, USB keyboard and mouse and an external hard drive) in lieu of a desktop computer. (I have been eyeing the eee box covetously but have not yet convinced myself that I really need yet another computer). I develop R packages on it and don't really notice that it is under-powered by today's standards. 
Of course, when I started computing and even when I started working with the S language the memory capacity of computers was measured in kilobytes so the thought of only 1Gb of memory doesn't cause me to shriek in horror. Thanks for sharing your experiences, Doug. Given that devices like the EeePC are marketed in terms of less demanding users, it's good to know what it is like for a hard user. Further related comments would be welcome! I have to agree about the RAM issue too. My once-trusty old Sharp MZ-80B CP/M machine (early 1980s), with its 64KB and occupying a good 0.25 m^3 of physical space, would have to be replicated 2^14 = 16384 times over to give the same RAM (and occupy some 400 m^3 of space, say 7.4m x 7.4m x 7.4m, or about the size of my house). Now I have things on my desk, about the size of my thumb, with 8MB in each. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Mar-09 Time: 18:20:45 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] PCA and categorical data
On 06-Mar-09 09:25:26, Prof Brian Ripley wrote: You might want to look into correspondence analysis, which has several variants of PCA designed for categorical data. In particular, have a look at the results of RSiteSearch("correspondence") Ted. On Fri, 6 Mar 2009, Galanidis Alexandros wrote: Hi all, I'm trying to figure out if it is appropriate to do a PCA having only categorical data (not ordinal). I have only found the following quote: One method to find such relationships is to select appropriate variables and to view the data using a method like Principle Components Analysis (PCA) [4]. This approach gives us a clear picture of the data using KL-plot of the PCA. However, the method is not settled for the data including categorical data. [http://hp.vector.co.jp/authors/VA038807/personal/covEigGiniRep17.pdf] but I'm still not sure if it is WRONG to do so. Since normally categorical data is taken to be binomial or Poisson distributed, the variance varies with the mean and least-squares (the basis of PCA) is then sub-optimal. Correspondence analysis takes that into account (at least to some extent). Any opinion or reference would be very helpful There is a basic introduction in MASS4, with references to more comprehensive accounts. thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Mar-09 Time: 09:46:15 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
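[Editor's sketch] A minimal correspondence analysis in R can be run with MASS::corresp() on a two-way contingency table; the counts below are made up purely for illustration:

```r
library(MASS)   # Venables & Ripley's package accompanying MASS4
tab <- matrix(c(20, 10,  5,
                 8, 15, 12,
                 6,  9, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(row = c("r1","r2","r3"),
                              col = c("c1","c2","c3")))
corresp(tab, nf = 2)   # row/column scores in up to min(dim) - 1 dimensions
```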
Re: [R] fitting a gompertz model through the origin using nls
(-a*b*Day.max))/b) which again could be fitted using nls, with parameters a and b to be fitted (assuming Day.max given). [*] One could generalise this law to, e.g., a power law: (dn/dt) = a*((1-b*n)^c) whose solution (satisfying n=0 at t=0) is

  n = (1 - (1 + a*b*(c-1)*t)^(-1/(c-1)))/b

(which can possibly be usefully simplified! -- note that as c -> 1 it reduces to the exponential solution n = (1 - exp(-a*b*t))/b). Hoping this helps -- and that maybe people with more knowledge of these things can contribute more directly relevant suggestions. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Mar-09 Time: 13:39:39 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
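[Editor's sketch] Fitting the simpler through-the-origin law n = (1 - exp(-a*b*t))/b with nls() on simulated data (the parameter values, noise level and starting values are illustrative, not from the original post):

```r
set.seed(1)
t <- seq(0, 10, by = 0.5)
n <- (1 - exp(-0.3*0.8*t))/0.8 + rnorm(length(t), sd = 0.01)  # true a=0.3, b=0.8
fit <- nls(n ~ (1 - exp(-a*b*t))/b, start = list(a = 0.2, b = 1))
coef(fit)   # should recover a and b approximately
```

Here b is identified by the asymptote 1/b and a*b by the initial rate, so both parameters are estimable from data that approach saturation.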
Re: [R] Number Regular Expressions
On 04-Mar-09 07:55:11, Bob Roberts wrote: Hi, I'm trying to write a regular expression that captures numbers in the form 9,007,653,372,262.48 but does not capture dates in the form 09/30/2005. I have tried numerous expressions, but they all seem to return the dates as well. Thanks.

Testing the regular expression '[ ][0123456789,.][^/]*[ ]' with grep (on Linux; R is not involved in this test):

$ grep '[ ][0123456789,.][^/]*[ ]' <<EOT
A line with no numbers
This is 9,007,653,372,262.48 one line
Another no-numbers
This is 09/30/2005 another line
Yet another no-numbers
EOT
This is 9,007,653,372,262.48 one line

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 09:30:17 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
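[Editor's sketch] The same idea works inside R with grep(); like the shell pattern above, this relies on the number being space-delimited and containing no slash:

```r
x <- c("A line with no numbers",
       "This is 9,007,653,372,262.48 one line",
       "This is 09/30/2005 another line")
grep(" [0-9][0-9,.]* ", x, value = TRUE)
# [1] "This is 9,007,653,372,262.48 one line"
```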
Re: [R] Diff btw percentile and quantile
On 04-Mar-09 16:10:29, megh wrote: Yes, I am aware of those definitions. However I wanted to know the difference btw the words Percentile and quantile, if any. Secondly your link navigates to some non-english site, which I could not understand. Percentile and quantile are in effect the same thing. The difference is in how they express what they refer to. For example, the Median of a distribution is the 0.5 quantile, and is the 50th percentile. So, for 0 <= p <= 1, refer to either the p-th quantile, or to the (100*p)-th percentile. Thus R has the function quantile(), whose ?quantile states: The generic function 'quantile' produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1. R (in its basic distribution) does not have a function percentile(), but, given a series P of percentages, e.g. P <- c(1,5,10,25,50,75,90,95,99), one could obtain the equivalent as quantile(X,probs=P/100). So, with reference to your original question Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? the answer is that they in effect give the same results, though differ with respect to how they are to be fed (quantile eats probabilities, percentile eats percentages). [Though (since I am not familiar with Excel) I cannot rule out that Excel's percentile() function also eats probabilities; in which case its name would be an example of sloppy nomenclature on Excel's part; which I cannot rule out on general grounds either]. Dieter Menne wrote: megh megh74 at yahoo.com writes: To calculate Percentile for a set of observations Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? 
If you check the documentation of quantile, you will note that there are 9 variants of quantile which may give different values for small sample sizes and many ties. I found a German page that explains the algorithm Excel uses: http://www.excel4managers.de/index.php?page=quantile01 but I did not check which of the R-variants this is equivalent to. Dieter E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 16:31:42 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
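[Editor's sketch] A percentile()-style call is emulated directly by rescaling the percentages into probabilities:

```r
X <- c(15, 20, 35, 40, 50)   # small illustrative sample
P <- c(25, 50, 75, 90)       # percentages, Excel-style
quantile(X, probs = P/100)   # quantile() eats probabilities, hence P/100
```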
Re: [R] Diff btw percentile and quantile
On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote: (Ted Harding) wrote: snip So, with reference to your original question Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? the answer is that they in effect give the same results, though differ with respect to how they are to be fed (quantile eats probabilities, percentile eats percentages). [Though (since I am not familiar with Excel) I cannot rule out that Excel's percentile() function also eats probabilities; in which case its name would be an example of sloppy nomenclature on Excel's part; which I cannot rule out on general grounds either]. i am not familiar enough with excel to prove or disprove what you say above, but in general such claims should be grounded in the respective documentations. there are a number of ways to compute empirical quantiles (see, e.g., [1]), and it's possible that the one used by r's quantile by default (see ?quantile) is not the one used by excel (where you probably have no choice; help in oocalc does not specify the method, and i guess that excel's does not either). have you actually confirmed that excel's percentile() does the same as r's quantile() (modulo the scaling)? vQ I have now googled around a bit. All references to the Excel percentile() function say that you feed it the fractional value corresponding to the percentage. So, for example, to get the 80-th percentile you would give it 0.8. Hence Excel should call it quantile! As to the algorithm, Wikipedia states the following (translated into R syntax): Many software packages, such as Microsoft Excel, use the following method recommended by NIST[4] to estimate the value, vp, of the pth percentile of an ascending ordered dataset containing N elements with values v[1],v[2],...,v[N]: n = (p/100)*(N-1) + 1 n is then split into its integer component, k and decimal component, d, such that n = k + d. 
If k = 1, then the value for that percentile, vp, is the first member of the ordered dataset, v[1]. If k = N, then the value for that percentile, vp, is the Nth member of the ordered dataset, v[N]. Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]). Note that the Wikipedia article uses the % interpretation of p-th percentile, i.e. the point which is (p/100) of the way along the distribution. This is in fact the same linear-interpolation rule as R's default type 7, which also computes h = (p/100)*(N-1) + 1 and interpolates between v[floor(h)] and v[floor(h)+1]; so quantile() with its default settings should reproduce Excel's percentile() (type 4, linear interpolation of the empirical cdf with h = N*(p/100), is a different rule). Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 17:29:50 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
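[Editor's sketch] The NIST/Excel rule quoted above is only a few lines of R (the function name excel.percentile is my own label, not an Excel or R identifier):

```r
excel.percentile <- function(v, p) {
  v <- sort(v)
  N <- length(v)
  n <- (p/100)*(N - 1) + 1        # the NIST/Excel rank
  k <- floor(n); d <- n - k       # integer and decimal components
  if (k >= N) return(v[N])        # p = 100 maps to the largest value
  v[k] + d*(v[k + 1] - v[k])      # linear interpolation
}
v <- c(15, 20, 35, 40, 50)
excel.percentile(v, 40)       # 29
quantile(v, 0.40, type = 7)   # 29 as well: R's default type 7 uses this rule
```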
Re: [R] While with 2 conditions
On 27-Feb-09 19:37:50, MarcioRibeiro wrote: Hi listers, I checked it out in the messages... But I didn't find how I use while with 2 conditions...

while (a>1 ? b>1) {
  a <- 2
  b <- 2
}

Should I use what... Thanks in advance, Marcio

while ((a>1) && (b>1)) {
  a <- 2
  b <- 2
}

is the way to combine the two conditions. However, I would not want to run that particular little program ... Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Feb-09 Time: 00:16:10 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
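As an aside (my own terminating variant, since the loop in the thread never exits once a and b are both set to 2): `&&` combines two scalar conditions, and the loop stops as soon as either side becomes FALSE:

```r
a <- 5
b <- 3
while ((a > 1) && (b > 1)) {  # both conditions must hold to continue
  a <- a - 2
  b <- b - 2
}
c(a, b)
# [1] 3 1  -- stopped as soon as b dropped to 1
```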
Re: [R] Moving Average
On 26-Feb-09 13:54:51, David Winsemius wrote: I saw Gabor's reply but have a clarification to request. You say you want to remove low frequency components but then you request smoothing functions. The term smoothing implies removal of high-frequency components of a series. If you produce a smoothed series, your result of course contains the low-frequency components, with the high-frequency components removed. But if you then subtract that from the original series, your result contains the high-frequency components, with the low-frequency components removed. Moving-average is one way of smoothing (but can introduce periodic components which were not there to start with). Filtering a time-series is a very open-ended activity! In many cases a useful start is exploration of the spectral properties of the series, for which R has several functions. 'spectrum()' in the stats package (loaded by default) is one basic function. help.search("time series") will throw up a lot of functions. You might want to look at package 'ltsa' (linear time series analysis). Alternatively, if you already have good information about the frequency-structure of the series, or (for instance) know that it has a well-defined seasonal component, then you could embark on designing a transfer function specifically tuned to the job. Have a look at RSiteSearch("{transfer function}") Hoping this helps, Ted. If smoothing really is your goal then additional R resources would be smooth.spline, loess (or lowess), ksmooth, or using smoothing terms in regressions. Venables and Ripley have quite a few worked examples of such in MASS. -- David Winsemius On Feb 26, 2009, at 7:07 AM, mau...@alice.it wrote: I am looking for some help at removing low-frequency components from a signal, through Moving Average on a sliding window. I understand this is a smoothing procedure that I have never done in my life before .. sigh. I searched R archives and found rollmean, MovingAverages {TTR}, SymmetricMA.
None of the above mentioned functions seems to accept the smoothing polynomial order and the sliding window width as input parameters. Maybe I am missing something. I wonder whether there are some building blocks in R, if not a function which does it all (I do not expect that much, though). Even some literature references and/or tutorials are very welcome. Thank you so much, Maura [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 14:54:43 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
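To sketch the smooth-then-subtract idea described above: base R's filter() computes a centred moving average, and subtracting it from the original series leaves the high-frequency part. The window width and the toy two-frequency signal below are made up for illustration:

```r
t <- 1:200
slow <- sin(2 * pi * t / 100)      # low-frequency component
fast <- 0.3 * sin(2 * pi * t / 8)  # high-frequency component
x <- slow + fast

w <- 21                                               # odd window width (arbitrary)
smoothed <- stats::filter(x, rep(1/w, w), sides = 2)  # centred moving average = low-pass
highpass <- x - smoothed                              # low frequencies removed
```

Note that filter() returns NA for the (w-1)/2 points at each end of the series.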
Re: [R] Singularity in a regression?
On 26-Feb-09 12:58:49, Bob Gotwals wrote: R friends, In a matrix of 1s and 0s, I'm getting a singularity error. Any helpful ideas? From the degrees of freedom in your output, it seems you are fitting 10 binary variables to a total of 23 observations. In such circumstances, it is not unlikely that the matrix of 0s and 1s representing the binary variables would have at least 1 column which can be represented as a linear combination of the others (which is what the "1 not defined because of singularities" means). Get more data, or use fewer variables! Or, also worth considering, check whether there are relationships in the real world between your 10 variables which would tend to generate such linear dependence. Ted.

lm(formula = activity ~ metaF + metaCl + metaBr + metaI + metaMe + paraF + paraCl + paraBr + paraI + paraMe)

Residuals:
        Min         1Q     Median         3Q        Max
 -4.573e-01 -7.884e-02  3.469e-17  6.616e-02  2.427e-01

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.9173     0.1129  70.135  < 2e-16 ***
metaF        -0.3973     0.2339  -1.698 0.115172
metaCl            NA         NA      NA       NA
metaBr        0.3454     0.1149   3.007 0.010929 *
metaI         0.4827     0.2339   2.063 0.061404 .
metaMe        0.3654     0.1149   3.181 0.007909 **
paraF         0.7675     0.1449   5.298 0.000189 ***
paraCl        0.3400     0.1449   2.347 0.036925 *
paraBr        1.0200     0.1449   7.040 1.36e-05 ***
paraI         1.3327     0.2339   5.697 9.96e-05 ***
paraMe        1.2191     0.1573   7.751 5.19e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2049 on 12 degrees of freedom
Multiple R-squared: 0.9257, Adjusted R-squared: 0.8699
F-statistic: 16.61 on 9 and 12 DF, p-value: 1.811e-05

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
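To illustrate the point about linear dependence (my own made-up example, not the poster's data): alias() on the fitted model reports exactly which column is a linear combination of the others, and the aliased coefficient comes back NA just as in the output above:

```r
set.seed(1)
x1 <- rbinom(23, 1, 0.5)
x2 <- rbinom(23, 1, 0.5)
x3 <- 1 - x1                 # exactly collinear with the intercept and x1
y  <- rnorm(23)

fit <- lm(y ~ x1 + x2 + x3)
coef(fit)["x3"]              # NA: "not defined because of singularities"
alias(fit)                   # reports x3 = 1 - 1*x1
```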
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 15:07:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] windows vs. linux code
There is one MAJOR issue you will have to watch out for, far more likely to turn up than calls like system(). This is that, if you want to have two or more plotting windows in use at the same time, while the first one is automatically opened by the plot() command, you will have to open additional ones explicitly. In Linux, the command is X11() [possibly with parameters, though usually you don't need to bother]. In Windows, it is windows() [ditto]. I run R on Linux, so use the X11() command. However, if I write a script which would also be run on a Windows system, I write using windows() in the first instance, but with a conditional alias to X11():

if (length(grep("linux", R.Version()$os))) {
  windows <- function(...) X11(...)
}

and put this at the beginning of the code file. Then, if the code is run on a Windows machine, the function call windows() does the Windows thing; but if the code is run on Linux then the above test detects that, and defines a function windows() which does the same as X11(). Ted. On 26-Feb-09 01:25:36, Sherri Heck wrote: i am asking if, in general, r code can be written on a linux-based system and then run on a windows-based system. Rolf Turner wrote: On 26/02/2009, at 2:08 PM, Sherri Heck wrote: Dear All- I have been given some R code that was written using a Linux OS, but I use Windows-based R. The person that is giving it to me said that it needs to run on a Linux system. Does anyone have any insight and/or can verify this. I haven't yet obtained the code, so I haven't been able to try it yet. Despite the knowledge, wisdom, insight, skill, good looks, and other admirable characteristics of the members of the R-help list, few of us are skilled in telepathy or clairvoyance. cheers, Rolf Turner ## Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author.
This e-mail has been scanned and cleared by MailMarshal www.marshalsoftware.com ## __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 03:58:35 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
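An alternative test (my own suggestion, not from the thread) avoids string-matching the OS name by using .Platform$OS.type, which is always either "windows" or "unix":

```r
# Define windows() as an alias for X11() on non-Windows platforms,
# so scripts written with windows() run unchanged on Linux.
if (.Platform$OS.type != "windows") {
  windows <- function(...) X11(...)
}
```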
Re: [R] type III effect from glm()
On 19-Feb-09 10:38:50, Simon Pickett wrote: Hi all, This could be naivety/stupidity on my part rather than a problem with model output, but here goes. I have fitted a fairly simple model

m1 <- glm(count ~ siteall + yrs + yrs:district, family = quasipoisson,
          weights = weight, data = m[x[[i]],])

I want to know if yrs (a continuous variable) has a significant unique effect in the model, so I fit a simplified model with the main effect omitted...

m2 <- glm(count ~ siteall + yrs:district, family = quasipoisson,
          weights = weight, data = m[x[[i]],])

So, above, you have fitted two models: m1, m2. then compare models using anova()

anova(m1, m1b, test = "F")

And here you are comparing two models: m1, m1b. Could this be the reason for your result?

Analysis of Deviance Table
Model 1: count ~ siteall + yrs + yrs:district
Model 2: count ~ siteall + yrs:district
  Resid. Df Resid. Dev Df Deviance F Pr(>F)
1      1936      75913
2      1936      75913  0        0

The d.f.'s are exactly the same, is this right? Can I only test the significance of a main effect when it is not in an interaction? Thanks in advance, Simon. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 19-Feb-09 Time: 10:56:12 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The Origins of R
On 04-Feb-09 20:45:04, Nutter, Benjamin wrote: snip Those of us on this list (with the possible exception of one or two nutters) would take it that it goes without saying that R was developed on the basis of S --- we all ***know*** that. snip Just want to clarify that the nutters referred to here are not the same as the Nutters that bear my name :-) Surely the Nutters are a Movement or a Party[1] whose members are nutters? [1] In the UK we have long had the Monster Raving Loony Party, which (at least in a 1990 by-election) made a serious dent in the political scene. See: http://en.wikipedia.org/wiki/Monster_Raving_Looney_Party :-) Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Feb-09 Time: 21:04:55 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing the previous and next entry of a vector
On 25-Jan-09 22:29:25, Jörg Groß wrote: Hi, I have a quite abstract problem, hope someone can help me here. I have a vector like this:

x <- c(1,2,3,4,5,2,6)
x
# [1] 1 2 3 4 5 2 6

now I want to get the number where the previous number is 1 and the next number is 3 (that is the 2 at the second place). I tried something with tail(x, -1) ... with that, I can check the next number, but how can I check the previous number?

x <- c(1,2,3,4,5,2,6)
x0 <- x[1:(length(x)-2)]
x1 <- x[3:(length(x))]
x[which((x0==1) & (x1==3)) + 1]
# [1] 2

Or, changing the data:

x <- c(4,5,1,7,3,2,6,9,8)
x0 <- x[1:(length(x)-2)]
x1 <- x[3:(length(x))]
x[which((x0==1) & (x1==3)) + 1]
# [1] 7

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Jan-09 Time: 22:47:15 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
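Since the original poster mentioned tail(), here is an equivalent formulation (my own variant) that builds the two shifted copies with head() and tail() instead of explicit index ranges:

```r
x <- c(1, 2, 3, 4, 5, 2, 6)
prev <- head(x, -2)  # x[1], ..., x[n-2]: the element before each candidate
nxt  <- tail(x, -2)  # x[3], ..., x[n]:   the element after each candidate
x[which(prev == 1 & nxt == 3) + 1]
# [1] 2
```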
[R] Special function? [non-R query]
Greetings all, Sorry for posting what is primarily not an R question (though it will have an R target in due course). Let F(x) be the CDF of the Normal -- i.e. pnorm(x) Let f(x) be the density function -- i.e. dnorm(x) Define G(psi) = Integral[-inf,inf] F(x)*f(x)*exp(x*psi) dx Is G(psi) a known special function? If so, can anyone point me to a reference for its properties? With thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-Jan-09 Time: 00:58:27 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
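Though the question asks about a closed form, G(psi) is at least easy to evaluate numerically in R with integrate(). The sketch below also compares it against exp(psi^2/2) * pnorm(psi/sqrt(2)), which the numbers suggest is the closed form (an observation of mine, not asserted in the original post):

```r
# Numerical version of G(psi) = Integral F(x)*f(x)*exp(x*psi) dx over (-Inf, Inf)
G <- function(psi) {
  integrate(function(x) pnorm(x) * dnorm(x) * exp(x * psi),
            lower = -Inf, upper = Inf)$value
}

G(0)                                # 0.5: with psi = 0 the integrand is F*f, whose integral is 1/2
G(1) - exp(1/2) * pnorm(1/sqrt(2))  # ~0: matches the conjectured closed form
```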
Re: [R] data.frame: how to extract parts
On 17-Jan-09 02:57:13, Jörg Groß wrote: Hi, I have a problem with the R syntax. It's perhaps pretty simple, but I don't understand it ... I can extract a column from a data.frame with the following code for example ...

b$row1[b$row1 == "male"]

so I see all male-entries. But I cannot extract all lines of a data.frame depending on this criterion;

only.male <- b[b$row1 == "male"]

With that, I get an "undefined columns selected" message. So how can I extract lines of a data.frame depending on a level variable?

only.male <- b[b$row1 == "male", ]

should work. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Jan-09 Time: 09:48:43 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
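Equivalently (a standard alternative, not from the original reply), subset() expresses the row condition without the trailing comma that tripped up the poster. The small data frame here is made up for illustration:

```r
b <- data.frame(row1 = c("male", "female", "male"), x = 1:3)

b[b$row1 == "male", ]      # index form: note the comma before the closing bracket
subset(b, row1 == "male")  # same rows, arguably harder to get wrong
```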
Re: [R] If...while...or what else??
On 13-Jan-09 13:20:54, Niccolò Bassani wrote: Dear R users, I come to you with a quite silly question I think, but I hope you can answer me... That is, I've got some problems in using the if and while conditions in a loop. Substantially, I want to select rows in a dataset depending on an index variable (suppose it ranges from 1 to 5), so as to make a specific analysis (homemade of course) on this new dataset. My first thought was to do a double loop like this:

for i in 1:5{
  for j in (i+1):5{
    data = dataset[(variable==i) | (variable==j),]
    ##analysis
    ##analysis
  }
}

This way I should select all the couples only once (gaining in efficiency I hope). The fact is that this arrangement has a problem: j ends up ranging from 2 to 6, not from 2 to 5. So when I do a subsetting on the dataset to obtain only the rows corresponding to the values of i and j I want, when the loop comes to j = 6 I have an error, of course. What I want to know is: how can I use the if or while condition in such a loop to avoid the routine doing the computations for this case? I.e., can I tell somehow R "Hey, if j=6, let it go and move on with the other computations"? Or maybe you can see a faster and better way of using the for conditions?? I hope I made myself clear, if not I'll clarify!! Thanks in advance Niccolò

Presumably the following summarises the situation you want to avoid:

for(i in (1:5)){for(j in ((i+1):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5
# [1] 5 6
# [1] 5 5

I.e. you get (5,6) when case 6 is not wanted. If (as seems likely) you don't want (5,5) either (i.e.
you only want all pairs (i,j) from 1:5 with i < j), then the following does it:

for(i in (1:4)){for(j in ((i+1):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5

However, if for some reason you do want (5,5) as well, then:

for(i in (1:5)){for(j in (min(5,(i+1)):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5
# [1] 5 5

Also (though I suspect it was simply due to hasty typing), the syntax of what you wrote above:

for i in 1:5{
  for j in (i+1):5{
    ...
  }
}

will not work, since you must write for( ... ) and not for ... :

for i in (1:5)
# Error: unexpected symbol in "for i"

Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Jan-09 Time: 15:09:58 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
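As a further alternative to the nested loops (my own suggestion, not from the thread): combn() enumerates all pairs i < j directly, so there is no stray case to skip:

```r
pairs <- combn(5, 2)  # each column is one pair (i, j) with i < j
t(pairs)              # the 10 pairs, one per row
```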
Re: [R] Joining lists
On 13-Jan-09 22:18:00, glenn wrote: Very simple one sorry: How do I join 2 lists please Thanks glenn

c(list1, list2)

Example:

A <- list(a=1, b=2, c=3)
B <- list(d=4, e=5, f=6)
c(A, B)
# $a
# [1] 1
# $b
# [1] 2
# $c
# [1] 3
# $d
# [1] 4
# $e
# [1] 5
# $f
# [1] 6

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Jan-09 Time: 22:39:53 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
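One caveat worth noting (my addition): c() keeps duplicate names. If the two lists share names and you want the second list's entries to override the first's, modifyList() is closer to a merge:

```r
A <- list(a = 1, b = 2)
B <- list(b = 20, c = 3)

c(A, B)           # 4 components: both 'b' entries are kept
modifyList(A, B)  # 3 components: b is replaced -> list(a = 1, b = 20, c = 3)
```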
Re: [R] rbind for matrices - rep argument
On 07-Jan-09 15:22:57, Niccolò Bassani wrote: Dear R users, I'm facing a trivial problem, but I really can't solve it. I've tried a dozen of codes, but I can't get the result I want. The question is: I have a dataframe like this one

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    5    5    4    9
[3,]    1    6    8    1    2
[4,]    8    6    4    1    5

made up of decimal numbers, of course. I want to append this dataframe to itself a number x of times, i.e. 3. That is I want a dataframe like this

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    3    4    5
 [2,]    2    5    5    4    9
 [3,]    1    6    8    1    2
 [4,]    8    6    4    1    5
 [5,]    1    2    3    4    5
 [6,]    2    5    5    4    9
 [7,]    1    6    8    1    2
 [8,]    8    6    4    1    5
 [9,]    1    2    3    4    5
[10,]    2    5    5    4    9
[11,]    1    6    8    1    2
[12,]    8    6    4    1    5

I'm searching for an automatic way to do this (I've already used rbind, re-writing x times the name of the frame...), as it must enter a function where one argument is exactly the number x of times to repeat this frame. Any ideas?? Thanks in advance! Niccolò

I don't know whether there is anywhere a ready-made function which will implement a rep parameter for an rbind, but the following ad-hoc function will do it for you efficiently (i.e. with the minimum number of applications of the rbind() function). To produce a result which consists of k replicates of x, row-bound:

Krbind <- function(x, k){
  y <- x
  if(k == 1) return(x)
  p <- floor(log2(k))
  for(i in (1:p)){
    z <- rbind(y, y)
    y <- z
  }
  k <- (k - 2^p)
  if(k == 0) return(y) else return(rbind(y, Krbind(x, k)))
}

## Example:
Xdf <- data.frame(X1=c(1.1,1.2), X2=c(2.1,2.2),
                  X3=c(3.1,3.2), X4=c(4.1,4.2))
Krbind(Xdf, 6)
#     X1  X2  X3  X4
# 1  1.1 2.1 3.1 4.1
# 2  1.2 2.2 3.2 4.2
# 3  1.1 2.1 3.1 4.1
# 4  1.2 2.2 3.2 4.2
# 5  1.1 2.1 3.1 4.1
# 6  1.2 2.2 3.2 4.2
# 7  1.1 2.1 3.1 4.1
# 8  1.2 2.2 3.2 4.2
# 9  1.1 2.1 3.1 4.1
# 10 1.2 2.2 3.2 4.2
# 11 1.1 2.1 3.1 4.1
# 12 1.2 2.2 3.2 4.2

Of course, if you're not worried by efficiency, then the simple loop

y <- x
for(i in (1:(k-1))){ y <- rbind(y, x) }

will do it! Hoping this helps, Ted.
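For completeness (my own addition): plain row indexing with rep() also replicates a data frame's rows in one step, with no explicit rbind() at all:

```r
Xdf <- data.frame(X1 = c(1.1, 1.2), X2 = c(2.1, 2.2))
k <- 3

# Repeat the whole block of rows k times; row names get suffixes like "1.1", "2.1"
Xdf[rep(seq_len(nrow(Xdf)), times = k), ]
```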
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-09 Time: 18:08:14 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in the NY Times
On 07-Jan-09 18:03:19, Erik Iverson wrote: I pointed a friend of mine toward the article, to which he replied: "I hope that they run SAS on Solaris too, god only knows how tainted the syscalls are in that linux freeware." Of course, now Solaris is 'freeware', too, so I suppose that according to SAS, running SAS on Windows is the best way to be sure you're getting the right answers. I'm not so sure about that. Since the article described R as a supercharged version of Microsoft's Excel, surely people should run R on Windows and be *ab*so*lute*ly* sure of getting the right answers (and supercharged to boot) Ted. On Wed, 07 Jan 2009 10:56:53 -0600, Marc Schwartz marc_schwa...@comcast.net wrote: I would also point out that the use of the term freeware as opposed to FOSS by the SAS rep, comes off as being unprofessional and deliberately condescending... The author of the article, to his credit, was pretty consistent in using open source terminology. Regards, Marc on 01/07/2009 10:26 AM Bryan Hanson wrote: I believe the SAS person shot themselves in the foot in more ways than one. In my mind, the reason you would pay, as Frank said, for non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others would be so that you can sue them later when a software problem in the designing of the engine makes your plane fall out of the sky! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-09 Time: 18:30:39 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R badly lags matlab on performance?
On 03-Jan-09 18:28:03, Ben Bolker wrote: Ajay Shah ajayshah at mayin.org writes: On Sat, Jan 03, 2009 at 06:59:29PM +0100, Stefan Grosse wrote: On Sat, 3 Jan 2009 22:25:38 +0530 Ajay Shah ajayshah at mayin.org wrote: AS system.time(for (i in 1:1000) {a[i] <- a[i] + 1}) AS I wonder what we're doing wrong! it is no secret that R does badly with loops. Thats why it is recommended to use vectorized operations. But there's a big difference even on the vectorised version: a <- a + 1. Why should that be? Both systems should merely be handing down to the BLAS. The (stock) R install has a less carefully setup BLAS as compared with the (stock) matlab install? See my other message. I'm suspicious of the real size of the difference, I think the difference could well be noise. Also, this particular bit of arithmetic doesn't involve BLAS -- see arithmetic.c (dig down until you get to integer_binary) ... Ben Bolker I just ran Ajay's examples 3 times over: R 2.8.1 on Debian Etch using 1MB of RAM in a VirtualBox VM running on a 1.73GHz CPU. Results:

           user  system elapsed
Vector:   0.112   0.288   0.393
Loop:    65.276   0.300  65.572
Vector:   0.076   0.312   0.389
Loop:    65.744   0.332  66.076
Vector:   0.068   0.328   0.394
Loop:    65.292   0.308  65.597

Not dissimilar to Ajay's R times (though my loop was about 50% longer). However, all three runs were very similar -- a little noise, but not much! I don't have octave (on the same machine) to compare these with. And I don't have MatLab at all. So I can't provide a comparison on that front, I'm afraid. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-Jan-09 Time: 19:10:51 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
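The loop-versus-vector comparison in this thread can be reproduced with a sketch like the following. Absolute timings will differ enormously by machine and R version (modern R byte-compiles loops, so the gap is far smaller than in these 2009 figures):

```r
n <- 1e6

a <- numeric(n)
t_loop <- system.time(for (i in seq_len(n)) a[i] <- a[i] + 1)["elapsed"]

a <- numeric(n)
t_vec <- system.time(a <- a + 1)["elapsed"]  # vectorised equivalent

c(loop = unname(t_loop), vectorised = unname(t_vec))
```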
Re: [R] Linux words file
On 03-Jan-09 21:39:55, Erin Hodgess wrote: Dear R People: I have a small function that solves the Jumble puzzle from the newspaper (I know...big deal). It uses the Linux words file. My question is: is there a similar words file for Windows, please? Thanks, Happy New (Gnu) Year. Sincerely, Erin And the same to you! I don't know whether you can easily find the same readymade for Windows, but you could always make one from the Linux file, since it is a plain text file. I have it in /etc/dictionaries-common/words You might need to DOSify it first (i.e. end lines with CRLF instead of just LF), though I don't know how Windows reacts to Unix text files these days. In that case (in Linux):

cp /etc/dictionaries-common/words words.txt
unix2dos words.txt

unix2dos is an old program which is not present on all Linux distributions. Maybe you don't need it anyway, in which case just do the 'cp' (to get the extension .txt). And are you going to share your R program? Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-Jan-09 Time: 22:30:06 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
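Since the Jumble solver itself was never posted, here is one minimal way such a function might look (entirely my own sketch, not Erin's code): two words match if their sorted letters agree. The small dict vector stands in for the system words file, which could be read with readLines("/etc/dictionaries-common/words") on a system where that path exists:

```r
# Canonical key: lower-case letters in sorted order
key <- function(w) paste(sort(strsplit(tolower(w), "")[[1]]), collapse = "")

# All dictionary words that are anagrams of the jumbled word
solve_jumble <- function(jumbled, dict) {
  dict[vapply(dict, key, character(1), USE.NAMES = FALSE) == key(jumbled)]
}

dict <- c("cat", "act", "dog", "god", "tack")  # stand-in for the words file
solve_jumble("tca", dict)
# [1] "cat" "act"
```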
Re: [R] understanding recursive functions
On 18-Dec-08 22:33:28, Jeffrey Horner wrote: joseph.g.bo...@gsk.com wrote on 12/18/2008 04:22 PM: I'm trying to understand the use of recursive functions described on page 45 of An Introduction to R by the R core development team. A function is a list of expressions, which all get executed with only the last being assigned to a global variable, right? So if a function refers recursively to itself, it should simply start with the first expression and go from there. At least that is my understanding of why the example given on page 45 works. In light of the above, I would appreciate it if someone would explain why the following example does not work:

q <- function(x,h) {if (x < 2) {x <<- x+1; return(q(x))} else return(x)}

If x < 2, this should add 1 to x and go back to the beginning of the if expression, and the final result should be 2. So q(0) should return 2. But it returns an error message. All references to x save one (the assignment with the <<- operator) are found within the current frame, not by lexical scoping, and hence the local x is never changed... producing infinite recursion. The following at least fixes your example:

q <- function(x,h) {if (x < 2) {x <<- x+1; x <- x+1; return(q(x))} else return(x)}
ls()    # no x in global env just yet
q(-10)
ls()

The following fixes it even more simply (using the same principles):

q <- function(x,h){ if (x < 2) {u <- x+1; return(q(u))} else return(x) }

Note that <<- is not necessary. Just to test the method more thoroughly:

q <- function(x,h){ if (x < 2) {u <- x+h; return(q(u,h))} else return(x) }
q(0,0.3)
# [1] 2.1

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 18-Dec-08 Time: 22:51:41 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
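A further option (my addition, same principle of keeping everything local): Recall() lets a function call itself without relying on its own name or on any global assignment:

```r
q <- function(x, h) {
  if (x < 2) Recall(x + h, h) else x  # recurse with a new local value of x
}
q(0, 0.3)
# [1] 2.1
```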
Re: [R] Logical inconsistency
On 07-Dec-08 18:41:16, Charles C. Berry wrote: On Mon, 8 Dec 2008, Berwin A Turlach wrote: G'day Wacek, On Sat, 06 Dec 2008 10:49:24 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: [] there is, in principle, no problem in having a high-level language perform the computation in a logically consistent way. Is this now supposed to be a Radio Eriwan joke? [snip] Classical Radio Eriwan stuff, and I thought this class of jokes have died out. Evidently there are a few diehards still out there. From this http://www.theologyweb.com/campus/showthread.php?t=55535 thread: --- [Question:]Can the gamma function be used in combinatorics? [Answer:] As Radio Eriwan would have said In principle, yes, because it interpolates the factorial. But what does it mean to throw a dice with pi sides sqrt(15) times ? --- :-) Chuck [rest deleted] Charles C. Berry(858) 534-2098 Well, in principle you can. But achieving the fractional part has probability zero. Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Dec-08 Time: 19:06:03 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Function output difficulties
On 03-Dec-08 13:41:53, emj83 wrote: is there anyway for some parts of the function output to not be returned, even if the output has been used to calculate a further part of the function? i have tried invisible() to no avail. Thanks Emma It's not clear what you mean by "some parts of the function output". The function output is whatever you nominate it to be, and can include any selection of whatever was computed within the function. For example:

testfn <- function(x){
  X <- x^2
  Y <- cos(X)
  Z <- (X + Y)/(1 + x)
  c(X, Z)
}
testfn(1.1)
# [1] 1.21 0.744295

However, I suspect your case may be a bit more complicated than that! If you were to post the function (or something like it), explaining what you want it to return, and in what way it returns what you do not want, it would be easier to help. Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Dec-08 Time: 19:03:29 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
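On the invisible() point: invisible() does not remove anything from the return value; it only suppresses auto-printing at the console, and the full value is still there when assigned or indexed. A small sketch (my own example, not Emma's function):

```r
testfn2 <- function(x) {
  X <- x^2
  Z <- X + 1
  invisible(list(X = X, Z = Z))  # returned in full, but not auto-printed
}

testfn2(2)       # prints nothing at the console
res <- testfn2(2)
res$Z
# [1] 5
```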
Re: [R] Examples of advanced data visualization
On 01-Dec-08 03:19:57, Duncan Temple Lang wrote: Hans W. Borchers wrote: [...] A few days before your mail, I started putting together some examples of using R and SVG/ECMAScript and R and Flash/Flex/MXML/ActionScript. There are some examples of R graphics that provide interactivity in various forms and ways at http://www.omegahat.org/SVGAnnotation/tests/examples.html (Most examples will work with Firefox, Opera is the most comprehensive browser however for these examples.) The Flex examples will take more time. D. I visited that URL (with the extra t!), and got a message from my browser (Iceweasel on Debian Etch, which is Firefox under another name) that additional plugins (unspecified) were needed to display the material. When I clicked on the Install Missing Plugins button, the result was No suitable plugins found Unknown Plugin (text/svg+xml) Any suggestions for further progress? With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Dec-08 Time: 09:08:05 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Examples of advanced data visualization
On 01-Dec-08 09:22:34, Gábor Csárdi wrote: On Mon, Dec 1, 2008 at 10:21 AM, Gábor Csárdi [EMAIL PROTECTED] wrote: On Mon, Dec 1, 2008 at 10:08 AM, Ted Harding [EMAIL PROTECTED] wrote: [...] I visited that URL (with the extra t!), and got a message from my browser (Iceweasel on Debian Etch, which is Firefox under another name) that additional plugins (unspecified) were needed to display the material. When I clicked on the Install Missing Plugins button, the result was No suitable plugins found Unknown Plugin (text/svg+xml) Any suggestions for further progress? Ted, what is your browser version? If 2.x, then I'm afraid that you'll have to upgrade to 3.x, or install Opera. Oooops, maybe I am wrong, see http://perlitist.com/articles/on-firefox2-and-svg Gabor Gabor With thanks, Ted. Hmm ... Interesting. Many thanks for the researches, Gabor. Yes, it is a 2.x version. That URL states: [...] So what was the cause of all this woe and despair? Well it all stemmed from the fact that I upgraded from 1.5 to 2, and in 1.5 I was using the Adobe SVG plugin which disables the built-in SVG rendering. Firefox 2 helpfully copied this setting, but not the plugin. For future reference svg.enabled in about:config needs to be true for the built-in SVG viewer to be used. I checked the status of svg.enabled in about:config, and I see that it is TRUE. Further, exploring my ~/.mozilla tree, I do not see any reference to an Adobe SVG plugin -- the only Adobe plugins are for Acrobat Reader and Flash Player. So there is a bit more to this than meets the eye. Maybe I should try to find a different source of SVG material, maybe less demanding than Hans Borchers's examples, to see whether the alleged built-in SVG capability really works anyway. Thanks again! Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Dec-08 Time: 10:26:23 -- XFMail --
Re: [R] NA and logical indexes
On 28-Nov-08 21:25:36, Sebastian P. Luque wrote: Hi, I vaguely remember this issue being discussed at some length in the past, but am having trouble relocating the proper thread (defining an adequate search string to do so):

--cut here---start-
R> foo <- data.frame(A=gl(2, 5, labels=letters[1:2]), X=runif(10))
R> foo$A[1] <- NA
R> foo$A == "b"
 [1]    NA FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
R> foo$A[foo$A == "b"]
[1] <NA> b    b    b    b    b
Levels: a b
R> foo$X[foo$A == "b"]
[1]     NA 0.4425 0.7164 0.3171 0.1967 0.8300
R> foo[foo$A == "b", ]
      A      X
NA <NA>     NA
6     b 0.4425
7     b 0.7164
8     b 0.3171
9     b 0.1967
10    b 0.8300
--cut here---end---

Why is foo$X[1] set to NA in that last call? Cheers, Seb

It is not! In my repetition (which has different runifs):

foo[foo$A == "b", ]
#       A         X
# NA <NA>        NA
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989
foo
#       A         X
# 1  <NA> 0.5013591
# 2     a 0.4475963
# 3     a 0.2600449
# 4     a 0.9240698
# 5     a 0.4205284
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989

NA can seem to have a bewildering logic, but it all becomes clear if you interpret NA as "value unknown". You asked for foo[foo$A == "b", ]. What happens is that when the test foo$A == "b" encounters foo$A[1] it sees NA, so it does not know what the value is. Hence it does not know whether this row of foo satisfies the test. Hence the entire row is of unknown status. Hence a row is output all of whose elements (including the row label, i.e. the row number) are flagged unknown, i.e. NA. After all, if it gave the value of foo$X[1] = 0.5013591, and you subsequently accessed foo[foo$A == "b",][1,2] and got 0.5013591, you would presumably proceed as though this was a value corresponding to a case where foo$A == "b". But it is not -- since foo$A[1] = NA, you don't know whether that is the case. Hence you don't know the value of foo[foo$A == "b",][1,2]. Clear? ( :)) Hoping this helps, Ted.
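A footnote to the explanation above (a sketch, not part of the original thread): if the goal is to keep only the rows known to satisfy the test, wrapping the condition in which() drops the NA positions, because which() returns only the indices where the condition is TRUE.

```r
# Rows selected by a logical index containing NA include an all-NA row;
# which() drops the unknown position instead.
set.seed(1)
foo <- data.frame(A = gl(2, 5, labels = letters[1:2]), X = runif(10))
foo$A[1] <- NA
nrow(foo[foo$A == "b", ])         # 6: includes the all-NA row
nrow(foo[which(foo$A == "b"), ])  # 5: the NA row is dropped
```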
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 28-Nov-08 Time: 22:01:11 -- XFMail --
Re: [R] Chi-Square Test Disagreement
On 26-Nov-08 17:57:52, Andrew Choens wrote: [...] And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators? --andy Insert something humorous here. :-) Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent): Anti-aircraft guns all day long: Computing for the Ministry of Munitions JUNE BARROW GREEN (Open University) From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace. If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time! [ The above excerpted from http://tinyurl.com/6byoub ] Good luck! Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 26-Nov-08 Time: 18:35:25 -- XFMail --
Re: [R] Help With Permutations
On 24-Nov-08 13:36:31, Mulazzani Fabio (Student Com06) wrote: I have a problem with permutations functions in R. I just started using R so maybe it is an easy question. I need to obtain all the 9.something x 10^157 permutations that can be formed from the numbers 1 to 100. I wrote the following commands:

library(gregmisc)
options(expressions=1e5)
cmat <- combinations(300,2)
dim(cmat) # 44850 by 2
permutations(n=100, r=100)

Unfortunately at a certain point (after a few minutes) I get the following error: Error: cannot allocate vector of size 609.1 Mb What can I do? Thanks Fabio

The first thing to do is to decide how long you are willing to wait! To an adequate approximation there are 10^158 of them. Simply to obtain them all (at a rate of 10^10 per second, which is faster than the CPU frequency of most desktop computers) would take 10^148 seconds, or slightly longer than 3*(10^140) years. Current estimates of the age of the Universe are of the order of 1.5*(10^10) years, so the Universe will have to last about 2*(10^130) times as long as it has already existed, before the task could be finished. So: why do you want to do this? Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 24-Nov-08 Time: 16:43:29 -- XFMail --
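The arithmetic in the reply above can be checked in R itself without generating anything, by working on the log scale with lfactorial() (a sketch; the figures are approximate):

```r
# log10 of 100! -- the number of permutations of 1:100.
log10.n <- lfactorial(100) / log(10)
log10.n      # about 157.97, i.e. roughly 9.3e157 permutations

# years needed at 10^10 permutations per second (log10 scale):
log10.years <- log10.n - 10 - log10(3600 * 24 * 365)
log10.years  # about 140.5, matching the "3*(10^140) years" estimate
```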
Re: [R] how to test for the empty set
On 24-Nov-08 17:41:25, G. Jay Kerns wrote: Dear R-help, I first thought that the empty set (for a vector) would be NULL.

x <- c()
x

However, the documentation seems to make clear that there are _many_ empty sets depending on the vector's mode, namely, numeric(0), character(0), logical(0), etc. This is borne out by

y <- letters[1:3]
z <- letters[4:6]
intersect(y,z)

which, of course, is non-NULL: is.null(character(0)) # FALSE

In the above (and below) cases, would not (length(x)==0) do? Ted.

So, how can we test if a vector is, say, character(0)? The following doesn't (seem to) work:

x <- character(0)
x == character(0) # logical(0)

More snooping led to the following: wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs and at the bottom of the page it says "logical(0) is an empty set, thus is TRUE". However, I get

isTRUE(logical(0)) # FALSE

but, on the other hand,

all.equal(x, character(0)) # TRUE

This would seem to be the solution, but am I missing something? And in particular, is there an elegant way to check in the case that the mode of the vector is not already known? Thanks in advance for any insight you may have. Best, Jay E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 24-Nov-08 Time: 18:15:31 -- XFMail --
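To pull Ted's suggestion into one place: length() gives a mode-agnostic emptiness test. A small sketch (the helper name is my own, not from the thread):

```r
# length(x) == 0 is TRUE for every zero-length vector, whatever its mode,
# and also for NULL -- no prior knowledge of the mode is needed.
is.empty <- function(x) length(x) == 0

is.empty(character(0))                           # TRUE
is.empty(logical(0))                             # TRUE
is.empty(intersect(letters[1:3], letters[4:6]))  # TRUE
is.empty(NULL)                                   # TRUE
is.empty("")                                     # FALSE: "" is a length-1 string
```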
Re: [R] grep for asterisks *'s
On 23-Nov-08 08:41:02, Robin Clark wrote: Hello, I'm trying to determine if a string contains asterisks using the grep function. I know that this is slightly difficult because * is a special character in regular expressions and needs to be escaped. However, escaping the * using \ doesn't work either: In a regular expression, a term like [*] will match * literally, even though * is normally a special (meta) character. Thus:

test <- c("1*1","222","*33","44*","555")
test
# [1] "1*1" "222" "*33" "44*" "555"
?grep
grep("[*]",test)
# [1] 1 3 4
test[grep("[*]",test)]
# [1] "1*1" "*33" "44*"

This also works for other metacharacters, even (surprise!) "]":

test2 <- c("1[1","222","[33","44]","55]")
grep("[]]",test2)
# [1] 4 5
grep("[[]",test2)
# [1] 1 3

However, your error was to use "\*" with only one "\". You should have used "\\*":

grep("\\*",test)
# [1] 1 3 4

and the "\\" also works for other special characters, though "\" itself can be tricky since to enter a "\" into a string you have to type it as "\\", and R will print it back as "\\" as well:

test3 <- c("1\\1","222","\\33","44\\","55\\")
test3
# [1] "1\\1" "222"  "\\33" "44\\" "55\\"
nchar(test3)
# [1] 3 3 3 3 3

so there really are only 3 characters in each element of test3. But to grep for "\" you have to enter "\\" twice over:

grep("\\",test3)
# Error in grep(pattern, x, ignore.case, extended, value, fixed) :
#   invalid regular expression
grep("\\\",test3)
# Error: syntax error
grep("\\\\",test3)
# [1] 1 3 4 5

And, just as [*] catches *, so [\\] is needed to catch "\":

grep("[\\]",test3)
# [1] 1 3 4 5

Hoping this helps! Ted.

if(grep("\*", model)>0) #does the model have an interaction { do something... } produces the following error message: Error in grep("*", model) : invalid regular expression '*' In addition: Warning messages: 1: '\*' is an unrecognized escape in a character string 2: unrecognized escape removed from "\*" 3: In grep("*", model) : regcomp error: 'Invalid preceding regular expression' Execution halted Any ideas anyone?
Robin E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Nov-08 Time: 09:29:03 -- XFMail --
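One more option the thread does not mention (offered as a sketch, not as the original poster's method): fixed = TRUE turns off regular-expression interpretation entirely, so no escaping is needed at all.

```r
test <- c("1*1", "222", "*33", "44*", "555")
grep("*", test, fixed = TRUE)    # [1] 1 3 4

# fixed = TRUE also sidesteps the backslash-doubling problem:
# "\\" in R source is a single backslash character.
test3 <- c("1\\1", "222", "\\33", "44\\", "55\\")
grep("\\", test3, fixed = TRUE)  # [1] 1 3 4 5
```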
Re: [R] 2^2 factorial design question
On 07-Nov-08 00:53:56, Edna Bell wrote: Dear R Gurus: How do you put together a 2^2 (or even 2^k) factorial problem, please? I'm not clear about what you mean here, especially about "put together"! Can you describe what you want to end up with in more detail? Since you have 2 levels for A and B, do you put in A+ and A- as factors, please? In any case, I would not use the names A+ and A-, since this provides too many opportunities for expressions involving these to be interpreted as arithmetic operations. Better, and safer, to use names A1 and A0, say. Best wishes, Ted. Thanks, Edna Bell E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Nov-08 Time: 09:51:39 -- XFMail --
Re: [R] count data with some conditions
On 01-Nov-08 02:51:37, David Winsemius wrote: Do you want the count of remaining elements which are strictly greater than the first element?

length(which(a[1] < a[2:10]))
[1] 4

or perhaps a bit more deviously:

sum( (a[1] < a[2:10]) + 0 )  # adding 0 to TRUE or FALSE creates 1 or 0.
[1] 4

No need to be devious! Simply

sum(a[1] < a[2:10]) # [1] 4

will do it. The reason is that when TRUE or FALSE are involved in an arithmetic operation (which sum() is), they are cast into 1 or 0. Ted.

On Oct 31, 2008, at 7:56 PM, sandsky wrote: Hi there, I have a data set: a=cbind(5,2,4,7,8,3,4,11,1,20) I want to count # of data satisfying a[1] < a[2:10]. Anyone helps me solving this case? Thank you in advance, Jin E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Nov-08 Time: 07:30:17 -- XFMail --
Re: [R] How R calculate p-value ?
On 01-Nov-08 20:59:29, RON70 wrote: Still no reply. Is my question not understandable at all? RON70 wrote: I am wondering how R calculates the p-value for a test. Does R do some approximate integration on the p.d.f. of the null distribution? How can I see the code for this particular calculation? Your help will be highly appreciated. Regards, How R calculates a P-value depends on the analysis being done, but it will use a standard method. For example, a t-test (standard form, not Welch) will assume a normal distribution for the deviations from the mean (or means, for a two-sample test, with the same variance for each group); it will calculate the t-statistic, and will calculate the exact P-value (to within numerical accuracy) for this t-value from the distribution of t on the correct number of degrees of freedom. In the case of fitting a linear model with normally distributed errors (using lm()), the P-value for each coefficient will be based on the t value (value of coefficient divided by its standard error), referred to a t distribution with appropriate degrees of freedom. Again, an exact P-value. Similarly, for an analysis of variance, the F-statistic will be calculated in the standard way, and the P-value will be obtained by reference to the F distribution with appropriate degrees of freedom. Again exact, to within numerical accuracy. For some other kinds of analysis, standard methods (often large-sample chi-squared approximations, in some cases to the distribution of a likelihood ratio) will be used, and the resulting P-value will be approximate to within the accuracy of the approximating distribution. This kind of thing occurs for example in fitting generalised linear models using glm(). These P-values are approximate, but the approximations are standard. There is as much variety in all this as in the variety of standard methods for obtaining a test statistic and in using tractable approximations to the distribution of the test statistic. Hoping this helps, Ted.
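The t-test case described above can be reproduced "by hand" (a sketch): the two-sided p-value is just an exact tail probability of the t distribution, which is what t.test() computes internally.

```r
# One-sample, two-sided t-test p-value from first principles.
set.seed(1)
x <- rnorm(20)
tstat <- mean(x) / (sd(x) / sqrt(length(x)))
p.manual  <- 2 * pt(-abs(tstat), df = length(x) - 1)
p.builtin <- t.test(x)$p.value
all.equal(p.manual, p.builtin)  # TRUE
```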
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Nov-08 Time: 21:56:14 -- XFMail --
Re: [R] R random number generation
On 23-Oct-08 19:58:12, hiphop wrote: i have to generate random numbers from length 2 to 30 and length two should have the numbers 1,2 and length 3 should have the numbers 1,2,3 so on and so forth till size 30. i should get a sequence like 221 or 112 till both the values appear and for 3 it should be 123 or 3331112. i should get similar output for all sizes which means i should get 30 sequences. but am getting only one sequence. please help. this is my code

y=0; for (i in 1:3) { while(y!=i) { y=sample(1:3,replace=TRUE); } }

If I understand you correctly, the following should do what you are looking for (i.e. sample repeatedly from (1:k) until you first have all of 1:k in the sample, repetitions being allowed):

mksample <- function(k){
  All <- (1:k) ; Left <- rep(TRUE,k) ; Done <- FALSE
  Result <- NULL
  while(! Done){
    j <- sample(All,1)
    Result <- c(Result, j); Left[j] <- FALSE
    Done <- (sum(Left)==0)
  }
  Result
}

Example:

mksample(5)
# [1] 1 4 4 5 2 1 1 3

I'm not clear about why, in your illustrations (such as 3331112) you presented it as a sequence of runs (333, 111, 2) -- maybe this is not relevant to your query. Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Oct-08 Time: 21:51:04 -- XFMail --
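To produce all 29 sequences the poster asked for (one for each size from 2 to 30), mksample() can simply be applied over the sizes; a sketch, restating the function from the post so the example is self-contained:

```r
# Sample from 1:k, with repetition, until every value 1..k has appeared.
mksample <- function(k){
  All <- (1:k) ; Left <- rep(TRUE, k)
  Result <- NULL
  while (any(Left)) {
    j <- sample(All, 1)
    Result <- c(Result, j); Left[j] <- FALSE
  }
  Result
}

set.seed(1)
seqs <- lapply(2:30, mksample)  # one sequence per size 2..30

# every sequence of size k contains each of 1..k at least once:
all(sapply(seq_along(seqs), function(i) setequal(seqs[[i]], 1:(i + 1))))  # TRUE
```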
Re: [R] Calculating confidence limits for the difference between
On 22-Oct-08 17:09:27, Dennis Fisher wrote: Colleagues, I am working on a problem at the edge of my knowledge of statistics. Basically, I have values for two groups - I am trying to calculate the 90% confidence limits for the difference between means. I can implement this without difficulty based on standard equations that are available in stat textbooks: (difference between means) / (pooled standard error). My problem arises when I try to correct for the value of a covariate. I can do the ANOVA with the covariate as a factor. But, I don't know how to use the output to obtain the confidence limits of the difference. I suspect that several participants in this board have implemented code to do this. I hope that someone is willing to share the code. Mostly, we haven't, though a few of us once did (in coding lm() and t.test()). Assuming that you will adopt equal variances for the two groups (as in standard ANOVA), you can approach this in at least two different ways. Both illustrated below, for comparison.

## Make up some data, and a Group indicator
set.seed(213243)
X1 <- rnorm(15) ; X2 <- 0.1+rnorm(25) ; X <- c(X1,X2)
Group <- factor(c(rep(0,length(X1)), rep(1,length(X2))))
mean(X1)          # [1] 0.1108255
mean(X2)          # [1] 0.1002999
mean(X2)-mean(X1) # [1] -0.01052560
summary(lm(X ~ Group))$coef
#                Estimate Std. Error     t value  Pr(>|t|)
# (Intercept)  0.11082552  0.2543404  0.43573696 0.6654929
# Group1      -0.01052560  0.3217180 -0.03271686 0.9740716
## Note that the Group coefficient is the difference of means here
Diff <- summary(lm(X ~ Group))$coef[2,1] ## -0.01052560
SE   <- summary(lm(X ~ Group))$coef[2,2] ## 0.3217180
Diff + qt(0.975,38)*c(-1,1)*SE  ## d.f. = (15-1)+(25-1)
# [1] -0.6618097 0.6407585
## 95% confidence interval calculated from output of lm() above
t.test(X2,X1,var.equal=TRUE)
# Two Sample t-test
# data: X2 and X1
# t = -0.0327, df = 38, p-value = 0.974
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.6618097 0.6407585
# sample estimates:
# mean of x mean of y
# 0.1002999 0.1108255
## The 95% CI from t.test() is the same as previously calculated
## from the output of lm(). But t.test() does not directly output
## the difference of means.

Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 22-Oct-08 Time: 19:05:47 -- XFMail --
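For the covariate-adjusted question that prompted the thread, the same lm() route extends directly; confint() then gives the interval without hand calculation. A sketch with a made-up covariate Z (Z is invented here, not from the original data):

```r
# Covariate-adjusted group difference with a 90% confidence interval.
set.seed(213243)
X1 <- rnorm(15); X2 <- 0.1 + rnorm(25); X <- c(X1, X2)
Group <- factor(c(rep(0, length(X1)), rep(1, length(X2))))
Z <- rnorm(40)                 # hypothetical covariate

fit <- lm(X ~ Group + Z)
# the Group1 coefficient is the adjusted difference between group means:
confint(fit, "Group1", level = 0.90)
```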
Re: [R] card games
On 22-Oct-08 23:04:34, Barry Rowlingson wrote: 2008/10/22 Erin Hodgess [EMAIL PROTECTED]: This is for programming purposes. I wanted my students to do some games, but I wanted to see if there were items already out there. A quick googling (oooh, if ONLY R was called something more googly!) found Duncan Murdoch's poker package: http://www.stats.uwo.ca/faculty/murdoch/repos/html/pokerv0.0.html v0.0 could give your students something to start on. It has some card handling and shuffling packages as well as hand evaluations. Here's my poker calculator:

action <- function(hand, bet, cash){
  if(rubbish(hand)){
    drink() ; drink()
    if(drunk()) {
      return("bluff")
    } else {
      return("fold")
    }
  } else {
    drink()
    bet <- runif(1,min,max)
    return(bet)
  }
}

It's a pretty accurate simulation. *hic* Barry *** Segmentation fault - virtuous memory exceeded E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Oct-08 Time: 00:21:27 -- XFMail --
[R] ? extended rep()
Hi Folks, I'm wondering if there's a compact way to achieve the following. The dream is that, by analogy with

rep(c(0,1),times=c(3,4))
# [1] 0 0 0 1 1 1 1

one could write

rep(c(0,1),times=c(3,4,5,6))

which would produce

# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1

in effect recycling x through 'times'. The objective is to produce a vector of alternating runs of 0s and 1s, with the lengths of the runs supplied as a vector. Indeed, more generally, something like

rep(c(0,1,2), times=c(1,2,3,2,3,4))
# [1] 0 1 1 2 2 2 0 0 1 1 1 2 2 2 2

Suggestions appreciated! With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 21:57:15 -- XFMail --
Re: [R] ? extended rep()
On 20-Oct-08 21:19:21, Stefan Evert wrote: On 20 Oct 2008, at 22:57, (Ted Harding) wrote: I'm wondering if there's a compact way to achieve the following. The dream is that one could write rep(c(0,1),times=c(3,4,5,6)) which would produce # [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 in effect recycling x through 'times'.

rep2 <- function (x, times) rep(rep(x, length.out=length(times)), times)
rep2(c(0,1),times=c(3,4,5,6))
# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1

Any prizes for shortest solution? ;-) Best, Stefan

If ever we are both within reach of 'en øl', then yes. But Gabor came up with a shorter one. I tried to shorten Gabor's but failed. However, all competitors are entitled to a consolation prize! (And that includes me ... ) Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 22:59:16 -- XFMail --
Re: [R] ? extended rep()
On 20-Oct-08 21:17:22, Gabor Grothendieck wrote: Try this:

with(data.frame(x = 0:1, times = 3:6), rep(x, times))

or even shorter:

do.call(rep, data.frame(x = 0:1, times = 3:6))

That is sneaky!

data.frame(x = 0:1, times = 3:6)
#   x times
# 1 0     3
# 2 1     4
# 3 0     5
# 4 1     6

(Which is why it won't work with list(x=0:1,times=3:6).) Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 23:02:00 -- XFMail --
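A quick check of why the data.frame version works (a sketch): data.frame() recycles the shorter column, so rep() receives x already extended to the length of times.

```r
# data.frame recycles x = 0:1 to c(0, 1, 0, 1) alongside times = 3:6,
# and do.call passes both columns to rep() as named arguments.
out <- do.call(rep, data.frame(x = 0:1, times = 3:6))
out
# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1
identical(out, rep(c(0L, 1L, 0L, 1L), times = 3:6))  # TRUE
```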
Re: [R] R plot
On 17-Oct-08 09:01:08, Benoit Boulinguiez wrote: Hi, Personally I always use xlim and ylim with the plot or points function, like that:

plot( X,Y,pch=16,col=2,cex.axis=1.5,cex.lab=1.5,
      xlim=c(0,1.05*max(X)),ylim=c(0,1.05*max(Y)) )

Regards/Cordialement Benoit Boulinguiez

I think (from his original post) that Haoda already knows about the use of xlim and ylim. What he finds annoying is keeping track of what they should be! Here, I'm afraid, I am inclined to agree with him. For example, if you want to plot say 10 time series, with different time-ranges and different value-ranges, all on the one graph, and they have to be obtained separately (even by reading in from 10 different data files), then the only way I have found is to wait until all the objects are available, then compute the min and max of the x-range and the y-range of each, and finally base xlim on the min of the min x-ranges and the max of the max x-ranges (and similarly for ylim). Of course you could alternatively do this cumulatively as you go along, and even build the process into a function. But it is a lot of admin along the way, and it can get complicated. This sort of thing has caused me a lot of extra work on many occasions, which would have been unnecessary if plots could re-size themselves when asked to go outside existing ranges. I grin and bear it, because that's how things work; but I have to admit that I don't like it! Best wishes to all, Ted.

-Original message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Wacek Kusnierczyk Sent: Friday 17 October 2008 10:47 To: Haoda Fu Cc: R help Subject: Re: [R] R plot

Haoda Fu wrote: All - When I plot something like

a <- rnorm(5)
b <- rnorm(5)
plot(a,b,col = "red")
points(10,-10)

the last point is missing because it is out of range of the first plot. I am just trying to switch from Matlab to R. In Matlab, it can always automatically adjust the xlim and ylim in such cases. Is it possible to auto-adjust in R?
Otherwise, keeping track of xlim and ylim is really annoying. if you know the range in advance, you can specify it using the xlim and ylim parameters to plot. you can also use them in points (it doesn't cause an error), but it does not seem to have the desired effect of reshaping the plot. it's perhaps a pity it works this way, but you have to get used to it. or drop r if you find matlab better. vQ E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Oct-08 Time: 10:31:03 -- XFMail --
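Ted's workaround, written out as a sketch (the series and all names here are invented for illustration): compute global limits from all the series before the first plot() call, then add the rest with lines().

```r
set.seed(7)
# three series with different x- and y-ranges, standing in for data
# read from separate files
series <- lapply(1:3, function(i)
  list(x = sort(runif(10, 0, 5 * i)), y = rnorm(10, mean = i)))

# global limits: min of the mins and max of the maxes, as described above
xlim <- range(unlist(lapply(series, function(s) s$x)))
ylim <- range(unlist(lapply(series, function(s) s$y)))

plot(series[[1]]$x, series[[1]]$y, type = "l", xlim = xlim, ylim = ylim,
     xlab = "time", ylab = "value")
for (s in series[-1]) lines(s$x, s$y)
```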
Re: [R] Plotting 2D Logistic regression
On 17-Oct-08 17:59:55, Feldman, Ruben wrote: Hi, My data has time series for different variables and I want to predict ctw with the value of each other variable at that point in the series. I have run a logistic regression:

logreg <- glm(ctw ~ age + OFICO + ... + CCombLTV, data=mydata, family=binomial(logit))

And I am trying to get a plot of the logistic regression I have just performed, on the same plot as the actual time series. Apparently lin.reg can only deal with linear regressions, from the error I am getting. Does

plot(mydata$age, logreg$fit)

do something like what you are looking for? Ted. Do you know what function I should be using and where I can get it? Thanks so much! RF E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Oct-08 Time: 22:20:31 -- XFMail --
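A sketch of what the suggested plot gives, with simulated data standing in for the poster's (mydata, ctw and age are made up here; the real model has more predictors):

```r
set.seed(3)
mydata <- data.frame(age = sort(runif(100, 20, 70)))
mydata$ctw <- rbinom(100, 1, plogis(-4 + 0.1 * mydata$age))

logreg <- glm(ctw ~ age, data = mydata, family = binomial("logit"))

# fitted probabilities against the predictor, with the observed 0/1 data
plot(mydata$age, logreg$fitted.values, type = "l",
     xlab = "age", ylab = "fitted P(ctw = 1)", ylim = c(0, 1))
points(mydata$age, mydata$ctw, pch = 16, cex = 0.5)
```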
Re: [R] Add notes to sink output
On 13-Oct-08 20:02:20, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output?

sink("test.txt")
# This text will describe the test
summary(x)
sink()

How can I add that text above to the sink output? Thanks, Michael

Anything on the lines of:

sink("test.txt")
cat("This text will describe the test\n")
cat("\n")
summary(x)
sink()

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Oct-08 Time: 21:12:54 -- XFMail --
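The same effect without sink(), as an alternative sketch: build the lines first with capture.output(), then write them in one call (the temporary file name here is illustrative).

```r
set.seed(1)
x <- rnorm(10)
f <- tempfile(fileext = ".txt")

# annotation, blank line, then the captured summary, written together
writeLines(c("This text will describe the test", "",
             capture.output(summary(x))), f)

readLines(f)[1]  # "This text will describe the test"
```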
Re: [R] Conditionally skip over for(..){ and }??
On 12-Oct-08 11:32:31, Barry Rowlingson wrote: 2008/10/12 Ted Harding [EMAIL PROTECTED]: Hi Folks, I'm wondering if there's a secret trick to achieve the following. I have some big code for analysis related to a named variable, which will be one of several in the columns of a dataframe, and I would like to be able to choose between a run for just one of these variables, or a run which loops over them all. So, for a single run, I could have the following kind of thing in a file code.R to be sourced by source("code.R"):

# VarName <- "Var1"
VarName <- "Var2"
# VarName <- "Var3"
# VarName <- "Var4"
### CUT OUT LOOP
# VarNames <- c("Var1","Var2","Var3","Var4")
# for( VarName in VarNames ) {

Lots of code related to analysis of variable VarName

### CUT OUT END OF LOOP
# }

which would do the single case for Var2. A run for a different VarName would be done by editing the first block to select the new VarName. Now, when I want to run the loop over all VarNames, I could of course likewise edit the file so as to uncomment the code which follows the CUT OUT ... lines. But I would prefer to avoid having to do the latter, and am therefore wondering if there is some way to conditionally skip over those bits of code.

What's the problem with doing a loop over a single VarName in VarNames? I'd put something like:

VarNames = c( "Var1", "Var2", "Var3" )

at the start of my code for easy editing, and then if I only wanted to do it for one VarName you can just do:

VarNames = c(
  # "Var1",
  "Var2"
  # "Var3"
)

- it's just a shame that R doesn't like trailing commas in c() calls, which would make it even easier. Of course the real way to do it is to rewrite it as a function and do foo(c("Var1","Var2")) as desired Barry

Now why didn't that occur to me? Must do something about the hypocaffeinaemia. Many thanks, Baz, and also to Berwin Turlach for a similar suggestion! Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 13:02:37 -- XFMail --
Re: [R] Conditionally skip over for(..){ and }??
Following up on Barry's suggestion led on to a more straightforward approach: VarNames - NULL # VarNames - c(VarNames, Var1) # VarNames - c(VarNames, Var2) # VarNames - c(VarNames, Var3) # [...] for(i in (1:length(VarNames))){ VarName - VarNames[i] [...] Then any one or some of those lines can be uncommented. With thanks for the suggestions. Ted. On 12-Oct-08 11:32:31, Barry Rowlingson wrote: 2008/10/12 Ted Harding [EMAIL PROTECTED]: Hi Folks, I'm wondering if there's a secret trick to achieve the following. I have some big code for analysis related to a named variable, which will be one of several in the columns of a dataframe, and I would like to be able to choose between a run for just one of these variables, or a run which loops over them all. So, for a single run, I could have the following kind of thing in a file code.R to be sourced by source(code.R): # VarName - Var1 VarName - Var2 # VarName - Var3 # VarName - Var4 ### CUT OUT LOOP # VarNames - c(Var1,Var2,Var3,Var4) # for( VarName in VarNames ) { Lots of code related to analysis of variable VarName ### CUT OUT END OF LOOP # } which would do the single case for Var2. A run for a different VarName would be done by editing the first block to select the new VarName. Now, when I want to run the loop over all VarNames, I could of course likewise edit the file so as to uncomment the code which follows the CUT OUT ... lines. But I would prefer to avoid having to do the latter, and am therefore wondering if there is some way to conditionally skip over those bits of code. What's the problem with doing a loop over a single VarName in VarNames? I'd put something like: VarNames = c( Var1, Var2, Var3 ) at the start of my code for easy editing, and then if I only wanted to do it for one VarName you can just do: VarNames = c( #Var1, Var2 #Var3 ) - it's just a shame that R doesn't like trailing commas in c() calls, which would make it even easier. 
Of course the real way to do it is to rewrite it as a function and do foo(c("Var1","Var2")) as desired.

Barry

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 16:00:14 -- XFMail --
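Barry's closing suggestion (wrap the analysis in a function that takes the variable names to process) might look roughly like this. The function name, the data frame, and the "analysis" body are placeholders of mine, not from the original code:

```r
# Hypothetical sketch of Barry's suggestion: pass the chosen variable
# names into a function instead of editing comments in a sourced file.
analyse_vars <- function(dat, var_names) {
  results <- list()
  for (v in var_names) {
    x <- dat[[v]]            # pull the named column from the data frame
    results[[v]] <- mean(x)  # placeholder for the real analysis
  }
  results
}

dat <- data.frame(Var1 = 1:4, Var2 = 5:8, Var3 = 9:12)
analyse_vars(dat, c("Var1", "Var3"))  # run for a chosen subset
analyse_vars(dat, names(dat))         # or loop over them all
```

Selecting which variables to run then needs no commenting or uncommenting at all.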
Re: [R] R vs SPSS contrasts
Very many thanks, Chuck and Gabor, for the comments and the references to on-line explanations. It is beginning to become clear! Most grateful. Ted.

On 12-Oct-08 18:03:53, Gabor Grothendieck wrote: I found this link: http://webs.edinboro.edu/EDocs/SPSS/SPSS%20Regression%20Models%2013.0.pdf which indicates that the contrast in SPSS that is used depends not only on the contrast selected but also on the reference category selected, and the two can be chosen independently. Thus one could have simple/first, simple/last, deviation/first, deviation/last, etc. An R contr.SPSS function would have to specify both the deviation type and the first/last in order to handle all SPSS variations.

On Sun, Oct 12, 2008 at 1:48 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: The formula should be (diag(n) - 1/n)[, -n]

On Sun, Oct 12, 2008 at 1:36 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Looks like the contrast matrix for indicator is contr.SAS(n), for deviation is contr.sum(n) and for simple is:

  (diag(n) - 1/n)[, -1]

That works at least for the n = 3 example in the link. Perhaps the others could be checked against SPSS for a variety of values of n to be sure.

On Sun, Oct 12, 2008 at 12:32 PM, Chuck Cleland [EMAIL PROTECTED] wrote: On 10/11/2008 3:31 PM, Ted Harding wrote: Hi Folks, I'm comparing some output from R with output from SPSS. The coefficients of the independent variables (which are all factors, each at 2 levels) are identical. However, R's Intercept (using default contr.treatment) differs from SPSS's 'constant'. It seems that the contrasts were set in SPSS using /CONTRAST (varname)=Simple(1). I can get R's Intercept to match SPSS's 'constant' if I use contr.sum in R. Can someone please confirm that that is a correct match for the SPSS Simple(1), with identical effect? And is there a convenient on-line reference where I can look up what SPSS's /CONTRAST statements exactly mean? I've done a lot of googling, without coming up with anything satisfactory.
With thanks, Ted.

Hi Ted: Here are two links with the same content giving a brief description of SPSS simple contrasts: http://www.ats.ucla.edu/stat/spss/library/contrast.htm http://support.spss.com/productsext/spss/documentation/statistics/articles/contrast.htm These pages explain how simple contrasts differ from indicator (contr.treatment) and deviation (contr.sum) contrasts. For a factor with 3 levels, I believe you can reproduce SPSS simple contrasts (with the first category as reference) like this:

  C(warpbreaks$tension, contr=matrix(c(-1/3,2/3,-1/3,-1/3,-1/3,2/3), ncol=2))
  ...
  attr(,"contrasts")
      [,1]   [,2]
  L -0.333 -0.333
  M  0.667 -0.333
  H -0.333  0.667
  Levels: L M H

For a factor with 2 levels, like this:

  C(warpbreaks$wool, contr=matrix(c(-1/2,1/2), ncol=1))
  ...
  attr(,"contrasts")
    [,1]
  A -0.5
  B  0.5
  Levels: A B

Your description of the effect of SPSS simple contrasts - intercept coefficient of contr.sum and non-intercept coefficients of contr.treatment - sounds accurate to me. hope this helps, Chuck

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 11-Oct-08 Time: 20:31:53 -- XFMail --

-- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
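Gabor's formula for "simple" contrasts can be wrapped in a small helper. The function name contr.spss.simple is my own invention; the body is just his (diag(n) - 1/n)[, -1] and [, -n] formulas for the first/last reference category:

```r
# Hypothetical helper based on Gabor's formula for SPSS "simple" contrasts.
# first = TRUE uses the first level as reference, otherwise the last.
contr.spss.simple <- function(n, first = TRUE) {
  m <- diag(n) - 1/n
  if (first) m[, -1, drop = FALSE] else m[, -n, drop = FALSE]
}

contr.spss.simple(3)
# columns are (-1/3, 2/3, -1/3) and (-1/3, -1/3, 2/3),
# i.e. Chuck's 3-level matrix above
contr.spss.simple(2)
# single column (-1/2, 1/2), i.e. Chuck's 2-level matrix
```

As Gabor notes, this should still be checked against SPSS output for a variety of values of n.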
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 21:04:48 -- XFMail --
[R] png(): Linux vs Windows
Hi Folks, Quick question. I have the following line in an R code file which runs fine on Linux:

  if(PNG) png(GraphName, width=12, height=15, units="cm", res=200)

I learn that, when the same code was run on a Windows machine, there was the following error:

  Error in png(GraphName, width=12, height=15, units="cm", res=200):
    unused argument(s) (units = "cm")

Sorry to be a bother -- but could a Windows R user put me wise on any differences between png() on Windows and Linux? (And, sorry, I don't know what version of R, nor what version of Windows, this occurred on). Thanks, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 23:02:41 -- XFMail --
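The "unused argument(s) (units = ...)" error suggests the Windows machine was running a png() that took width and height in pixels only. Under that assumption, one portable workaround is to do the cm-to-pixel conversion by hand (the helper name cm_to_px is mine):

```r
# Sketch of a portable workaround, assuming the Windows png() at the time
# accepted width/height in pixels only: convert cm to pixels yourself.
cm_to_px <- function(cm, res) round(cm / 2.54 * res)  # res in dots per inch

res <- 200
w <- cm_to_px(12, res)  # 12 cm wide  -> 945 pixels at 200 dpi
h <- cm_to_px(15, res)  # 15 cm tall  -> 1181 pixels at 200 dpi
# png("graph.png", width = w, height = h)  # pixel sizes work everywhere
```

This sidesteps the units argument entirely, at the cost of fixing the resolution in the calculation.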
Re: [R] Correlation among correlation matrices cor() - Interpret
On 10-Oct-08 08:07:34, Michael Just wrote: Hello, If I have two correlation matrices (e.g. one for each of two treatments) and then perform cor() on those two correlation matrices, is this third correlation matrix interpreted as the correlation between the two treatments? In my sample below I would interpret that the treatments are 0.28 correlated. Is this correct?

  var1 <- c(.0008, .09, .1234, .5670008, .00110011002200,
            8, 9, 12.34, 56.7, 1015245698745859)
  var2 <- c(11, 222, 333, , 5, 1.1, 2.2, 3.3, 4.4, .55)
  var3 <- c(22.2, 66.7, 99.9, 108, 123123, .1, .2, .3, .4, .5)
  var4 <- c(.01, .1, .0001, .001, .1, .12345, .56789, .67890, .78901, .89012)
  dat <- cbind(var1,var2,var3,var4)
  dat.d <- data.frame(dat)
  treatment1 <- dat.d[1:5,]
  treatment2 <- dat.d[6:10,]
  t1.d.cor <- cor(treatment1)
  t2.d.cor <- cor(treatment2)
  I <- lower.tri(t1.d.cor)
  t1.t2 <- cor(cbind(T1 = t1.d.cor[I], T2 = t2.d.cor[I]))
  t1.t2
            T1        T2
  T1 1.0000000 0.2802750
  T2 0.2802750 1.0000000

My code may be unpolished, Thanks, Michael

You cannot conclude that, because corresponding elements of two correlation matrices are correlated, the two sets of variables that they were derived from are correlated (between the two sets). Here (in R) is an example of the contrary. One set of 4 variables (X1, X2, X3, X4) is such that there are correlations (by construction) between X1, X2, X3, X4. The other set of four variables Y1, Y2, Y3, Y4 is constructed in the same way. But the four variables (X1,X2,X3,X4) are independent of the four variables (Y1,Y2,Y3,Y4) and hence the two sets are not correlated. Nevertheless, the correlation matrix (analogous to your t1.t2) between the corresponding elements of the two correlation matrices is in general quite high.
  N <- 500 ; Cors <- numeric(N)
  for(i in (1:500)){
    X1 <- rnorm(10); X2 <- rnorm(10); X3 <- rnorm(10); X4 <- rnorm(10)
    V1 <- X1; V2 <- X1+X2; V3 <- X1+X2+X3; V4 <- X1+X2+X3+X4
    D <- cor(cbind(V1,V2,V3,V4))
    Y1 <- rnorm(10); Y2 <- rnorm(10); Y3 <- rnorm(10); Y4 <- rnorm(10)
    W1 <- Y1; W2 <- Y1+Y2; W3 <- Y1+Y2+Y3; W4 <- Y1+Y2+Y3+Y4
    E <- cor(cbind(W1,W2,W3,W4))
    I <- lower.tri(D)
    D.E <- cor(cbind(D=D[I], E=E[I]))
    Cors[i] <- D.E[2,1]
  }
  hist(Cors)

In any case, with the distribution of values exhibited by some of your variables, even the primary correlation matrices are of very questionable interpretability. Indeed, I wonder, if these are real data, what on Earth (literally) they are data on? The ratio of the largest to the smallest of, for instance, your var1 is (approx) 10^26. A small cup of espresso coffee is about 60 grams (2 ounces). 60 gm * 10^26 = 6*(10^27) gm = 6*(10^24) kg, which is the mass of the Earth. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 10-Oct-08 Time: 11:40:50 -- XFMail --
Re: [R] Fw: MLE
On 08-Oct-08 11:14:39, Ron Michael wrote: I made one typo in my previous mail. May I ask one statistics-related question please? I have one query on MLE itself. Its property says that, for large sample size, it is normally distributed [i.e. asymptotically normal]. On the other hand it is Consistent as well. My doubt is, how do these two asymptotic properties exist simultaneously? If it is consistent then asymptotically it should collapse to truth, i.e. for large sample size, the variance of the MLE should be zero. However asymptotic normality says the MLE has some distribution and hence variance. Can anyone please clarify this for me? Your help will be highly appreciated.

The false step in your argument is in the following: "If it is consistent then asymptotically it should collapse to truth i.e. for large sample size, variance of MLE should be zero." The first part would be better expressed as: "If it is consistent then asymptotically it should collapse *towards* truth", and indeed that is pretty well the definition of consistent. More precisely:

1. Decide how close you want the MLE to be to the true value. (Say, for example, that this is 0.0001). You're not allowed to choose "spot on" (i.e. zero).

2. Decide how sure you want to be that it is that close. (Say, for example, that you want to be 99.999% sure). You're not allowed to choose 100%.

3. Then you can find a sample size N (which may be very large, but you are being asymptotic so you can take as much as you need) such that, if the sample size is at least N, then

  Probability(|MLE - Truth| < 0.0001) > 0.99999

N, of course, depends on the numbers you chose in (1) and (2). Note that this does NOT say, anywhere, that the distribution of the MLE has, for such an N, collapsed strictly to truth, i.e. that the variance is zero. All that is implied is that the variance is very small, sufficiently small for (3) to be true.
And that is all that consistency is saying: That, for large enough N, you can be as sure as you wish (via variance as small as you need) that the MLE is at least as close as you wish to the true value. Consistency is not saying more than that. Therefore the second part of your statement: "i.e. for large sample size, variance of MLE should be zero" is not true: you don't attain zero for any large sample size; you can only get very close. (Except in certain very special cases -- e.g. sampling 100 different items out of a population of 100 items, i.e. without replacement, will give you exactly the value of some quantity calculated on that population). Hoping that helps! Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 13:14:27 -- XFMail --
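Ted's point (the variance shrinks towards zero but never reaches it) is easy to see in a quick simulation. A sketch, assuming the simple case of a normal sample where the MLE of the mean is the sample mean with sampling variance 1/n:

```r
# For samples from N(mu, 1), the MLE of mu is the sample mean; its
# sampling variance is 1/n -- small for large n, but never exactly zero.
set.seed(1)
for (n in c(10, 100, 1000, 10000)) {
  mles <- replicate(2000, mean(rnorm(n, mean = 5)))
  cat(sprintf("n = %5d  var(MLE) = %.6f  (theory 1/n = %.6f)\n",
              n, var(mles), 1/n))
}
```

Each line shows the empirical variance of the MLE tracking 1/n downwards, which is consistency in action without the variance ever collapsing to zero at any finite n.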
Re: [R] Using grep
On 08-Oct-08 15:19:02, mentor_ wrote: Hi, I have a vector A with (200, 201, 202, 203, 204, ... 210) and a vector B with (201, 204, 209). Now I would like to get the positions in vector A that match the entries in vector B. So what I want to have is the following result: [1] 2 5 10

First of all:

  A <- (200:210)
  B <- c(201, 204, 209)
  A
  # [1] 200 201 202 203 204 205 206 207 208 209 210
  B
  # [1] 201 204 209
  which(A %in% B)
  # [1] 2 5 10

as desired.

I tried the following: grep(B, A); grep(c(B), A); A <- as.character(A); B <- as.character(B); grep(B, A); grep(c(B), A); and several other combinations. But nothing is giving me the right result?! Does anyone know why?

In grep(pattern, x, ...):

  pattern: character string containing a regular expression (or character
    string for 'fixed = TRUE') to be matched in the given character vector.
    Coerced by 'as.character' to a character string if possible.
  x, text: a character vector where matches are sought, or an object which
    can be coerced by 'as.character' to a character vector.

you can clearly have 'x' as a vector of character strings, so your as.character(A) is valid for 'x'. But as.character(B) is neither a regular expression nor a character string -- it is a vector of character strings. So it is not valid for 'pattern'. What seems to happen here is that grep() takes the first element ("201") of as.character(B), and uses this as the regular expression (or character string):

  grep("A", c("ABC","BCD","CDE","EAB"))
  # [1] 1 4   (as expected)
  grep(c("A","B"), c("ABC","BCD","CDE","EAB"))
  # [1] 1 4   (the same as the previous; "B" in c("A","B") is ignored)

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 17:43:03 -- XFMail --
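As a footnote to which(A %in% B): the standard base-R function match() (not mentioned in the thread) returns the positions in the order of B rather than the order of A, which matters when B is not sorted the same way as A:

```r
A <- 200:210
B <- c(201, 204, 209)

which(A %in% B)   # positions in A, reported in A's order: 2 5 10
match(B, A)       # positions in A, reported in B's order: 2 5 10

# The difference shows up when B is reordered:
B2 <- c(209, 201, 204)
which(A %in% B2)  # still 2 5 10 (A's order)
match(B2, A)      # 10 2 5 (follows B2's order)
```

Both avoid grep() entirely, which is the right move since this is value matching, not pattern matching.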
Re: [R] R seven years ago
On 08-Oct-08 18:00:27, Liviu Andronic wrote: Hello everyone, As some may know, today Google unveiled its 2001 search index [1]. I was curious to see what R was like at that time, and was not disappointed. Compared to today's main page [2], seven years ago the page looked [3] a bit rudimentary, especially the graphic. (It is worth noting that structurally the pages are very similar.) What definitely changed is the `Contributed packages' section. Then R featured 29 contributed packages [4], while now it features 1500+ [5]. It was surprising to realize the growth of R during the past seven years. Regards, Liviu

[1] http://www.google.com/search2001.html
[2] http://www.r-project.org/
[3] http://web.archive.org/web/20010722202756/www.r-project.org/
[4] http://web.archive.org/web/20010525004023/cran.r-project.org/bin/macos/contrib/src/
[5] http://cran.at.r-project.org/web/packages/

Many thanks for this, Liviu! One might also compare the mailing list usage:

  [R-help 1997]:   484 messages
  [R-help 2001]:  4309 messages
  [R-help 2007]: 26250 messages
    (1721+1909+2196+2145+2210+2309+2142+2246+2028+2711+2602+2031)

So we now get more posts in a week than we did in the whole of 1997! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 19:34:28 -- XFMail --
Re: [R] Statistically significant in linear and non-linear model
On 07-Oct-08 17:46:52, Hsiao-nan Cheung wrote: Hi, I have a question to ask. If, in a linear regression model, the independent variables are not statistically significant, is it necessary to test these variables in a non-linear model? Since most non-linear forms of a variable can be represented as a linear combination using Taylor's theorem,

That depends on the coefficients in the Taylor series expansion. It is quite possible to have the linear coefficient zero, and the quadratic coefficient non-zero.

so I wonder whether the non-linear form is also not statistically significant in such a situation. Best Regards Hsiao-nan Cheung 2008/10/08

Example:

  X <- 0.2*((-10):10)
  Y <- 0.5*(X^2) + 0.2*rnorm(21)
  X2 <- X^2

[A] Linear regression, Y on X:

  summary(lm(Y ~ X))$coef
  #               Estimate Std. Error   t value     Pr(>|t|)
  # (Intercept) 0.72840442  0.1554215 4.6866382 0.0001606966
  # X           0.06570652  0.1283351 0.5119919 0.6145564688

So the coefficient of X is not significant.

[B] Quadratic regression, Y on X and X^2:

  summary(lm(Y ~ X + X2))$coef
  #                Estimate Std. Error     t value     Pr(>|t|)
  # (Intercept) 0.003425041 0.07203265  0.04754846 9.625997e-01
  # X           0.065706524 0.03957727  1.66020864 1.141924e-01
  # X2          0.494304121 0.03666239 13.48259513 7.570563e-11

So the coefficient of X is still not significant (P = 0.11), but the coefficient of X^2 is *highly* significant! So it all depends ... of course the original coefficients (Taylor) could be anything. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Oct-08 Time: 19:16:04 -- XFMail --
Re: [R] Sexy Little Number :-)-O
On 05-Oct-08 10:26:29, Dr Eberhard W Lisse wrote: Got me the Asus EEE PC 1000 (the one with the 8 and 32 GIG solid state drives, and added a 4GiG SD for the swap space. Will add another Gig of RAM for a total of 2). Threw the old (Xandros) Linux off and the EEE-specific Ubuntu 8.04.1 onto it. Got an Atom Intel processor which apparently has two cores. Quite impressive... Does anyone know, off hand, how I set up the CRAN repository so it updates from the latest packages (2.7.2)? What happened to the South African mirror, by the way? greetings, el

I'll be interested in responses too! (Though mine is only the EEE PC 900, still a nice little thing). And in experiences of replacing the Xandros with the Ubuntu. Meanwhile, I was tickled by your subject line, so could not resist:

  sexyNo <- function(){
    for(i in (1:20)){
      S <- paste(makeNstr(" ",i),"0",makeNstr(" ",(40-2*i)),"<--1\n",sep="")
      cat(S); Sys.sleep(1)
    }
    cat(paste(makeNstr(" ",21),"0~1\n",sep="")); Sys.sleep(2)
    for(i in (1:9)){
      cat(paste(makeNstr(" ",22),"|\n",sep="")); Sys.sleep(2)
    }
    cat("      ***!!! 0.1 !!!***\n")
  }
  sexyNo()

:-) Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 13:18:10 -- XFMail --
Re: [R] [OOPS!]Sexy Little Number :-)-O
[OOPS! By oversight I perpetrated a show-stopper! (Overlooked that I already had Hmisc loaded). Corrected below]

On 05-Oct-08 12:18:13, Ted Harding wrote: On 05-Oct-08 10:26:29, Dr Eberhard W Lisse wrote: Got me the Asus EEE PC 1000 (the one with the 8 and 32 GIG solid state drives, and added a 4GiG SD for the swap space. Will add another Gig of RAM for a total of 2). Threw the old (Xandros) Linux off and the EEE-specific Ubuntu 8.04.1 onto it. Got an Atom Intel processor which apparently has two cores. Quite impressive... Does anyone know, off hand, how I set up the CRAN repository so it updates from the latest packages (2.7.2)? What happened to the South African mirror, by the way? greetings, el

I'll be interested in responses too! (Though mine is only the EEE PC 900, still a nice little thing). And in experiences of replacing the Xandros with the Ubuntu. Meanwhile, I was tickled by your subject line, so could not resist:

  library(Hmisc)
  sexyNo <- function(){
    for(i in (1:20)){
      S <- paste(makeNstr(" ",i),"0",makeNstr(" ",(40-2*i)),"<--1\n",sep="")
      cat(S); Sys.sleep(1)
    }
    cat(paste(makeNstr(" ",21),"0~1\n",sep="")); Sys.sleep(2)
    for(i in (1:9)){
      cat(paste(makeNstr(" ",22),"|\n",sep="")); Sys.sleep(2)
    }
    cat("      ***!!! 0.1 !!!***\n")
  }
  sexyNo()

:-) Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 13:49:49 -- XFMail --
Re: [R] Sample mean in R
On 05-Oct-08 20:00:00, dilian wrote: I am having issues with the following: (muhat = (1/n^2)*(sum of all the xi's)); essentially if xbar = the sample mean, then muhat = the sample mean but with n squared in the denominator. Question: Use R to run a Monte Carlo simulation which compares the finite-sample performance of xbar and muhat. Specifically, generate 1000 samples of n=30 from a standard normal distribution. For each sample calculate xbar and muhat. I have no problem calculating the mean of the xbar's -- however I cannot figure out how to set up the muhat variable and find the means. My code is as follows:

  # R code starts here
  rm(list=ls())
  set.seed(100)
  n <- 30
  s <- 1000
  xbar <- rep(0,s)
  muhat <- rep(0,s)
  for (i in 1:s) {
    x <- rnorm(0, n=10)
    xbar[i] <- mean(x)
    muhat[i] <- mean(x^(-1/2))
  }

The line muhat[i] <- mean(x^(-1/2)) is anomalous -- in more than one way! [1] It does not match up with your stated definition of muhat (there is no x^(-1/2) there); [2] x^(-1/2) is going to give a bad result for negative values of x anyway (as will be the case with your rnorm(0, n=10)). To achieve what you defined as muhat, surely

  muhat[i] <- mean(x)/n

(where n <- length(x) somewhere, or simply n <- 10). But in any case I am wondering why you are interested in that muhat = (1/n^2)*(sum of all the xi's) definition of muhat. Part of your message seems to be going into one of my ears, and part into the other; when they meet in the middle, they compare notes and begin to wonder if you are getting Mean mixed up with Standard Error (SE^2 = var(x)/n). Hmmm. Hoping this helps, Ted.

  cat("Estimated mean of xbar:", mean(xbar), "\n")
  cat("Estimated mean of muhat:", mean(muhat), "\n")

Any help would be greatly appreciated.

-- View this message in context: http://www.nabble.com/Sample-mean-in-R-tp19828546p19828546.html Sent from the R help mailing list archive at Nabble.com.
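Ted's suggested fix can be assembled into a complete simulation along the lines the exercise asks for (n = 30, 1000 samples; muhat here is the exercise's sum(x)/n^2, i.e. mean(x)/n -- the assembly is mine, not from the thread):

```r
set.seed(100)
n <- 30; s <- 1000
xbar  <- numeric(s)
muhat <- numeric(s)
for (i in 1:s) {
  x <- rnorm(n)             # one sample of size n from N(0, 1)
  xbar[i]  <- mean(x)       # the usual sample mean
  muhat[i] <- sum(x)/n^2    # the exercise's estimator: mean(x)/n
}
cat("Estimated mean of xbar: ", mean(xbar),  "\n")
cat("Estimated mean of muhat:", mean(muhat), "\n")
# muhat shrinks every estimate by a factor of n, so its variance is 1/n^3
# against xbar's 1/n -- small variance, but a poor estimator of any
# nonzero mean, which echoes Ted's suspicion about the definition.
```

Comparing var(xbar) and var(muhat) alongside the means makes the trade-off the exercise is probing easy to see.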
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 21:37:21 -- XFMail --
Re: [R] What is the meaning of segfault 'memory not mapped' ?
One thing I have often done with code (including C and R) that falls through the ice unexpectedly is to plant (as a temporary measure) debug prints which emit information about where they are in the program, how many times they have gone round a particular loop, values of any potentially suspect variables, etc. Then you can look at the tail end of what they print and begin to localise the place where things go wrong. Having approximately located it, one can then replace the debug prints with more narrowly focussed ones. Often, just a few at a time will work fine. One can often wrap such a print into a function. Exactly how it is best done depends on how the code is written, and on what seems to be going wrong. Hoping this helps, Ted. On 05-Oct-08 20:22:53, Thomas Lumley wrote: On Fri, 3 Oct 2008, Ubuntu.Diego wrote: I'm trying to get some easy coding to reproduce the error. In the meantime I have R code that run for 20 or more hours and suddenly i got a segfault 'memory not mapped' error. I have to clarify that the error not alway occurs and sometimes the process end satisfactorily. The process is basically a search using an MCMC strategy, sometimes the algorithm converge and stops others I got the error message. I was wondering if it could be meaning that I run out of RAM. The 'memory not mapped' error means that your code is reading from or writing to memory that it doesn't own. This should have nothing to do with running out of RAM. If this is pure R code you have found a bug in R. If you are calling your own C code it is more likely to be a bug there. If your C code uses PROTECT/UNPROTECT it is even more likely to be there. Usually the best way to track down these bugs is to run the code under Valgrind, but this is slow, which will be a problem for code that already takes many hours. If you are extremely lucky, the crash will have happened on the first incorrect memory access. 
Running R under a debugger such as gdb you can use backtrace after the crash occurs to find where the bug happened. Unfortunately it is quite likely that the crash happened because some earlier piece of code overwrote a stored variable or something similar. The fact that the segfault occurs only some of the time reinforces this, especially if you don't always get the crash even with the same sequence of random numbers. If you have a Linux computer that you can use for a week or so it would be worth running your code under Valgrind to see if it finds the problem. -thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 21:46:27 -- XFMail --
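Ted's debug-print idea (emit where you are, loop counts, suspect values, then read the tail of the output after the crash) can be wrapped in a tiny helper. The function name and message format below are my own invention, a sketch rather than anything from the thread:

```r
# Hypothetical trace helper: tag each message with a location label and
# flush immediately, so the output survives up to the point of a crash.
trace_msg <- function(where, ...) {
  cat(sprintf("[%s] ", where), ..., "\n")
  flush(stdout())
}

for (i in 1:3) {
  trace_msg("main loop", "iteration", i, "x =", i^2)
}
```

Because each call flushes stdout, the last line printed before a segfault localises the failure; the calls can then be narrowed and re-run, exactly as Ted describes.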
Re: [R] How to get the day of the year
On 01-Oct-08 10:44:10, [EMAIL PROTECTED] wrote: I am new to R and I would like to get the day of the year from numerous data in the following form: %a %d/%b/%Y %H:%M:%S (for example from "Tu 10/Jul/2007 21:59:13" I would like to get 191). Whatever I try, I get NAs.

The problem is the 'Tu' bit. Abbreviated day-of-week names are three letters long. You'll have to replace/delete them.

  datestr <- "Tu 10/Jul/2007 21:59:13"
  newdatestr <- sub("Tu", "Tue", datestr)
  dateval <- strptime(newdatestr, "%a %d/%b/%Y %H:%M:%S")
  dateval$yday   # 190

In addition to the above: BEWARE that strptime(...)$yday counts upwards from 0 to 364 (normal year) or 365 (leap year). In other words, 1 Jan is day 0. Hence Richard got the result 190 where you expected 191. So add 1 to get the answer everyone expects! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Oct-08 Time: 12:24:12 -- XFMail --
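Putting Richard's substitution and the +1 correction together in one runnable piece (the locale caveat is mine: %a/%b parsing assumes English day and month names):

```r
# Assumes an English locale for %a/%b parsing;
# Sys.setlocale("LC_TIME", "C") can force this if needed.
datestr <- "Tu 10/Jul/2007 21:59:13"
newdatestr <- sub("^Tu ", "Tue ", datestr)  # expand the two-letter day name
dateval <- strptime(newdatestr, "%a %d/%b/%Y %H:%M:%S")
dateval$yday + 1   # 191: $yday counts from 0, so add 1
```

Anchoring the substitution to the start of the string (^Tu followed by a space) avoids accidentally rewriting a "Tu" that appears elsewhere in the data.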
Re: [R] Pattern match in R
On 30-Sep-08 14:36:04, bioinformatics_guy wrote: I want to make sure this piece of code I wrote is doing what I want it to do.

  ll <- function(string) {
    grep(string, dir(), value=TRUE)
  }
  subdir = ll("Coverage_[1-9][0-9]$")

I basically wrote a little function that would grab all the files of form Coverage_[0-99]. The way I wrote it, will it grab Coverage_5 or does it have to have 2 numbers (10-99)?

I think you want "Coverage_[1-9]*[0-9]$", since your form will only match Coverage_mn where m is one of 1-9 and n is one of 0-9. The "*" means "zero or any number of ...". Example (command-line grep in Linux):

  grep 'Coverage_[1-9]*[0-9]$' <<EOT
  Coverage
  Coverage_5
  Coverage_01
  Coverage_19
  Coverage_19q
  EOT
  Coverage_5
  Coverage_19

Note that this does not catch Coverage_01, because [1-9] does not include 0. Use [0-9] here as well if you need this. Hoping this helps, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 30-Sep-08 Time: 16:43:01 -- XFMail --
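The same comparison can be run inside R against a character vector standing in for dir() (the file names below are invented for illustration):

```r
# Invented file names standing in for dir() output.
files <- c("Coverage", "Coverage_5", "Coverage_01", "Coverage_19", "Coverage_19q")

grep("Coverage_[1-9][0-9]$", files, value = TRUE)
# "Coverage_19" only: the original pattern demands exactly two digits
grep("Coverage_[1-9]*[0-9]$", files, value = TRUE)
# "Coverage_5" "Coverage_19": Ted's pattern allows one digit too
grep("Coverage_[0-9]+$", files, value = TRUE)
# also catches "Coverage_01", per Ted's closing remark about leading zeros
```

The $ anchor is doing real work in all three: without it, "Coverage_19q" would also match.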
Re: [R] quantile
On 29-Sep-08 20:09:14, liujb wrote: Hello, I need to assign a number to each x[i], i=1:100, based on the value of x[i] and where it is in the distribution of x. For example, 1 for x[4] means x[4] is below the 25% quantile. I can obtain the quantiles using the quantile command, and just loop through 1:100 and assign the correct number. But I was just wondering whether there is a more efficient way. Thank you, sincerely, Julia

Well, you can certainly do it with a much shorter loop!

  set.seed(31425)
  x <- rnorm(13)
  x.v <- numeric(length(x))
  ix <- order(x)
  Q <- quantile(x)
  for(i in (5:2)){
    x.v[x <= Q[i]] <- (i-1)
  }
  cbind(x, x.v, x[ix], x.v[ix])
                x x.v
   [1,] -0.7565336   2 -1.7045077   1
   [2,] -0.3287683   2 -1.0693801   1
   [3,] -1.7045077   1 -1.0671752   1
   [4,]  0.7259883   4 -0.9718954   1
   [5,]  0.6174724   3 -0.7565336   2
   [6,] -1.0693801   1 -0.3668566   2
   [7,]  1.9826596   4 -0.3287683   2
   [8,] -0.9718954   1  0.2491123   3
   [9,] -1.0671752   1  0.4733287   3
  [10,] -0.3668566   2  0.6174724   3
  [11,]  0.2491123   3  0.7259883   4
  [12,]  0.4733287   3  1.9826596   4
  [13,]  2.2095536   4  2.2095536   4

Hoping this helps, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 29-Sep-08 Time: 22:33:33 -- XFMail --
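The loop can in fact be avoided entirely: cut() with quantile breaks reproduces Ted's x.v in a single call (cut() is standard base R; it does not appear in the thread):

```r
set.seed(31425)
x <- rnorm(13)

# cut() with quartile breaks assigns each value its quartile number 1..4;
# include.lowest = TRUE makes sure the minimum lands in group 1.
q <- cut(x, breaks = quantile(x), labels = FALSE, include.lowest = TRUE)

# Ted's loop, for comparison:
x.v <- numeric(length(x)); Q <- quantile(x)
for (i in (5:2)) x.v[x <= Q[i]] <- (i - 1)
all(q == x.v)   # TRUE
```

Both use right-closed intervals (value j means Q[j] < x <= Q[j+1], with the minimum included in group 1), which is why they agree exactly.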
Re: [R] Regression and data types
On 27-Sep-08 15:18:32, Gavin Simpson wrote: On Fri, 2008-09-26 at 13:17 +0100, GRANT Lewis wrote: Dear All, I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were generated in (different) R programs. All three vectors have one column of 60 data points. I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef.

Others have replied with an answer to your question. I just wanted to suggest you don't rummage around in R model objects taking what you like using '$'. 9 times out of 10 you'll get what you want, but that last remaining time will have you rubbing your head in confusion at best; at worst, answers may be just plain wrong. Use extractor functions, such as coef(), instead:

  coef(lm(Y ~ X1))

G

Surely, if you read (for example) in ?lm that:

  Value: [...] An object of class 'lm' is a list containing at least the
  following components:
  coefficients: a named vector of coefficients
  [...]

then (subject to using enough of the name to give a unique partial matching, as is the case here) you should find that lm(...)$coef returns what ?lm says! Even with more complex model fitting, such as lme (in 'nlme'), you should still get what ?lme -- ?lmeObject says:

  coefficients: a list with two components, 'fixed' and 'random', where
  the first is a vector containing the estimated fixed effects and the
  second is a list of matrices with the estimated random effects for each
  level of grouping. For each matrix in the 'random' list, the columns
  refer to the random effects and the rows to the groups.

Since coef() is a generic function, you might expect coef(lme(...)) to give the same; or even perhaps give a more easily interpreted presentation. But there's nothing wrong with lme(...)$coef provided you're fully aware of what it is (as explained in the 'help').
BUT: In the case of lme, if you run the example:

  fm1 <- lme(distance ~ age, data = Orthodont)  # random is ~ age

then fm1$coef really does return a list with components:

  $fixed
  (Intercept)         age
      16.761   0.6601852

  $random
  $random$Subject
      (Intercept)          age
  M16  -0.1877576 -0.068853674
  [...]
  F04   1.0691628 -0.029862143
  F11   1.2176440  0.083191188

whereas if you do coef(fm1) you will get only:

      (Intercept)       age
  M16    16.57335 0.5913315
  [...]
  F04    17.83027 0.6303230
  F11    17.97876 0.7433764

so (a) not only do you NOT get the fixed effects part, but (b) what you might interpret as the random effects part has very different values from the $random component of fm1$coef! Well, there must be some explanation for this, mustn't there? But, in ?lmeObject, I find ONLY:

  Objects of this class have methods for the generic functions 'anova',
  'coef', [...]

so I can't find out from there what coef(fm1) does; and if I look in ?lme again I find only:

  The functions 'resid', 'coef', 'fitted', 'fixed.effects', and
  'random.effects' can be used to extract some of its components.

so that doesn't tell me much (and certainly not why I get the different results). And ?coef doesn't say anything specific either. That being the case, I would, myself, stick with fm1$coef. At least, then, I know what I'm getting. With coef(fm1), I don't. Of course, summary(fm1) is informative in its own way; but that's a different matter; my point is that coef() doesn't necessarily give you what you might expect, while (...)$coef does -- since it is explained in full in ?lmeObject. This is, in fact, the opposite conclusion to that drawn by Gavin! Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 27-Sep-08 Time: 18:09:54 -- XFMail --
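For what it's worth, the numbers Ted quotes are consistent with coef(fm1) being the per-group sum of the fixed and random effects. A quick check using only the figures from his output (this interpretation is my own, not stated anywhere in the thread, so treat it as a conjecture to verify against nlme itself):

```r
# Conjecture: each row of coef(fm1) = $fixed + the matching $random$Subject
# row. Check against the M16 row, using Ted's quoted numbers.
fixed <- c(16.761, 0.6601852)           # $fixed: (Intercept), age
m16   <- c(-0.1877576, -0.068853674)    # $random$Subject row for M16

fixed + m16   # ~ (16.573, 0.5913), matching the M16 row of coef(fm1)
```

If the conjecture holds in general, it would explain both of Ted's observations at once: the fixed effects are not missing from coef(fm1), they are folded into every row.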