Re: [R] 0 x 0 matrix
On 04-Sep-09 10:45:27, Markku Karhunen wrote:
> True. Should have read ?diag. However, this provokes a more general
> question: Is there some way I can declare some scalar and _all its
> functions_ as matrices? For instance, I would like to
>   A = as.matrix(0.98)
>   B = function(A)
>   C = diag(sqrt(B))
> so that all scalars are explicitly [1,1] matrices.
> BR, Markku

Hmmm, it might be a good idea to explain why you want to do this.
For instance:

  M <- matrix(c(1,2,3,4), nrow=2)
  c <- matrix(2, nrow=1)
  c %*% M
  # Error in c %*% M : non-conformable arguments
  c * M
  # Error in c * M : non-conformable arrays
  c + M
  # Error in c + M : non-conformable arrays

So what would you want to use the [1,1]-matrix scalars for, that cannot
be done just by using them as numbers?
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-09  Time: 14:51:52
-- XFMail --
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
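[Editorial sketch appended here, not part of the exchange: for contrast with Ted's example, a plain numeric scalar already conforms with matrices under elementwise arithmetic, and drop() recovers the scalar from a [1,1] matrix when one is needed.]

```r
M <- matrix(c(1, 2, 3, 4), nrow = 2)
s <- 2                # plain numeric scalar: recycled elementwise
s * M                 # works
s + M                 # works
sm <- as.matrix(2)    # a [1,1] matrix, as in Markku's proposal
drop(sm) * M          # drop() strips the dim attribute, so this works too
```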
[R] Function for all 2^N subsets of N
Greetings all!
I have been searching the Site for a function, say subsets(), such that
for instance subsets(10) would return a (say) matrix of indices to the
2^10 subsets of the 10 items -- perhaps in the form of 2^10 rows, each
of which is 10 entries, each either TRUE or FALSE. Or 1 or 0. Or ...

I can of course write my own, using good old looping technology or
similar, but it would be good to find one which did it quick and snappy,
at the compiled level. A Site Search in "Function" on "all subsets"
didn't seem to yield anything of the kind, which surprised me. Maybe I
overlooked something ...

(This is prompted by the recent OT discussion on HT vs. HH, to which I
want to respond later.)
With thanks,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09  Time: 09:09:09
-- XFMail --
Re: [R] Function for all 2^N subsets of N
On 01-Sep-09 08:33:41, Gerrit Eichner wrote:
> Maybe
>   expand.grid(rep(list(0:1), 10))
> does what you want.
> Best regards -- Gerrit

Thanks! That does seem to do the job. I hadn't thought of expand.grid().
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09  Time: 09:42:18
-- XFMail --
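[Editorial sketch, not part of the thread: Gerrit's one-liner can be wrapped into the subsets() function Ted asked for, here returning a logical matrix with one row per subset; the function name is invented.]

```r
# All 2^n subsets of n items, as a (2^n x n) logical matrix:
# row i gives the membership indicators of subset i.
subsets <- function(n) {
  as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), n)))
}
m <- subsets(3)
dim(m)   # 8 rows (the 2^3 subsets), 3 columns (the items)
```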
Re: [R] Offtopic, HT vs. HH in coin flips
On 31-Aug-09 19:16:33, Erik Iverson wrote:
> Dear R-help,
> Could someone please try to explain this paradox to me? What is more
> likely to show up first in a string of coin tosses, Heads then Tails,
> or Heads then Heads?
>
>   ## generate 2500 strings of random coin flips
>   ht <- replicate(2500, paste(sample(c("H", "T"), 100, replace = TRUE),
>                               collapse = ""))
>   ## find first occurrence of HT
>   mean(regexpr("HT", ht)) + 1   # mean of HT position, 4
>   ## find first occurrence of HH
>   mean(regexpr("HH", ht)) + 1   # mean of HH position, 6
>
> FYI, this is not homework; I have not been in school in years. I saw a
> similar problem posed in a blog post on the Revolutions R blog, and
> although I believe the answer, I'm having a hard time figuring out why
> this should be.
> Thanks, Erik Iverson

Be very careful about the statement of the problem!

[1] The probability that HH will occur first (i.e. before HT) is the
same as the probability that HT will occur first (i.e. before HH).

[2] However, the probability that the first occurrence of HT will be at
a given position of the H is generally not the same as the probability
that the first occurrence of HH will be at the same position of the
first H.

[1]: At the first occurrence of (either HH or HT), there is an initial
string S, ending in an H, followed by either an H (for HH) or a T (for
HT). Both are equally likely. So the probability that the first
occurrence of (either HH or HT) is an HH is the same as the probability
that it is an HT.

[2]: (A) The first occurrence of an HH consists of a sequence of any
collection of H and T, provided there is no HH in the sequence and the
last is H, followed by H. However, HT is allowed to occur in the
sequence. But (B) the first occurrence of an HT consists of a sequence
of (zero or more T) followed by (1 or more H), followed by T. This is
the only pattern in which HT does not occur prior to the final HT.
Similarly, HH is allowed to occur in the sequence. 
The reason that, in general, the probability of HH first occurring at a
given position is different from the probability of HT first occurring
at that position lies in the differences between the number of possible
sequences satisfying (A) and the number satisfying (B). The first few
cases (HH or HT first occurring at (k+1), so that the position of the
first H in HH or HT is at k) are, with their probabilities:

  k=1:  HH                       HT
        1/4                      1/4
  k=2:  THH                      HHT THT
        1/8                      2/8
  k=3:  TTHH HTHH                HHHT THHT TTHT
        2/16                     3/16
  k=4:  TTTHH THTHH HTTHH        HHHHT THHHT TTHHT TTTHT
        3/32                     4/32

The HT case is simple:

  P.HT[k] = Prob(1st HT at (k+1)) = k/(2^(k+1))

Exercise for the reader: Sum(P.HT) = 1.

The HH case is more interesting. Experimental scribblings on paper threw
up a hypothesis, which I decided to explore in R. Thanks to Gerrit
Eichner for suggesting the use of expand.grid()!

  ## Function to count sequences giving 1st HH on throw k+1
  countHH <- function(k) {
    M  <- as.matrix(expand.grid(rep(list(0:1), k)))
    ix <- (M[,k] == 1)   ## throw k must be an H (then throw k+1 will be H)
    for (i in (1:(k-1))) {
      ix <- ix & !((M[,i] == 1) & (M[,i+1] == 1))
    }
    sum(ix)
    ## list(Count=sum(ix), Which=M[ix,])
  }

Now, ignoring the case k=1:

  HHcounts <- NULL
  for (i in (2:12)) { HHcounts <- c(HHcounts, countHH(i)) }
  rbind((3:13), HHcounts)
  #             3    4    5    6    7    8    9   10   11   12   13
  # HHcounts    1    2    3    5    8   13   21   34   55   89  144

Lo and Behold, we have a Fibonacci sequence! Another exercise for the
reader ...
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09  Time: 10:38:58
-- XFMail --
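[Editorial sketch, not part of the original post: a quick numerical check of both distributions. With P.HT[k] = k/2^(k+1) and P.HH[k] = Fib(k)/2^(k+1) (the Fibonacci counts found above), both sum to 1, and the mean first-occurrence positions come out at the 4 and 6 that Erik's simulation estimated.]

```r
k <- 1:200                             # enough terms for the tails to vanish
P.HT <- k / 2^(k + 1)                  # Prob(first HT completed at throw k+1)
fib <- c(1, 1)                         # countHH(k) for k = 1, 2
for (i in 3:200) fib[i] <- fib[i - 1] + fib[i - 2]
P.HH <- fib / 2^(k + 1)                # Prob(first HH completed at throw k+1)
c(sum(P.HT), sum(P.HH))                # both ~ 1: proper distributions
c(sum((k + 1) * P.HT), sum((k + 1) * P.HH))   # mean positions ~ 4 and ~ 6
```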
Re: [R] Google style
On 01-Sep-09 10:25:53, Duncan Murdoch wrote:
> Jim Lemon wrote:
>> Duncan Murdoch wrote:
>>> On 8/31/2009 11:50 AM, Mark Knecht wrote:
>>>> On Mon, Aug 31, 2009 at 6:36 AM, Terry Therneau thern...@mayo.edu
>>>> wrote:
>>>> [SNIP]
>>>> The authors borrowed so much else from C, the semicolon would have
>>>> been good too. Something I have thought myself. I know real R
>>>> coders will chuckle.
>>> I'd say cringe, rather than chuckle. This is going to make you waste
>>> a lot of time some day, when you stare and stare at code like
>>> Terry's and can't figure out what's wrong with it:
>>>
>>>   zed <- function(x, y, z) {
>>>     x + y
>>>       +z;
>>>   }
>>>
>>> The value of the function is +z, not x+y+z, even though the C part
>>> of your brain made you type it that way and reads it as one
>>> statement in the body, not two.
>> This is getting interesting. One habit I have developed in R to
>> emphasize a line continuation is to always write the above as:
>>
>>   zed <- function(x, y, z) {
>>     x + y +
>>       z
>>   }
> That's a good habit. An alternative is to put parentheses around the
> expression: (x + y + z) will work.
>> The trailing operator signals to me and the interpreter that there's
>> more to come. A semicolon after the z would be innocuous. Now I know
>> that this marks me as a crabby old fart who learned to program on
>> Hollerith cards, where there had to be firm conventions on when a
>> line of code ended. Still, given the moiety of global warming
>> attributable to endless discussions about how many spaces should be
>> used for indentation, I think the use of the semicolon as a personal
>> aid to interpretation is at worst a harmless affectation.
> I think it's worse. To me, it's like putting in a comment that is
> wrong, or writing code like this:
>
>   one <- 2
>   x <- x + one
>
> Code has meaning; it's not just a bunch of binary instructions to the
> computer. If the meaning and the look of the code clash, it is going
> to lead to problems.
> Duncan Murdoch

And surely that is precisely the point of Jim's use of ";"! It is, in
effect, ignored by R; but to Jim it means "This marks the end of a
command." 
Surely useful, and surely not in the same league as a comment that is
wrong. You may see it as noise, but then you can filter it out.

As one COF to another, I have to say that Jim's posting took me back to
the early days of my own evolution. That was dandy! (Dinosaurs are not
dead yet.)
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09  Time: 11:37:52
-- XFMail --
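[Editorial sketch, not from the thread: Duncan's pitfall reproduced directly. The C-style layout silently splits into two statements, and the function's value is +z alone.]

```r
zed <- function(x, y, z) {
  x + y      # a complete statement on its own: evaluated and discarded
    + z      # unary plus: this expression is the function's value
}
zed(1, 2, 3)   # returns 3, not 6
```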
Re: [R] Bootstrap inference for the sample median?
On 30-Aug-09 16:24:08, Emmanuel Charpentier wrote:
> On Sunday 30 August 2009 at 18:43 +0530, Ajay Shah wrote:
>> Folks, I have this code fragment:
>>
>>   library(boot)   ## (uses the 'boot' package)
>>   set.seed(1001)
>>   x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
>>          0.82529998629531, 0.973451006822, 0.92378802164835,
>>          0.996679563355802, 0.943347739494445, 0.992873542980045,
>>          0.870624707845108, 0.935917364493788)
>>   range(x)   # from 0.79 to 0.996
>>   e <- function(x, d) { median(x[d]) }
>>   b <- boot(x, e, R = 1000)
>>   boot.ci(b)
>>
>> The 95% confidence interval, as seen with `Normal' and `Basic'
>> calculations, has an upper bound of 1.0028 and 1.0121. How is this
>> possible? If I sample with replacement from values which are all
>> lower than 1, then any sample median of these bootstrap samples
>> should be lower than 1. The upper cutoff of the 95% confidence
>> interval should also be below 1.
>
> Nope. "Normal" and "Basic" try to adjust (some form of) normal
> distribution to the sample of your statistic. But the median of such a
> small sample is quite far from normal (hint: it is a discrete
> distribution with only very few possible values, at most as many
> values as the sample. Exercise: derive the distribution of
> median(x)...). To convince yourself, look at the histogram of the
> bootstrap distribution of median(x). Contrast with the bootstrap
> distribution of mean(x). Meditate. Conclude...
> HTH, Emmanuel Charpentier

Ajay, you have not said why you are adopting the bootstrap approach.
Maybe you specifically want to study the behaviour of bootstrap methods,
in which case my response below is irrelevant. But if, on the other
hand, you are simply looking for a confidence interval for the median,
you might consider a non-parametric approach. This -- which does not
depend on distributional assumptions about the data -- is based on the
fact that, if med denotes the median of the distribution of a continuous
variable X,

  Prob(X < med) = Prob(X > med) = 1/2

Hence, if X[1], X[2], ... 
, X[n] denote the values of a random sample of X-values in increasing
order, then Prob(X[1] > med) = 1/2^n, and for r > 1:

  Prob( X[r] > med ) = Prob( X[1] > med )
                     + Prob( (X[1] < med) & (X[2] > med) )
                     + ...
                     + Prob( (X[1] < med) & (X[2] < med) & ...
                             & (X[r-1] < med) & (X[r] > med) )

i.e. terms summed over all the r disjoint ways in which X[r] > med can
occur. These terms are all straightforward binomial probabilities, with
p = 1/2, so

  Prob( X[r] > med ) = (1/2^n) + n*(1/2^n) + ... + choose(n,(r-1))*(1/2^n)
                     = pbinom((r-1), n, 1/2)

Hence if, for instance, you find the maximum value of r such that, for
given n,

  Prob( X[r] > med ) <= 0.025

then the probability is at least 0.975 that X[r] < med, whatever the
value of med may be. Hence the element of the sorted sample in the r-th
position is a lower 97.5% one-sided confidence limit for med.

Because you are looking at the median, the situation is symmetrical,
i.e. Prob(X < med) = Prob(X > med) = 1/2, so a corresponding 97.5% upper
one-sided confidence limit for med is the element of the sorted sample
in the (n+1-r)-th position. Hence the pair of these two limits is a 95%
confidence interval for the median.

Now, since your (admittedly small) sample size is n=11, you can get at
it as follows:

  Ps <- pbinom((0:11), 11, 1/2)
  round(rbind((0:11), Ps, rev(Ps)), 3)
  #            0     1     2     3     4     5     6     7     8     9    10    11
  # Ps     0.000 0.006 0.033 0.113 0.274 0.500 0.726 0.887 0.967 0.994 1.000 1.000
  #        1.000 1.000 0.994 0.967 0.887 0.726 0.500 0.274 0.113 0.033 0.006 0.000

Hence the maximum value of r from the first line is r=2, so the second
of the 11 sorted observations is the lower limit, and the 10th is the
upper limit, for a 95% confidence interval. 
In the case of your data,

  x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
         0.82529998629531, 0.973451006822, 0.92378802164835,
         0.996679563355802, 0.943347739494445, 0.992873542980045,
         0.870624707845108, 0.935917364493788)
  print(sort(x)[c(2,10)], 15)
  # [1] 0.825299986295310 0.992873542980045

Note that this is a conservative confidence interval, in that it is
guaranteed that the confidence level

  Prob( LowerLimit < median < UpperLimit ) >= 0.95

and it is the shortest such interval with symmetry. As it happens, with
n=11, this probability is

  1 - 2*pbinom(1, 11, 1/2) = 0.9882812

so it is very conservative (a consequence of small n). You could get a
less conservative interval by using an asymmetrical interval, e.g. the
2nd and 9th, or the 3rd and 10th, when the probability would be

  1 - pbinom(1, 11, 1/2) - pbinom(2, 11, 1/2) = 0.9614258

which is pretty close to the target confidence level while still not
being less than the target; but then you have to have a reason for
preferring one ([2,9]) to the other ([3,10])! Or you could choose one of
them at random ...
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
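[Editorial sketch, not from the thread: Ted's recipe wrapped up as a function. The name median.ci and its interface are invented; it assumes n is large enough that some r satisfies the coverage condition.]

```r
# Nonparametric (order-statistic) confidence interval for the median,
# following the pbinom() argument above.
median.ci <- function(x, conf = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- (1 - conf) / 2
  ## largest r with Prob(X[r] > med) = pbinom(r - 1, n, 1/2) <= alpha
  r <- max(which(pbinom((1:n) - 1, n, 1/2) <= alpha))
  c(lower = x[r], upper = x[n + 1 - r],
    achieved = 1 - 2 * pbinom(r - 1, n, 1/2))
}
x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
       0.82529998629531, 0.973451006822, 0.92378802164835,
       0.996679563355802, 0.943347739494445, 0.992873542980045,
       0.870624707845108, 0.935917364493788)
median.ci(x)   # lower ~0.8253, upper ~0.9929, achieved coverage ~0.9883
```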
Re: [R] write file to date-stamped folder
On 31-Aug-09 13:18:38, Henrique Dallazuanna wrote:
> Try this:
>   write.table(test.table, file.path(outputDir, "test.table.txt"),
>               sep="\t")

That may not work (depending on the platform OS) if the directory
'outputDir' does not already exist (it will not work on Linux). In that
case, first:

  system(paste("mkdir", outputDir))

Then Henrique's suggestion should work (untested), or simply the
bare-hands construction (tested!):

  write.table(test.table,
              paste(outputDir, "/", "test.table.txt", sep=""), sep="\t")

Ted.

On Mon, Aug 31, 2009 at 4:45 AM, suzylee c.me...@sms.ed.ac.uk wrote:
> Hello,
> I would like to be able to write all files produced on one day to an
> output directory with that date stamp, or alternatively stamp the date
> in the filename, so that if I run the same code the next day the files
> will not be overwritten. Here's what I have to start with:
>
>   baseDir <- getwd()
>   outputDir <- paste(baseDir, "/OutputData-", Sys.Date(), sep="")
>
> and let's say I want to write the table test.table to a file:
>
>   write.table(test.table, "test.table.txt", sep="\t")
>
> How do I make this write to outputDir? Thanks.
> --
> View this message in context:
> http://www.nabble.com/write-file-to-date-stamped-folder-tp25219504p25219504.html
> Sent from the R help mailing list archive at Nabble.com.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-09  Time: 14:42:51
-- XFMail --
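[Editorial sketch, not from the thread: base R's dir.create() avoids shelling out to mkdir and is portable to Windows as well; tempdir() here just keeps the example self-contained, and test.table is toy data.]

```r
# Create a date-stamped output directory portably, then write into it.
outputDir <- file.path(tempdir(), paste("OutputData-", Sys.Date(), sep = ""))
dir.create(outputDir, showWarnings = FALSE, recursive = TRUE)
test.table <- data.frame(a = 1:3, b = 4:6)   # toy data for illustration
write.table(test.table, file.path(outputDir, "test.table.txt"), sep = "\t")
```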
Re: [R] Google's R Style Guide
On 29-Aug-09 17:51:54, diegol wrote:
> Max Kuhn wrote:
>> Perhaps this is obvious, but I've never understood why this is the
>> general convention: "An opening curly brace should never go on its
>> own line". I tend to do this:
>>
>>   f <- function() {
>>     if (TRUE) {
>>       cat("TRUE!!\n")
>>     } else {
>>       cat("FALSE!!\n")
>>     }
>>   }
>
> I favor your approach, BUT I add one more level of indentation. Your
> function would look like:
>
>   f <- function() {
>       if (TRUE) {
>         cat("TRUE!!\n")
>       } else {
>         cat("FALSE!!\n")
>       }
>   }
>
> This way I quickly identify the beginning of the function, which is
> the one line at the top of the expression AND sticking to the left
> margin. In your code you use this same indentation in the if/else
> construct. I find it also useful for the function itself.

When I want to rely on indentation and vertical alignments to keep
track of program structure, I would tend to write the above like

  f <- function()
  { if (TRUE)
    { cat("TRUE!!\n")
    } else
    { cat("FALSE!!\n")
    }
  }

so that an opening { is aligned with the keyword it is associated with,
and then at the end of the block so also is the closing }. However, in
this case (if I keep all the {...} for the sake of structure) I would
also tend to save on lines with

  f <- function()
  { if (TRUE) { cat("TRUE!!\n") } else { cat("FALSE!!\n") }
  }

which is still clear enough for me. This probably breaks most
guidelines! But in practice it depends on what it is, and on how
readily I find I can read it.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Aug-09  Time: 19:26:51
-- XFMail --
Re: [R] breaking multi-modal histograms down into combinations of
On 28-Aug-09 10:34:46, Thomas Groen wrote:
> Dear All,
> Does anybody know if there is a functionality in R to break histograms
> that show a clear bi-modal (or multi-modal) distribution into a series
> of unimodal histograms that, added up, result in the original
> histogram? I was thinking of using QQ-plots (for which tools are
> available in R), and then observing the number of times the observed
> quantiles cross the 1:1 line, but this only gives an indication of how
> many peaks the current histogram has. Therefore I was wondering
> whether other approaches exist. Thanks in advance for any suggestions.
> p.s. also thanks to those who helped me on my previous question on
> "Modelling different combinations of explanatory variables". The leaps
> package and the regsubsets command worked really well!

There are a number of points of information which would help us to be
more specific with suggestions.

1: Do you have the raw data from which the histogram was constructed?
Decomposition of a multimodal sample into constituent unimodal
components is best done by adopting a generic distribution type (e.g.
Normal) for each component, and then estimating the parameters of each
component from the data. There is more information (and therefore
better estimation) in the raw data than in the histogram.

2: Do you have a preferred generic distribution type (e.g. Normal) for
the component distributions? (If not, and you don't care what
distribution you adopt, then what is to stop you drawing arbitrary
dividing lines between the peaks, and asserting that what lies between
two consecutive divisions is one component of the mixture? Then you
would end up with a set of disjoint histograms, one for each component,
chosen in a somewhat arbitrary way. Since you presumably don't intend
that to happen, you presumably have reasons why it should not happen,
which would amount to a preference for a generic distribution type.)

Once the generic type is chosen, a specific method is indicated. 
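[Editorial sketch, not from the thread: for point 1, the standard tool is the EM algorithm. Below is a minimal base-R sketch for a two-component Normal mixture; em2norm is an invented name, and packages such as mclust do this properly and robustly.]

```r
# Fit a two-component Normal mixture by EM (illustrative only).
em2norm <- function(x, iter = 200) {
  mu <- as.numeric(quantile(x, c(0.25, 0.75)))  # initial component means
  s  <- rep(sd(x), 2)                           # initial component sds
  p1 <- 0.5                                     # initial mixing weight
  for (j in 1:iter) {
    d1 <- p1 * dnorm(x, mu[1], s[1])
    d2 <- (1 - p1) * dnorm(x, mu[2], s[2])
    g  <- d1 / (d1 + d2)            # E-step: Prob(component 1 | x)
    p1 <- mean(g)                   # M-step: update weight, means, sds
    mu <- c(sum(g * x) / sum(g), sum((1 - g) * x) / sum(1 - g))
    s  <- c(sqrt(sum(g * (x - mu[1])^2) / sum(g)),
            sqrt(sum((1 - g) * (x - mu[2])^2) / sum(1 - g)))
  }
  list(p = c(p1, 1 - p1), mu = mu, sd = s)
}
set.seed(1)
x <- c(rnorm(300, 0, 1), rnorm(300, 5, 1))   # clearly bimodal sample
fit <- em2norm(x)
sort(fit$mu)   # close to the true component means 0 and 5
```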
For example, do an R Site Search on "normal mixture" in "Functions" at:
  http://finzi.psych.upenn.edu/nmz.html
You may want to look at
  http://finzi.psych.upenn.edu/R/library/mclust/html/00Index.html
(Model-Based Clustering / Normal Mixture Modeling).
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-09  Time: 12:12:36
-- XFMail --
Re: [R] Google's R Style Guide
On 28-Aug-09 12:59:24, Esmail wrote:
> Perhaps most of you have already seen this?
>   http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
> Comments/Critiques?
> Thanks, Esmail
> ps: Reminds me of PEP 8 for Python
>   http://www.python.org/dev/peps/pep-0008/
> Maybe not that surprising, since Python is also one of the main
> languages used by Google.

I think it is grossly over-prescriptive. For example, "function names
have initial capital letters and no dots" is violated throughout R
itself.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-09  Time: 14:21:58
-- XFMail --
Re: [R] Statistical question about logistic regression simulation
On 26-Aug-09 14:17:40, Denis Aydin wrote:
> Hi R help list
> I'm simulating logistic regression data with a specified odds ratio
> (beta) and have a problem/unexpected behaviour that occurs. The
> datasets include a lognormal exposure and diseased and healthy
> subjects. Here is my loop:
>
>   ors <- vector()
>   for (i in 1:200) {
>     # First, I create a vector with a lognormally distributed exposure:
>     n <- 10000   # number of study subjects
>     mean <- 6
>     sd <- 1
>     expo <- rlnorm(n, mean, sd)
>     # Then I assign each study subject a probability of disease with a
>     # specified odds ratio (or beta coefficient) according to a
>     # logistic model:
>     inter <- 0.01     # intercept
>     or <- log(1.5)    # an odds ratio of 1.5, or a beta of ln(1.5)
>     p <- exp(inter + or * expo) / (1 + exp(inter + or * expo))
>     # Then I use the probability to decide who is having the disease
>     # and who is not:
>     disease <- rbinom(length(p), 1, p)   # 1 = disease, 0 = healthy
>     # Then I calculate the logistic regression and extract the odds
>     # ratio:
>     model <- glm(disease ~ expo, family = binomial)
>     ors[i] <- exp(summary(model)$coef[2])   # exponentiated beta = OR
>   }
>
> Now to my questions:
> 1. I was expecting the mean of the odds ratios over all simulations to
> be close to the specified one (1.5 in this case). This is not the case
> if the mean of the lognormal distribution is, say, 6. If I reduce the
> mean of the exposure distribution to, say, 3, the mean of the
> simulated ORs is very close to the specified one. So the simulation
> seems to be quite sensitive to the parameters of the exposure
> distribution.
> 2. Is it somehow possible to stabilize the simulation so that it is
> not that sensitive to the parameters of the lognormal exposure
> distribution? I can't make up the parameters of the exposure
> distribution; they are estimations from real data.
> 3. Are there general flaws or errors in my approach?
> Thanks a lot for any help on this!
> All the best, Denis
> --
> Denis Aydin

You need to look at the probabilities 'p' being generated by your code. 
Taking first your case mean <- 6 (and sorting 'expo', and using whatever
seed my system had current at the time):

  n <- 10000   # number of study subjects
  mean <- 6
  sd <- 1
  expo <- sort(rlnorm(n, mean, sd))
  p <- exp(inter + or * expo) / (1 + exp(inter + or * expo))
  p[1:20]
  #  [1] 0.9763438 0.9918962 0.9924002 0.9965314 0.9980887 0.9984698
  #  [7] 0.9993116 0.9993167 0.9994007 0.9994243 0.9996288 0.9997037
  # [13] 0.9998728 0.9998832 0.9999284 0.9999346 0.9999446 0.9999528
  # [19] 0.9999561 0.9999645

so that almost all of your 'p's are very close to 1.0, which means that
almost all (or even all) of your responses will be 1. Indeed, continuing
from the above:

  disease <- rbinom(length(p), 1, p)
  disease[1:20]
  #  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  sum(disease)
  # [1] NaN
  sum(is.nan(disease))
  # [1] 710

What has happened here is that the higher values of 'expo' are so large
(in the 1000s) that the calculation of 'p' gives NaN, because the value
of exp(inter + or * expo) is +Inf, so the calculation of 'p' is in terms
of (+Inf)/(+Inf), which is NaN.

Now compare with what happens when mean <- 3:

  mean <- 3
  sd <- 1
  expo <- sort(rlnorm(n, mean, sd))
  p <- exp(inter + or * expo) / (1 + exp(inter + or * expo))
  p[1:20]
  #  [1] 0.5514112 0.5543155 0.5702318 0.5830025 0.5885994 0.5889078
  #  [7] 0.5908860 0.6004657 0.6029123 0.6042805 0.6048688 0.6122290
  # [13] 0.6123407 0.6135233 0.6137499 0.6139299 0.6153900 0.6181017
  # [19] 0.6184093 0.6203757
  sum(is.na(p))
  # [1] 0
  max(expo)
  # [1] 728.0519

So now no NAs (max(expo), though large, is now not large enough to make
the calculation of 'p' yield NaN). These smaller probabilities are now
well away from 1.0, so a good mix of 0 and 1 responses can be expected,
although a good number of the 'p's will still be very close to 1, or
will be set equal to 1. 
  disease <- rbinom(length(p), 1, p)
  disease[1:20]
  #  [1] 0 1 0 0 1 0 1 1 1 0 1 0 0 1 0 1 1 0 0 1

(as expected), and

  sum(disease)
  # [1] 9740

As well as the problem with p = NaN --> disease = NaN: when you have all
the probabilities close to 1 and (as above) get all the 'disease'
outcomes = 1, the resulting attempt to fit the glm will yield nonsense.

In summary: do not use silly parameter values for the model you are
simulating. It will almost always not work (for reasons illustrated
above), and even if it appears to work the result will be highly
unreliable. If in doubt, have a look at what you are getting along the
way, as illustrated above!

The above reasons almost certainly underlie your finding that the mean
of simulated OR estimates is markedly different from the value which
you set when you run the case mean <- 6, and the much better finding
when you run the case mean <- 3.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 26-Aug-09
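[Editorial sketch, not from the exchange: on the purely numerical side of the stabilization question, R's plogis() evaluates the logistic function without literally forming exp(u)/(1+exp(u)), so it avoids the (+Inf)/(+Inf) = NaN problem. Note this only cures the NaN; Ted's main point stands, since probabilities numerically indistinguishable from 1 still give a degenerate response vector.]

```r
u <- c(1, 10, 1000)
exp(u) / (1 + exp(u))   # last entry: NaN, since exp(1000) overflows to Inf
plogis(u)               # last entry: 1, computed stably
```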
Re: [R] robust method to obtain a correlation coeff?
On 24-Aug-09 14:47:02, Christian Meesters wrote:
> Hi,
> Being an R-newbie, I am wondering how to calculate a correlation
> coefficient (preferably with an associated p-value) for data like:
>   d[,1]
>   [1] 25.5 25.3 25.1 NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...
>   d[,2]
>   [1] 0.0 11.1 0.0 NA 0.0 10.1 10.6 9.5 0.0 57.9 0.0 0.0 ...
> Apparently corr(d) from the boot library fails with NAs in the data,

Yes, apparently corr() has no option for dealing with NAs.

> also cor.test cannot cope with a different number of NAs.

On the other hand, cor.test() does have an option 'na.action' which, by
default, is the same as what is in getOption("na.action"). In my R
installation, this, by default, is "na.omit". This has the effect that,
for any pair in (x,y) where at least one of the pair is NA, that pair
will be omitted from the calculation. For example, basing two vectors
x,y on your data above, and a third z which is y with an extra NA:

  x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
  y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
  z <- y; z[8] <- NA

I get

  cor.test(x, y)
  #         Pearson's product-moment correlation
  # data:  x and y
  # t = -1.3986, df = 9, p-value = 0.1954
  # alternative hypothesis: true correlation is not equal to 0
  # 95 percent confidence interval:
  #  -0.8156678  0.2375438
  # sample estimates:
  #       cor
  # -0.422542

  cor.test(x, z)
  #         Pearson's product-moment correlation
  # data:  x and z
  # t = -1.3466, df = 8, p-value = 0.215
  # alternative hypothesis: true correlation is not equal to 0
  # 95 percent confidence interval:
  #  -0.8338184  0.2738824
  # sample estimates:
  #        cor
  # -0.4298726

So it has worked in both cases (see the difference in 'df'), despite the
different numbers of NAs in x and z.

For functions such as corr() which do not have provision for omitting
NAs, you can fix it up for yourself before calling the function. 
In the case of your two series d[,1], d[,2] you could use an index
variable to select cases:

  ix <- (!is.na(d[,1])) & (!is.na(d[,2]))
  corr(d[ix,])

With my variables x,y,z I get

  ix.1 <- (!is.na(x)) & (!is.na(y))
  ix.2 <- (!is.na(x)) & (!is.na(z))
  d.1 <- cbind(x, y)
  corr(d.1[ix.1,])
  # [1] -0.422542   ## (and -0.422542 from cor.test above as well)
  d.2 <- cbind(x, z)
  corr(d.2[ix.2,])
  # [1] -0.4298726  ## (and -0.4298726 from cor.test above as well)

Hoping this helps,
Ted.

> Is there a solution to this problem (calculating a correlation
> coefficient and ignoring different numbers of NAs), e.g. Pearson's
> corr coeff? If so, please point me to the relevant piece of
> documentation.
> TIA, Christian

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-09  Time: 16:26:53
-- XFMail --
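[Editorial sketch, not in the original reply: base R's cor() can also drop incomplete pairs directly via its 'use' argument, giving the same value as the fix-ups above.]

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
cor(x, y, use = "complete.obs")   # -0.422542, matching cor.test() above
```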
Re: [R] unix like commands in R?
On 24-Aug-09 21:56:06, zrl wrote:
> Dear List:
> I am trying to find a command in R which is like the unix command
> "less" or "more", to show the data in an object of R. Can anyone help
> me with this? Is there a collection of such unix-like commands in R?
> Thanks. -ZRL

There is a page() command in R -- see '?page'. From your query I take it
you are in fact using either Linux or Unix (or possibly a Mac BSD-type
OS). Have a look at options("pager") to see what it currently uses.

I have modified my R setup so that it uses 'less' (on Linux) as the
pager program. I did this in my .Rprofile (in the home directory) by
putting in the lines:

  .xthelp <- function() {
    tdir <- tempdir()
    pgr  <- paste(tdir, "/pgr", sep="")
    con  <- file(pgr, "w")
    cat("#! /bin/bash\n", file=con)
    cat("export HLPFIL=`mktemp ", tdir, "/R_hlp.XXXXXX`\n", sep="",
        file=con)
    cat("cat > $HLPFIL\nxterm -e less $HLPFIL &\n", file=con)
    close(con)
    system(paste("chmod 755 ", pgr, sep=""))
    options(pager=pgr)
  }
  .xthelp()
  rm(.xthelp)

This opens anything which might be paged in a separate xterm, so that,
for instance, all responses to '?' help queries come up in a new xterm,
being paged by 'less'. Likewise, if you page a large object (such as a
matrix M with 500 rows) using page(M), you will again see the result
paged by 'less' in a separate window.

This code was suggested by Roger Bivand in response to a query of mine
back in 2003. The URL is
  http://finzi.psych.upenn.edu/R/Rhelp02/archive/21642.html
This was an improvement on a previous solution of my own, which is also
quoted at the above URL.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-09  Time: 23:32:18
-- XFMail --
Re: [R] A matrix calculation
The problem with David's proposal is revealed by:

  mat[7:1,]
  #      [,1] [,2] [,3]
  # [1,]    7   14   21
  # [2,]    6   13   20
  # [3,]    5   12   19
  # [4,]    4   11   18
  # [5,]    3   10   17
  # [6,]    2    9   16
  # [7,]    1    8   15

which simply reverses the rows. Then:

  c(mat[7:1,])
  # [1]  7  6  5  4  3  2  1 14 13 12 11 10  9  8 21 20 19 18 17 16 15

since a matrix is stored by columns -- i.e. as a vector with its elements ordered down each column, first col 1, then col 2, then ... And the following won't work either:

  mat[,1:3]
  #      [,1] [,2] [,3]
  # [1,]    1    8   15
  # [2,]    2    9   16
  # [3,]    3   10   17
  # [4,]    4   11   18
  # [5,]    5   12   19
  # [6,]    6   13   20
  # [7,]    7   14   21

which is simply the original matrix, and hence:

  c(mat[,1:3])
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

for the same reason as before. What you need to do is based on the following:

  t(mat[7:1,])
  #      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
  # [1,]    7    6    5    4    3    2    1
  # [2,]   14   13   12   11   10    9    8
  # [3,]   21   20   19   18   17   16   15

which now has the elements (down the columns) in the order you want. So:

  c(t(mat[7:1,]))
  # [1]  7 14 21  6 13 20  5 12 19  4 11 18  3 10 17  2  9 16  1  8 15

As desired. Ted.

On 23-Aug-09 18:53:38, Bogaso wrote: No no, I actually want the following result: 7, 14, 21, 6, 13, 20, 5, 12, 19, ...

David Winsemius wrote: On Aug 23, 2009, at 2:37 PM, Bogaso wrote: Suppose I have a matrix like this:

  mat <- matrix(1:21, 7)
  mat
  #      [,1] [,2] [,3]
  # [1,]    1    8   15
  # [2,]    2    9   16
  # [3,]    3   10   17
  # [4,]    4   11   18
  # [5,]    5   12   19
  # [6,]    6   13   20
  # [7,]    7   14   21

From this matrix, I want to create a vector like this: c(mat[7,], mat[6,], mat[5,], ..., mat[1,]) Can anyone please guide me how to do that?

  c( mat[7:1,] )
  # [1]  7  6  5  4  3  2  1 14 13 12 11 10  9  8 21 20 19 18 17 16 15

-- David Winsemius, MD Heritage Laboratories West Hartford, CT
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Aug-09 Time: 20:17:58 -- XFMail --
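Ted's c(t(mat[7:1,])) recipe generalises to any number of rows by computing the row index from nrow(); a small sketch:

```r
mat <- matrix(1:21, 7)
# Reverse the rows, transpose, then flatten column-wise: for any n x k
# matrix this yields c(mat[n,], mat[n-1,], ..., mat[1,])
v <- as.vector(t(mat[nrow(mat):1, ]))
v
# [1]  7 14 21  6 13 20  5 12 19  4 11 18  3 10 17  2  9 16  1  8 15
```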
Re: [R] Help on comparing two matrices
Steve, I don't know for sure whether this will help to solve your problem, but you may be interested to read about the algorithm devised by David Kendall for sorting 0-1 matrices, as described in

  Incidence matrices, interval graphs and seriation in archeology.
  Pacific J. Math. Volume 28, Number 3 (1969), 565-570

which is available on Open Access at Project Euclid:
http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.pjm/1102983306
[also = http://tinyurl.com/mw2oap ]

On 22-Aug-09 19:38:47, Steve Lianoglou wrote: On Sat, Aug 22, 2009 at 2:45 PM, Michael Kogan michael.ko...@gmx.net wrote:

1. Sort the rows by their row sums (greater sums first).
2. Sort the columns by the first row (columns with ones in the first row go left, columns with zeros go right).
3. Save the left part (all columns with ones in the first row) and the right part in separate matrices.
4. Repeat steps 2 and 3 with both of the created matrices (now taking the second row for sorting); repeat until all fragments consist of a single column.
5. Compose the columns to a sorted matrix.

If you want to go about this by implementing the algo you described, I think you'd be best suited via some divide-and-conquer/recursion route: starting from step 2, that is. -steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-Aug-09 Time: 22:24:49 -- XFMail --
Re: [R] Special LS estimation problem
On 21-Aug-09 16:28:26, megh wrote: Hi, I have the following kind of model:

  Y = X1 * a1 + X2 * a2 + error

Here sampled data for Y, X1, X2 are like this:

  Y  <- replicate(10, matrix(rnorm(2),2), simplify = F)
  X1 <- replicate(10, matrix(rnorm(4),2), simplify = F)
  X2 <- replicate(10, matrix(rnorm(4),2), simplify = F)

My goal is to calculate LS estimates of the vectors a1 and a2. Can anyone please guide me how to do that in R? Is there any special function to handle this kind of problem? Thanks

In your example, a1 and a2 cannot be vectors, since Y, X1 and X2 are all 2x2 matrices. On the other hand, either or both of a1, a2 could be either a scalar or a 2x2 matrix, any of which would make X1*a1 + X2*a2 (provided you interpret * as %*% in R) into a 2x2 matrix. But if (say) a1 is a vector it can only be a 2x1 vector (for multiplication conformity), and then X1*a1 (or X1%*%a1 in R) would be a 2x1 vector (in the linear algebra sense, i.e. a 2x1 matrix in the R sense). So what is it that you want in the model? Next, if in fact a1 and a2 are scalars, then in effect you are looking at

  Y[1,1] = a1*X1[1,1] + a2*X2[1,1] + error[1,1]
  Y[1,2] = a1*X1[1,2] + a2*X2[1,2] + error[1,2]
  Y[2,1] = a1*X1[2,1] + a2*X2[2,1] + error[2,1]
  Y[2,2] = a1*X1[2,2] + a2*X2[2,2] + error[2,2]

10 times over. The relevant question here is: are you wanting to include possible covariance between the 4 elements of 'error'? One possibility in this case is to treat this as a multilevel or longitudinal model, with 4 observations per subject, and correlated error within-subject. But you really need to spell out more detail of the kind of model you are seeking to fit. What you described is not enough! Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-Aug-09 Time: 18:48:31 -- XFMail --
Re: [R] Special LS estimation problem
On 21-Aug-09 18:12:45, megh wrote: Y is a 2-dimensional vector; X1 and X2 are 2x2 matrices.

Sorry -- I mis-read your code! I will think about your problem. Apologies. Ted.

Ted.Harding-2 wrote: On 21-Aug-09 16:28:26, megh wrote: Hi, I have the following kind of model:

  Y = X1 * a1 + X2 * a2 + error

Here sampled data for Y, X1, X2 are like this:

  Y  <- replicate(10, matrix(rnorm(2),2), simplify = F)
  X1 <- replicate(10, matrix(rnorm(4),2), simplify = F)
  X2 <- replicate(10, matrix(rnorm(4),2), simplify = F)

My goal is to calculate LS estimates of the vectors a1 and a2. Can anyone please guide me how to do that in R? Is there any special function to handle this kind of problem? Thanks

In your example, a1 and a2 cannot be vectors, since Y, X1 and X2 are all 2x2 matrices. On the other hand, either or both of a1, a2 could be either a scalar or a 2x2 matrix, any of which would make X1*a1 + X2*a2 (provided you interpret * as %*% in R) into a 2x2 matrix. But if (say) a1 is a vector it can only be a 2x1 vector (for multiplication conformity), and then X1*a1 (or X1%*%a1 in R) would be a 2x1 vector (in the linear algebra sense, i.e. a 2x1 matrix in the R sense). So what is it that you want in the model? Next, if in fact a1 and a2 are scalars, then in effect you are looking at

  Y[1,1] = a1*X1[1,1] + a2*X2[1,1] + error[1,1]
  Y[1,2] = a1*X1[1,2] + a2*X2[1,2] + error[1,2]
  Y[2,1] = a1*X1[2,1] + a2*X2[2,1] + error[2,1]
  Y[2,2] = a1*X1[2,2] + a2*X2[2,2] + error[2,2]

10 times over. The relevant question here is: are you wanting to include possible covariance between the 4 elements of 'error'? One possibility in this case is to treat this as a multilevel or longitudinal model, with 4 observations per subject, and correlated error within-subject. But you really need to spell out more detail of the kind of model you are seeking to fit. What you described is not enough! Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-Aug-09 Time: 18:59:54 -- XFMail --
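For the record: once the model is read as Y_i = X1_i a1 + X2_i a2 + e_i with each Y_i a 2-vector and a1, a2 2-vectors, it is linear in the stacked coefficient vector c(a1, a2), so ordinary least squares on the stacked system applies. A sketch under those assumptions (the variable names, "true" values and noise level are mine, for illustration only):

```r
set.seed(1)
n  <- 10
a1 <- c(1, -2); a2 <- c(0.5, 3)            # "true" values, for illustration
X1 <- replicate(n, matrix(rnorm(4), 2), simplify = FALSE)
X2 <- replicate(n, matrix(rnorm(4), 2), simplify = FALSE)
Y  <- lapply(1:n, function(i)
        X1[[i]] %*% a1 + X2[[i]] %*% a2 + rnorm(2, sd = 0.01))

# Each observation contributes two rows [X1_i | X2_i] and two responses
Xbig <- do.call(rbind, lapply(1:n, function(i) cbind(X1[[i]], X2[[i]])))
ybig <- unlist(Y)

ahat <- qr.solve(Xbig, ybig)   # least-squares estimate of c(a1, a2)
```

qr.solve() returns the least-squares solution of the overdetermined 20 x 4 system, which is exactly the LS estimate asked for.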
Re: [R] PowerCut Killed R - is my code retrievable?
On 19-Aug-09 22:35:01, Barry Rowlingson wrote: On Wed, Aug 19, 2009 at 10:57 PM, Polwart Calum (County Durham and Darlington NHS Foundation Trust) calum.polw...@nhs.net wrote: I've been tweaking code for several days on and off in R, cutting and pasting in from a text editor (I just leave them open all the time). I think I got something that was usable, but then a power surge tripped the fuses and unfortunately the machine I was working on doesn't have a UPS.

So you were just using the text editor as a scratch pad, and not saving it? A half-decent text editor should be saving things to disk as you go. For example, in emacs, if your emacs dies while editing a file then when you restart it, it will notice and offer to restore it from its restore file. If you were using emacs you might have files like #foo.R# which are emacs' auto-restore files. Other editors might do other things - possibly leaving files in /tmp on a Linux system (but they might get zapped by a reboot).

To follow up on what Barry wrote above: don't be put off by the recommendation to use EMACS (which is not to everyone's taste). I use vim, which does much the same sort of thing: there is a hidden .whatever.swp file which contains a back-trackable history of the editing changes that have been made to the file. So, if you get a crash, on re-opening the file in vim you are asked if you want to recover. The .swp file is *not* zapped by a reboot! But, in any case, it is very wise to frequently and deliberately save the current state of your file. In vim's command mode, all you need to do is to type :w and it is saved. And to get into command mode (or make sure that you are in it), just press ESC. So ESC :w does it. Takes 0.5 seconds. Again, what I routinely do (in Linux) when developing R code is to have two terminal windows open. In one I am running R. In the other, beside it, I am editing a file of R code.
To run code in R that has been entered in the code window, I just highlight it with the mouse in the code window, and then paste it into the R window. This method also has the advantage that it is easy to scroll back to code entered earlier, and either paste it into the R window, or copy it down in the code window if you want to modify it. (The code window does not have to be a .R file to be sourced by R -- though it may later become one. Its function is to preserve a trace of what you have done, and make it easy to modify things until you get what you want.) Ted.

Also - is there a better way for the future? I know some people use IDEs but is that for serious programming or for building a small function and tweaking it?

Tip #1 is save your text file scratchpads! Tip #2 is save your R session regularly (just do 'save.image()' and it will save your current R session objects in a .RData file). Tip #3 is you could use emacs + ESS as an IDE - you run R within emacs, giving you cut'n'paste of code, syntax highlighting, session transcripts, and emacs' protection from data loss on a crash! Barry

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 20-Aug-09 Time: 00:21:42 -- XFMail --
Re: [R] Adding logical vectors
On 13-Aug-09 19:21:36, David Huffer wrote: When adding several logical vectors I expect each vector will be coerced to integer and these vectors will then be added. That doesn't always seem to be the case. For example:

  ( f1 <- as.factor ( sample ( "x" , 25 , rep = T ) ) )
  #  [1] x x x x x x x x x x x x x x x x x x x x x x x x x
  # Levels: x
  ( f2 <- as.factor ( sample ( "y" , 25 , rep = T ) ) )
  #  [1] y y y y y y y y y y y y y y y y y y y y y y y y y
  # Levels: y
  ( f3 <- as.factor ( sample ( "z" , 25 , rep = T ) ) )
  #  [1] z z z z z z z z z z z z z z z z z z z z z z z z z
  # Levels: z

  is.na ( f1 [ sample ( 1:25 , 4 ) ] ) <-
    is.na ( f2 [ sample ( 1:25 , 4 ) ] ) <-
      is.na ( f3 [ sample ( 1:25 , 4 ) ] ) <- TRUE

  ## this returns a numeric vector:
  is.na ( f1 ) + is.na ( f2 ) + is.na ( f3 )
  # [1] 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 1 2 2 0 1 0 1

  ## but this returns a logical vector
  !is.na ( f1 ) + !is.na ( f2 ) + !is.na ( f3 )
  #  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
  #  [9]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
  # [17] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
  # [25] FALSE

Can someone please explain why the returned value is a logical vector when I use the "not" operator but a numeric vector when I don't? What is special about !is.na? It returns an object of class logical just like the is.na function:

  all.equal ( class ( !is.na ( f1 ) ) , class ( is.na ( f1 ) ) )
  # [1] TRUE

Thanks!

I think you need to compare:

  [1] is.na ( f1 ) + !is.na ( f2 )
  #  [1] 2 0 1 1 2 1 0 1 1
  # [10] 1 0 2 1 1 0 1 1 2
  # [19] 1 1 1 1 1 1 1

with

  [2] !is.na ( f1 ) + !is.na ( f2 )
  #  [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
  # [10] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
  # [19] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

and with

  [3] (!is.na ( f1 )) + (!is.na ( f2 ))
  #  [1] 1 1 2 2 1 2 1 2 2 2 1 1 2 2 1 2 2 1 2 2 2 2 2 2 2

In other words, I think you have been trapped by Precedence: see '?Syntax'.
What seems to happen is that is.na(f1) + !is.na(f2) evaluates is.na(f1) as a logical vector, !is.na(f2) as a logical vector, and adds them, getting a numerical result. See [1]. On the other hand, apparently !is.na(f1) + !is.na(f2) negates the result of [1] (compare the outputs of [1] and [2]) and hence produces a logical vector (because of the first !). In other words, the first ! is applied to the result of is.na(f1) + !is.na(f2). In the form given in [3], the parentheses ensure that the logical negations ! are applied before the + is applied, so two logical vectors are added, with a numerical result. (I hope I've got this right!) Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Aug-09 Time: 20:50:51 -- XFMail --
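The precedence point is easy to verify on a tiny example: unary ! binds more loosely than +, so a leading ! is applied to the whole sum unless parentheses intervene.

```r
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)

!a + b        # parsed as !(a + b): a logical vector
# [1] FALSE FALSE

(!a) + (!b)   # two logical vectors added: an integer vector
# [1] 0 1
```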
Re: [R] lm coefficients output confusing
and the base level treatment. Alternative systems of contrasts give different representations.

If I can get this to work correctly, can I use the p value to determine which of the hours is significantly different to the others - so in this example hour 5 is significantly different? Or is it just a case of using the p value from the anova to determine that there is a significant difference between hours (in this case) and use a plot to determine which hour(s) are likely to be the cause?

Don't trust the individual P-values in the first instance, since with several levels some are bound to have more extreme P-values than others. However, your anova() test is an overall test of whether one or more of the means jointly depart significantly from the Null Hypothesis that they are all equal. Here, your anova() has given a significant result, so the hour means are not all equal. *Now* you can look at the individual P-values to see where this may be coming from. In the list of coefficients, there is only one candidate for being different from the Intercept, and this is Hour 5. Indeed the P-value for HR5 and the P-value for the anova() are nearly equal (the latter being slightly larger, allowing for the fact that it is 1 out of 7). Hoping this helps! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Aug-09 Time: 22:53:22 -- XFMail --
Re: [R] Random sampling while keeping distribution of nearest ne
On 12-Aug-09 22:05:24, Emmanuel Levy wrote: Dear All, I cannot find a solution to the following problem although I imagine that it is a classic, hence my email. I have a vector V of X values comprised between 1 and N. I would like to get random samples of X values also comprised between 1 and N, but the important point is: *I would like to keep the same distribution of distances between the X values*. For example let's say N=10 and I have V = c(3,4,5,6); then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6 or 4,5,6,7 etc., so that the distribution of distances (3-4, 3-5, 3-6, 4-5, 4-6 etc...) is kept constant. I couldn't find a package that helps me with this, but it looks like it should be a classic problem so there should be something! Many thanks in advance for any help or hint you could provide, All the best, Emmanuel

If I've understood you right, you are basically putting a sequence with given spacings in a random position amongst the available positions. In your example, you would randomly choose between 1,2,3,4 / 2,3,4,5 / 3,4,5,6 / 4,5,6,7 / 5,6,7,8 / 6,7,8,9 / 7,8,9,10. Hence a result Y could be:

  A <- min(V)
  L <- max(V) - A + 1
  M <- (0:(N-L))
  Y <- 1 + (V-A) + sample(M,1)

I think this does it!

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 12-Aug-09 Time: 23:49:22 -- XFMail --
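Checking the recipe with Emmanuel's own example (V = c(3,4,5,6), N = 10): every draw is a shifted copy of V, so all pairwise distances are preserved and the result stays inside 1..N.

```r
V <- c(3, 4, 5, 6); N <- 10
A <- min(V)
L <- max(V) - A + 1
M <- 0:(N - L)

set.seed(42)                      # seed chosen arbitrarily
Y <- 1 + (V - A) + sample(M, 1)   # one random placement of the pattern

all(dist(Y) == dist(V))           # pairwise distances unchanged
# [1] TRUE
min(Y) >= 1 && max(Y) <= N        # stays inside the allowed range
# [1] TRUE
```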
Re: [R] matrix power
On 10-Aug-09 21:31:30, cindy Guo wrote: Hi, All, If I have a symmetric matrix, how can I get the negative square root of the matrix, i.e. X^(-1/2)? Thanks, Cindy

  X <- matrix(c(2,1,1,2),nrow=2)
  X
  #      [,1] [,2]
  # [1,]    2    1
  # [2,]    1    2
  E <- eigen(X)
  V <- E$values
  Q <- E$vectors
  Y <- Q %*% diag(1/sqrt(V)) %*% t(Q)
  Y
  #            [,1]       [,2]
  # [1,]  0.7886751 -0.2113249
  # [2,] -0.2113249  0.7886751
  solve(Y%*%Y)   ## i.e. find its inverse
  #      [,1] [,2]
  # [1,]    2    1
  # [2,]    1    2

Hence (Y%*%Y)^(-1) = X, or Y = X^(-1/2). Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Aug-09 Time: 22:53:25 -- XFMail --
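The same eigendecomposition recipe, wrapped as a small helper (the function name is mine, not a base R function), together with a numerical check that Y %*% Y is indeed the inverse of X:

```r
# Inverse square root of a symmetric positive-definite matrix
# (helper name is illustrative)
mat_inv_sqrt <- function(X) {
  E <- eigen(X, symmetric = TRUE)
  E$vectors %*% diag(1 / sqrt(E$values), nrow(X)) %*% t(E$vectors)
}

X <- matrix(c(2, 1, 1, 2), nrow = 2)
Y <- mat_inv_sqrt(X)
max(abs(Y %*% Y - solve(X)))   # numerically zero
```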
Re: [R] matrix power
On 10-Aug-09 22:36:03, cindy Guo wrote: Hi, Ted, Thanks for the sample code. It is exactly what I want. But can I ask another question? The matrix for which I want the negative square root is a covariance matrix. I suppose it should be positive definite, so I can do 1/sqrt(V) as you wrote. But the covariance matrix I got in R using the function cov has a lot of negative eigenvalues, like -5.338634e-17, so 1/sqrt(V) generates NAs. Can you tell what's the problem here? Thanks, Cindy

Cindy, if that -5.338634e-17 is typical of the "lot of negative eigenvalues", then what you are seeing is the result of R's attempt to calculate zero eigenvalues, defeated by the inevitable rounding errors. In other words, your covariance matrix is singular, and the variables involved are not linearly independent. The only thing that is guaranteed about a covariance matrix is that it is positive semi-definite (not positive definite); in other words all eigenvalues are positive or zero (mathematically). For example, if Y = X and var(X) = var(Y) = 1, then the covariance matrix of (X,Y) is

  1 1
  1 1

which is singular (eigenvalues = 2, 0). The result of attempting to compute them is subject to rounding errors, which (for zero eigenvalues) can be slightly negative. So the covariance matrix in your case would not have an inverse, still less a negative square root! The basic problem is that you have linear dependence between the variables. To make progress, you would need to find a maximal linearly independent set (or possibly find the principal components with nonzero weights). Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Aug-09 Time: 23:58:00 -- XFMail --
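One common workaround (my suggestion, not from the thread): when the covariance matrix is only positive semi-definite, drop the numerically zero or negative eigenvalues and build a pseudo inverse square root from the positive part only. A sketch (the function name and tolerance are illustrative choices):

```r
# Pseudo inverse square root of a symmetric PSD matrix
pseudo_inv_sqrt <- function(S, tol = 1e-12) {
  E   <- eigen(S, symmetric = TRUE)
  pos <- E$values > tol * max(E$values)    # keep genuinely positive eigenvalues
  Q   <- E$vectors[, pos, drop = FALSE]
  Q %*% diag(1 / sqrt(E$values[pos]), sum(pos)) %*% t(Q)
}

# The singular example from the post: cov of (X, Y) with Y = X, var = 1
S <- matrix(1, 2, 2)                       # eigenvalues 2 and 0
W <- pseudo_inv_sqrt(S)
# W %*% S %*% W is then the projection onto the nonzero eigenspace,
# not the identity: the zero directions are irrecoverable
```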
Re: [R] How '.' is used?
On 09-Aug-09 16:06:52, Peng Yu wrote: Hi, I know '.' is not a separator in R as in C++. I am wondering where the detailed usage of '.' in R is discussed. Can somebody point me to a webpage, a manual or a book that discusses this? Regards, Peng

To the best of my knowledge, apart from its specific use as a separator between the integer and fractional parts of a number, '.' has no specific use in R, and you can, for instance, use it just as you would use an alphanumeric character in a name. For instance, you could do

  . <- 1.2345
  .
  # [1] 1.2345
  . <- function(x) x^2
  .(12)
  # [1] 144

So, unless there is something I don't know about, there is hardly anything to discuss about the detailed usage of '.' in R! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-Aug-09 Time: 17:32:54 -- XFMail --
Re: [R] How '.' is used?
On 09-Aug-09 16:53:32, Douglas Bates wrote: On Sun, Aug 9, 2009 at 11:32 AM, Ted Harding ted.hard...@manchester.ac.uk wrote: On 09-Aug-09 16:06:52, Peng Yu wrote: Hi, I know '.' is not a separator in R as in C++. I am wondering where the detailed usage of '.' in R is discussed. Can somebody point me to a webpage, a manual or a book that discusses this? Regards, Peng

To the best of my knowledge, apart from its specific use as a separator between the integer and fractional parts of a number, '.' has no specific use in R, and you can, for instance, use it just as you would use an alphanumeric character in a name. For instance, you could do

  . <- 1.2345
  .
  # [1] 1.2345
  . <- function(x) x^2
  .(12)
  # [1] 144

So, unless there is something I don't know about, there is hardly anything to discuss about the detailed usage of '.' in R!

The '.' character is one of the characters allowed in names, hence it can be used as you have suggested. There are (at least) two special usages of '.' in names. Following the time-honoured Unix convention, names that begin with '.' are considered hidden names and are not listed by ls() or objects() unless you set all.names = TRUE in the call. Because of this convention it is inadvisable to use names starting with '.' except when you wish to avoid potential name conflicts. The second special use of '.' in a name is in the construction of the names of S3 method functions. The method for generic function foo applied to class bar is named foo.bar.

As in summary.glm, I suppose! However, this prompts a question. In the first place, the construction of summary.glm from summary and glm is, in the first instance, simply using '.' in its basic role as a permissible character in a name. Correct? Next -- and this is the real question -- how does R parse the name summary.glm?
In my naivety, I simply suppose that it looks for an available function whose name is summary.glm in just the same way as it looks for stopifnot, or for that matter data.matrix -- which is not (as far as I know) a compound of a generic function data applied to a class matrix. Then '.' would not have a special (parseable) role in the name -- it is simply another letter. But when you do have such a function, like summary.glm, does R in fact parse it as summary then glm (i.e. look out for the generic function summary and then specialise it to handle glm)? As I say, I suppose not. And, if not, then the special use of the character '.' is simply a programmer's convention for the construction of the name, and once the name exists the '.' does not have a special (parseable) significance for R. Just seeking clarification ... ! Thanks, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-Aug-09 Time: 19:58:32 -- XFMail --
Re: [R] How '.' is used?
On 09-Aug-09 19:31:47, Duncan Murdoch wrote: (Ted Harding) wrote: [...] Next -- and this is the real question -- how does R parse the name summary.glm? In my naivety, I simply suppose that it looks for an available function whose name is summary.glm in just the same way as it looks for stopifnot, or for that matter data.matrix, which is not (as far as I know) a compound of a generic function data applied to a class matrix. Then '.' would not have a special (parseable) role in the name -- it is simply another letter.

It doesn't do anything special when parsing. The special sauce comes when the generic summary() function executes UseMethod("summary"). At that point, we know the class of the object, and we know the name of the generic, so it goes looking for a summary.glm method. There are some subtleties to how it does that lookup (see the discussion in Writing R Extensions about namespaces), but if you had a generic function calling UseMethod("data") and it was passed an object of class matrix, data.matrix() would be called, even though that doesn't make sense. This is a flaw in the S3 system, and one of the motivations for the development of the S4 system.

Many thanks, Duncan. I think that makes it clear! You've prompted me to read '?UseMethod':

  When a function calling 'UseMethod("fun")' is applied to an object
  with class attribute 'c("first", "second")', the system searches for
  a function called 'fun.first' and, if it finds it, applies it to the
  object. If no such function is found a function called 'fun.second'
  is tried. If no class name produces a suitable function, the function
  'fun.default' is used, if it exists, or an error results.

which is pretty explicit that the role of '.' is simply to construct a name for the system to look for. One learns ... Thanks. Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-Aug-09 Time: 21:22:23 -- XFMail --
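The lookup that UseMethod performs can be seen with a toy generic (all names below are made up for illustration):

```r
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "default method"
describe.glm2    <- function(x) "method for class 'glm2'"  # 'glm2' is a toy class

obj <- structure(list(), class = "glm2")
describe(obj)   # dispatches on the class attribute to describe.glm2
# [1] "method for class 'glm2'"
describe(1)     # no numeric method, falls through to describe.default
# [1] "default method"
```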
Re: [R] Seeing negative numbers to zero
On 07-Aug-09 20:29:16, DebbieMB wrote: Hi, I am also new to R and I have a related question. I am trying to set negative values in a single column of a dataframe to zero and I can't seem to do it. I have tried:

  KN1 <- subset(KN, select=c(5))
  # Here I am selecting the column of the dataframe KN and assigning it
  # the name KN1 - this step works
  KN2 <- ifelse(KN1<=0, 0, KN1)
  # Here I am trying to set negative numbers to zero and leave all other
  # numbers the same - this doesn't work

Any help would be appreciated. Thanks, Debbie

How about something on the lines of:

  KN1 <- data.frame(c1=c(1,-2,3,-4,5), c2=c(-1,2,-3,4,-5), c3=c(-2,-1,0,1,2))
  KN1
  #   c1 c2 c3
  # 1  1 -1 -2
  # 2 -2  2 -1
  # 3  3 -3  0
  # 4 -4  4  1
  # 5  5 -5  2
  KN2 <- KN1
  KN2$c2 <- (KN2$c2>0)*KN2$c2   ## ** See below
  KN2
  #   c1 c2 c3
  # 1  1  0 -2
  # 2 -2  2 -1
  # 3  3  0  0
  # 4 -4  4  1
  # 5  5  0  2

The logic of the ** line is that:
1: (KN2$c2>0) is FALSE wherever (KN2$c2 <= 0), otherwise TRUE.
2: If x is a number, then TRUE*x = x and FALSE*x = 0.
3: Hence (KN2$c2>0)*KN2$c2 is 0 wherever (KN2$c2 <= 0), and is the same as KN2$c2 wherever (KN2$c2 > 0).

Does this help? Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Aug-09 Time: 22:59:42 -- XFMail --
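Two equivalent base R idioms for the same clipping, for the record:

```r
KN1 <- data.frame(c1 = c(1,-2,3,-4,5),
                  c2 = c(-1,2,-3,4,-5),
                  c3 = c(-2,-1,0,1,2))

KN2 <- KN1
KN2$c2 <- pmax(KN2$c2, 0)      # clip one column at zero
KN2$c2
# [1] 0 2 0 4 0

KN3 <- KN1
KN3$c2[KN3$c2 < 0] <- 0        # same result via logical indexing
```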
Re: [R] Strange column shifting with read.table
On 02-Aug-09 21:10:12, Noah Silverman wrote: Hi, I am reading in a dataframe from a CSV file. It has 70 columns. I do not have any kind of unique row id.

  rawdata <- read.table("r_work/train_data.csv", header=T, sep=",", na.strings="0")

When training an svm, I keep getting an error. So, as an experiment, I wrote the data back out to a new file so that I could see what the svm function sees:

  write.table(rawdata, file="r_work/output_data.csv", quote=FALSE, sep=",")

It appears as if R has added a column for me with id numbers for each row. That would be fine, except that R SHIFTS ALL MY COLUMN LABELS OVER ONE. That causes several problems: 1) The header names are now wrong for each column; 2) My last column has no label; 3) The SVM complains about the unlabeled column. Would someone please help me sort this out. Thanks! -N

Note that the default for row.names in write.table() is TRUE. So, in your command, that is what you get: write.table() then *creates* row names (by default the row numbers). Compare:

  D <- rbind(c(1.1,1.2,1.3), c(2.1,2.2,2.3), c(3.1,3.2,3.3))
  D
  #      [,1] [,2] [,3]
  # [1,]  1.1  1.2  1.3
  # [2,]  2.1  2.2  2.3
  # [3,]  3.1  3.2  3.3
  write.table(D, file="withTRUE.csv", quote=FALSE, sep=",")
  # withTRUE.csv:
  # V1,V2,V3
  # 1,1.1,1.2,1.3
  # 2,2.1,2.2,2.3
  # 3,3.1,3.2,3.3
  write.table(D, file="withFALSE.csv", row.names=FALSE, quote=FALSE, sep=",")
  # withFALSE.csv:
  # V1,V2,V3
  # 1.1,1.2,1.3
  # 2.1,2.2,2.3
  # 3.1,3.2,3.3

Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Aug-09 Time: 22:35:04 -- XFMail --
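The round trip illustrates the point: read.table() notices that the header row has one fewer field than the data rows and uses the first column as row names, so the labels line up again.

```r
D  <- rbind(c(1.1, 1.2, 1.3), c(2.1, 2.2, 2.3), c(3.1, 3.2, 3.3))
tf <- tempfile(fileext = ".csv")
write.table(D, file = tf, quote = FALSE, sep = ",")  # row.names = TRUE (default)

# Header row has 3 fields, data rows have 4, so read.table takes
# the first column as row names automatically
D2 <- read.table(tf, header = TRUE, sep = ",")
dim(D2)
# [1] 3 3
rownames(D2)
# [1] "1" "2" "3"
```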
Re: [R] Tests for Two Independent Samples
On 31-Jul-09 13:38:10, tedzzx wrote: Dear R users, I have got two samples. Sample A has 223 observations in five categories 1,2,3,4,5 (I use the numbers 1,2,3,4,5 to label the five different categories): 5 observations in category 1; 81 in category 2; 110 in category 3; 27 in category 4; 0 in category 5. To represent the sample in R:

a <- rep(1:5, c(5,81,110,27,0))

Sample B has 504 observations in the same five categories: 6 observations in category 1; 127 in category 2; 297 in category 3; 72 in category 4; 2 in category 5. To represent the sample in R:

b <- rep(1:5, c(6,127,297,72,2))

I want to test whether these two samples differ significantly in distribution (tests for two independent samples). I found a website at http://faculty.chass.ncsu.edu/garson/PA765/mann.htm which shows four nonparametric tests, but I can only find the Kolmogorov-Smirnov Z test:

res <- ks.test(a,b)

Can anyone tell me which package has the other 3 tests? Or is there any other test for my question? Thanks in advance, Ted

If your 1,2,3,4,5 are simply nominal codes for the categories, then you may be satisfied with a Fisher test or simply a chi-squared test (using simulated P-values in view of the low frequencies in some cells). Taking your data:

A <- c(5,81,110,27,0)
B <- c(6,127,297,72,2)
M <- cbind(A,B)
D <- colSums(M)
P <- M %*% diag(1/D)
P
#            [,1]        [,2]
# [1,] 0.02242152 0.011904762
# [2,] 0.36322870 0.251984127  ## So the main differences between
# [3,] 0.49327354 0.589285714  ## A and B are in these two categories
# [4,] 0.12107623 0.142857143
# [5,] 0.00000000 0.003968254

fisher.test(M, simulate.p.value=TRUE, B=100000)
# Fisher's Exact Test for Count Data with simulated p-value
# (based on 1e+05 replicates)
# data: M
# p-value = 0.01594

chisq.test(M, simulate.p.value=TRUE, B=100000)
# Pearson's Chi-squared test with simulated p-value
# (based on 1e+05 replicates)
# data: M
# X-squared = 11.7862, df = NA, p-value = 0.01501

So the P-values are similar in both tests. (Another) Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Jul-09 Time: 17:53:58 -- XFMail --
Re: [R] string
On 31-Jul-09 22:10:46, Mohsen Jafarikia wrote: Hello All: I am wondering how I can have dat1 and dat2 in the following loop, where 'dat' and 'i' stick together to make dat1 and dat2:

ifn <- "MyData"
dat <- read.table(ifn)

MyData:
01 0.40
02 0.40
03 0.40
04 0.35
05 0.34
06 0.33

names(dat) <- c("Code","M")
for(i in 1:2) {
  dati <- dat[i:i+2,]
}

Try using paste():

for(i in (1:2)){
  print(paste("dat", i, sep=""))
}

See '?paste'. Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Jul-09 Time: 23:31:17 -- XFMail --
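P.S. If the aim really is to create variables named dat1 and dat2, assign() can be combined with paste(). A sketch, assuming the 'dat' data frame above (note also that the original subscript needs parentheses, since ':' binds tighter than '+'):

```r
for (i in 1:2) {
  # i:(i+2), NOT i:i+2 -- the latter evaluates as (i:i)+2
  assign(paste("dat", i, sep=""), dat[i:(i+2), ])
}
# dat1 and dat2 now exist; get(paste("dat", 1, sep="")) retrieves one.

# Usually a list is cleaner than generated variable names:
dats <- lapply(1:2, function(i) dat[i:(i+2), ])
dats[[1]]  # the first chunk
```

Keeping the pieces in a list avoids the get()/assign() gymnastics later on.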
Re: [R] How to do poisson distribution test like this?
On 29-Jul-09 03:28:24, glen_b wrote: Corrected links (the originals somehow acquired an extra space, sorry) Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189 Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1

I think I've cracked it (using the information in Table 1). What they seem to have done is:
1: Sum the observed numbers N[i] of F-Box Genes: total = N
2: Sum the lengths L[i] of the chromosomes: total = L
3: Calculate the expected number Exp[i] for chromosome i as Exp[i] = N*L[i]/L
4: Use the Poisson distribution as:
   ppois(N[i], Exp[i])     if N[i] < Exp[i]
   1 - ppois(N[i], Exp[i]) if N[i] > Exp[i]
   (equality does not occur)

I have implemented this in R, using the Table 1 data, as follows. The variables Exp0 and P0 are the values of Exp and P from the Table; the variables Exp1 and P1 are as calculated according to the above.

Len <- c(32.16,23.44,17.45,15.08,17.04,17.68,11.90,
         15.43,12.41,19.21,13.17,13.03,11.50,13.68,
         10.19,12.82,5.44,12.44,10.23)
Obs <- c(36,25,13,10,19,19,12,15,12,15,17,13,13,13,12,9,2,12,2)
Exp0 <- c(30,22,17,14,16,17,11,15,12,18,13,12,11,13,10,12,5,12,10)
P0 <- c(0.137,0.235,0.235,0.159,0.196,0.242,0.340,
        0.391,0.395,0.273,0.082,0.353,0.207,0.421,
        0.176,0.231,0.112,0.398,0.004)
N <- sum(Obs) ; L <- sum(Len)
Exp1 <- N*Len/L
Low <- (Obs < Exp1)
P1 <- Low*ppois(Obs,Exp1) + (1-Low)*(1-ppois(Obs,Exp1))
cbind(Len,Obs,Exp0,Exp1,P0,P1)
#         Len Obs Exp0      Exp1    P0          P1
#  [1,] 32.16  36   30 30.429265 0.137 0.136529344
#  [2,] 23.44  25   22 22.178544 0.235 0.234714001
#  [3,] 17.45  13   17 16.510904 0.235 0.234942347
#  [4,] 15.08  10   14 14.268449 0.159 0.158563376
#  [5,] 17.04  19   16 16.122969 0.196 0.196445135
#  [6,] 17.68  19   17 16.728526 0.242 0.241956811
#  [7,] 11.90  12   11 11.259585 0.340 0.340015730
#  [8,] 15.43  15   15 14.599613 0.391 0.390970369
#  [9,] 12.41  12   12 11.742139 0.395 0.394571188
# [10,] 19.21  15   18 18.176187 0.273 0.273013405
# [11,] 13.17  17   13 12.461238 0.082 0.082372362
# [12,] 13.03  13   12 12.328772 0.353 0.353596077
# [13,] 11.50  13   11 10.881112 0.207 0.207821286
# [14,] 13.68  13   13 12.943792 0.421 0.420776167
# [15,] 10.19  12   10  9.641611 0.176 0.175748117
# [16,] 12.82   9   12 12.130074 0.231 0.231213063
# [17,]  5.44   2    5  5.147239 0.112 0.112786229
# [18,] 12.44  12   12 11.770524 0.398 0.397809438
# [19,] 10.23   2   10  9.679458 0.004 0.003598524

There is now agreement between P0 (from the article) and P1 (as calculated above) -- subject to the rounded third decimal place of P0 being 1 out in a few cases ([12], [13], [17]). The puzzle clearly arose in the first place because the authors reported their Expected Numbers as integers, not fractions! Well, well, well! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Jul-09 Time: 11:59:11 -- XFMail --
Re: [R] How to do poisson distribution test like this?
Follow-up: On 29-Jul-09 03:28:24, glen_b wrote: Corrected links (the originals somehow acquired an extra space, sorry) Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189 Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1 --

I have now scanned through the Paper, and find the explanation of their calculation of Expected Length (which agrees with the method I described in my previous email) in a somewhat garbled description on page 1198 (page 10/12 of the PDF file): "The expected gene number lambda_i on chromosome i would be a sample from a Poisson distribution, lambda_i = m*L_i/Sum(L_i), where m is the total number of genes detected within the assembled sequences and L_i is the length of chromosome i." I agree with your earlier comment, Glen, that the paper would have benefited from more careful scrutiny by referees/editor! Best wishes, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Jul-09 Time: 12:43:26 -- XFMail --
Re: [R] How to do poisson distribution test like this?
On 28-Jul-09 10:03:41, Mao Jianfeng wrote: Dear R-listers, I want to re-perform a Poisson distribution test presented in a recently-published biological research paper (Plant Physiology 2008, vol 148, pp. 1189-1200). The test concerns the number of occurrences of a kind of gene on separate chromosomes. For instance: the observed gene number on chromosome A is 36; the expected gene number on chromosome A is 30. The authors then got a probability 0.137 by a distribution test on this trial. In this test, a Poisson distribution was used to determine the significance of the gene distribution. Question: How can I re-perform this test in R? Thank you in advance. Mao Jian-Feng, Institute of Botany, CAS, China

Since it is not clear what test procedure they used, I have done a couple of numerical experiments in R:

1. Compare the upper-tail probability of the Poisson distribution with mean mu = 30 of the event that at least 36 are observed:

1-ppois(36,30)
# [1] 0.1196266

Not quite the 0.137 that they got.

2. A similar comparison, using a Normal approximation to the Poisson (mean mu = 30, SD = sqrt(mu)):

1 - pnorm(6/sqrt(30))
# [1] 0.1366608

which, after rounding, is exactly the 0.137 that they got. So it seems they have used an upper-tail test based on the Normal approximation to the Poisson distribution. Method 1 (using the exact Poisson distribution) is preferable, since it is accurate (given the assumption of a Poisson distribution). So that would, in principle, be the best way to do it in R (as illustrated). Possibly their adoption of Method 2 is based on a naive acceptance of the rule-of-thumb from some textbook; or maybe their available software does not offer ready access to the exact Poisson distribution (which wouldn't happen if they used R -- see Method 1). As stated, it is inaccurate compared with Method 1, so is not to be preferred. However, if you need to reproduce their method (regardless of merit), then use Method 2. Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 11:42:08 -- XFMail --
Re: [R] check for new files in a given directory
On 28-Jul-09 11:04:39, Andreas Posch wrote: I am trying to continuously evaluate online created data files using R-algorithms. Is there any simple way to let R iteratively check for new files in a given directory, load them and process them? Any help would be highly appreciated. Best, A.

If in Linux/Unix, the following sort of thing will work:

LastList <- system("ls", intern=TRUE)

then, later,

NewList <- system("ls", intern=TRUE)

and then

NewList[!(NewList %in% LastList)]

is a character vector of the names in NewList which are not in LastList (i.e. the ones which have come in since LastList was created). Then you can do what you like with these names. Finally:

LastList <- NewList

means you can repeat the check on the same basis. Don't ask me how to do this in Windows ... Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 12:24:42 -- XFMail --
Re: [R] check for new files in a given directory
On 28-Jul-09 11:23:21, Barry Rowlingson wrote: On Tue, Jul 28, 2009 at 12:04 PM, Andreas Posch andreas.po...@tugraz.at wrote: I am trying to continuously evaluate online created data files using R-algorithms. Is there any simple way to let R iteratively check for new files in a given directory, load them and process them? Any help would be highly appreciated.

list.files(dir) will tell you what files are in a directory. file.info(filepath) will tell you things about a file (modification time etc). Sys.sleep(n) will put R to sleep for n seconds so you don't have a tight loop checking the directory every millisecond. That's probably all the functionality you need, except maybe to keep track of what files you consider 'new', and some way of deciding if something has already been processed. But that's a bit application-specific! Barry

Snap, Baz! (Except that we played different cards -- and in view of Andreas's follow-up, yours would be the solution for him). However, this got me looking into ?list.files, and I see there (R version 2.9.0 (2009-04-17)):

recursive: logical. Should the listing recurse into directories?

But:

Directories are included only if 'recursive = FALSE'.

Surely the latter is the wrong way round, and should be

Directories are included only if 'recursive = TRUE'.

(that's how it worked when I just tried it). Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 12:36:45 -- XFMail --
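P.S. Putting Barry's three ingredients together, a minimal polling loop might look like the following. This is a sketch only: "incoming" is a hypothetical watched directory and process() a hypothetical user-supplied handler.

```r
seen <- character(0)
repeat {
  current <- list.files("incoming")     # names now in the watched directory
  fresh <- setdiff(current, seen)       # names not seen on the last pass
  for (f in fresh) {
    process(file.path("incoming", f))   # hypothetical per-file handler
  }
  seen <- current
  Sys.sleep(10)                         # avoid a tight polling loop
}
```

A real version might also consult file.info() to skip files still being written.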
Re: [R] Looking for example of usage of function unz
On 28-Jul-09 12:15:33, mau...@alice.it wrote: I would greatly appreciate some example of correct usage of function unz. I have to download and uncompress the following web compressed file: ftp://ftp.sanger.ac.uk/pub/mirbase/targets/v5/arch.v5.txt.homo_sapiens.zip I tried the following command that does not work:

Targets.rec <- readLines(zz <- unz("ftp://ftp.sanger.ac.uk/pub/mirbase/targets/v5/arch.v5.txt.homo_sapiens.zip"))

Thank you in advance, Maura

Maura, unz() is insisting that you give the filename argument, which, according to '?unz', is: "filename: a filename within a zip file". This means that you would first need to know the name[s] of the file[s] archived in the .zip file, as far as I can see. So, in that case, a first step would be to download the .zip file anyway, to your local system, and then, in whatever way is appropriate for your system ("unzip -l" in Linux), find out what the files in it are called. But once you have got that far, you may prefer to handle the .zip file outside of R ... Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 13:50:45 -- XFMail --
Re: [R] check for new files in a given directory
On 28-Jul-09 13:40:31, Barry Rowlingson wrote: On Tue, Jul 28, 2009 at 12:36 PM, Ted Harding ted.hard...@manchester.ac.uk wrote: However, this got me looking into ?list.files, and I see there (R version 2.9.0 (2009-04-17)): recursive: logical. Should the listing recurse into directories? But: Directories are included only if 'recursive = FALSE'. Surely the latter is the wrong way round, and should be: Directories are included only if 'recursive = TRUE'.

By 'included' it means 'returned'. If you do recursive=TRUE it scans recursively for files and only files. If you do recursive=FALSE it returns files and directories in the specified directory. Makes it tricky to figure out a complete directory tree, since empty directories won't appear at all if recursive=TRUE. You'd have to implement your own recursive search based on list.files(d, recursive=FALSE) and then testing for directoriness. Sucky, unless there's a better way... Barry

Thanks for the clarification, Barry! I hadn't appreciated all that (I think the wording of ?list.files is misleading here). I agree that it can leave you falling between two stools in the sort of situation you describe. Which, as it happens, leads me back to the use of system(...) in Unix/Linux systems. For example, to list all the directories (without their files) in the current directory you could do:

system("find . -type d -print")

which finds (recursively) everything in and under the current directory (.) which is of type directory (d). Likewise, the 'ls' command with suitable options (perhaps combined with judicious 'grep'ping) can enable you to select what you want. For example, if something is planting data files with extension .dat from time to time,

system("ls -tr *.dat")

would list all files with extension .dat in time (-t) order reversed (-r) (most recent last). Again, find could be useful.
Suppose the last one of such files you accessed was lastfile.dat; then:

system("find -maxdepth 0 -newer lastfile.dat -name '*.dat' -print")

would find, in the current directory only (-maxdepth 0), all files with extension .dat (-name '*.dat') which are newer than the given file lastfile.dat. It is for reasons of flexibility like this that I use system() for this kind of thing anyway (with intern=TRUE if the results are to be saved in a variable), so I hadn't looked into list.files() before! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 15:24:09 -- XFMail --
Re: [R] help
On 28-Jul-09 08:28:22, Inchallah Yarab wrote: How can I vary the parameters of a function? I have a function with 5 parameters, and I want to run the function over a range of numbers for one of these parameters. In the end I want the value of the function for each of the different values of that one parameter (the other parameters are fixed!). Thank you for your help.

It depends on the internals of the function. In R, most functions (including what you write yourself, if written in the right way) allow you to give a vector of numbers to a numerical parameter. The calculations are then vectorised in a single pass, and the results for each value are returned as a vector. For example, the pnorm() function for several different standard deviations:

SD = c(1,1.5,2,2.5,3)
cbind(SD, pnorm(q=0.5, mean=0, sd=SD))
# [1,] 1.0 0.6914625
# [2,] 1.5 0.6305587
# [3,] 2.0 0.5987063
# [4,] 2.5 0.5792597
# [5,] 3.0 0.5661838

Likewise, if your function is

my.fun <- function(par1,par2,par3,par4,par5){
  (par1 + par2*par3 + (par3^2)*(par4 + par5))
}

then you could have

par1 <- 1.1 ; par2 <- 1.2 ; par4 <- 1.4 ; par5 <- 1.5
par3 <- SD # (as above)
cbind(SD, my.fun(par1,par2,par3,par4,par5))
#       SD
# [1,] 1.0  5.200
# [2,] 1.5  9.425
# [3,] 2.0 15.100
# [4,] 2.5 22.225
# [5,] 3.0 30.800

Hoping this helps! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jul-09 Time: 15:39:27 -- XFMail --
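P.S. If the function were not vectorised (for instance if it contained an if() test on the parameter), the same table can be built one value at a time with sapply(). A sketch, assuming my.fun and the par values defined above:

```r
# Evaluate my.fun once per element of SD, other parameters held fixed
results <- sapply(SD, function(s) my.fun(par1, par2, s, par4, par5))
cbind(SD, results)   # same table as the vectorised call
```

This is slower than true vectorisation but works for any function of one scalar argument.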
Re: [R] error message: .Random.seed is not an integer vector but
On 23-Jul-09 15:30:05, Jim Bouldin wrote: I'm trying to run this simple random sample procedure and keep getting the error message shown. I don't understand this; I've ... Thanks.

x = as.vector(c(1:12)); x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
mode(x)
[1] "numeric"
sample(x, 3)
Error in sample(x, 3) : .Random.seed is not an integer vector but of type 'list'

Jim Bouldin, PhD

I think this error is not arising from the code you have given above. I get:

x = as.vector(c(1:12))
sample(x,3)
# [1] 8 12 2

which is fine. '.Random.seed' should be an integer vector to start with. See '?set.seed'. Therefore it should not be a list. Try entering .Random.seed and see what you get -- when I do it, it is a vector of 626 assorted integers. Also try:

is.vector(.Random.seed)

and, if that is not TRUE, try

str(.Random.seed)
typeof(.Random.seed)

If .Random.seed turns out to be a list, then something has gone seriously wrong somewhere. It may be that somewhere in your code there is a command that meddles with .Random.seed (or perhaps it was saved, in .Rdata, from a previous session where something went wrong). Or, worst scenario, your R installation is broken ... And, while I'm at it, your x = as.vector(c(1:12)) is overkill! Simply

x <- (1:12) ; sample(x,3)

should be enough, or even sample((1:12),3), since (1:12) is a vector (of integers). Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 17:13:55 -- XFMail --
Re: [R] error message: .Random.seed is not an integer vector but
On 23-Jul-09 16:08:17, Jim Bouldin wrote: Jim Bouldin wrote: Thank you. However, when I tried that, I got this message: Warning message: In rm(.Random.seed) : variable .Random.seed was not found

In that case, have you attached some package that has its own .Random.seed? Try to find where the .Random.seed that R complains about comes from. Uwe Ligges

No, there are no attached packages, just the ones that load automatically. The R commander has some type of RNG but it is not loaded. Completely stumped.

Follow-up to my previous reply (just posted). Having read the other responses and your reactions, try the following:

rm(.Random.seed)
set.seed(54321)         ## (Or your favourite magic number) [*]
x = as.vector(c(1:12))  ## To reproduce your original code ... !
sample(x,3)

[*] When you did rm(.Random.seed) as suggested by Uwe, the variable .Random.seed was lost, so you have to create it again. If, after the above, you still get the problem, then something is very seriously wrong. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 17:23:09 -- XFMail --
Re: [R] Random # generator accuracy
On 23-Jul-09 17:59:56, Jim Bouldin wrote: Dan Nordlund wrote: It would be necessary to see the code for your 'brief test' before anyone could meaningfully comment on your results. But your results for a single test could have been a valid random result.

I've re-created what I did below. The problem appears to be with the weighting process: the unweighted sample came out much closer to the actual than the weighted sample (1% error) did. Comments? Jim

x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
weights
[1] 1 1 1 1 1 1 2 2 2 2 2 2
a = mean(replicate(1000000, sample(x, 3, prob = weights))); a
# (1 million samples from x, of size 3, weighted by weights;
#  the mean should be 7.50)
[1] 7.406977
7.406977/7.5
[1] 0.987597
b = mean(replicate(1000000, sample(x, 3))); b
# (1 million samples from x, of size 3, not weighted this time;
#  the mean should be 6.50)
[1] 6.501477
6.501477/6.5
[1] 1.000227
Jim Bouldin, PhD

To obtain the result you expected, you would need to explicitly specify replace=TRUE, since the default for replace is FALSE (though probably what you really intended was sampling without replacement). Read carefully what is said about prob in '?sample' -- when replace=FALSE, the probability of inclusion of an element is not proportional to its weight in 'prob'. The reason is that elements with higher weights are more likely to be chosen early on. This then knocks that higher weight out of the contest, making it more likely that elements with smaller weights will be sampled subsequently. Hence the mean of the sample will be biased slightly downwards, relative to the weighted mean of the values in x.
table(replicate(1000000, sample(x, 3)))
#      1      2      3      4      5      6
# 250235 250743 249603 250561 249828 249777
#      7      8      9     10     11     12
# 249780 250478 249591 249182 249625 250597
(so all nice equal frequencies)

table(replicate(1000000, sample(x, 3, prob=weights)))
#      1      2      3      4      5      6
# 174873 175398 174196 174445 173240 174110
#      7      8      9     10     11     12
# 325820 326140 325289 325098 325475 325916

Note that the frequencies of the values with weight=2 are a bit less than twice the frequencies of the values with weight=1:

(325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
# [1]

In fact this is fairly easily calculated. The possible combinations (in order of sampling) of the two weights, with their probabilities, are:

#                                     1s  2s
# 1 1 1  P =  6/18 *  5/17 *  4/16     3   0
# 1 1 2  P =  6/18 *  5/17 * 12/16     2   1
# 1 2 1  P =  6/18 * 12/17 *  5/15     2   1
# 1 2 2  P =  6/18 * 12/17 * 10/15     1   2
# 2 1 1  P = 12/18 *  6/16 *  5/15     2   1
# 2 1 2  P = 12/18 *  6/16 * 10/15     1   2
# 2 2 1  P = 12/18 * 10/16 *  6/14     1   2
# 2 2 2  P = 12/18 * 10/16 *  8/14     0   3

So the expected number of weight=1 in the sample is

3*(6/18 * 5/17 * 4/16)   + 2*(6/18 * 5/17 * 12/16) +
2*(6/18 * 12/17 * 5/15)  + 1*(6/18 * 12/17 * 10/15) +
2*(12/18 * 6/16 * 5/15)  + 1*(12/18 * 6/16 * 10/15) +
1*(12/18 * 10/16 * 6/14) + 0
= 1.046218

Hence the expected number of weight=2 in the sample is 3 - 1.046218 = 1.953782, and their ratio 1.953782/1.046218 = 1.867471. Compare this with the value 1.867351 (above) obtained by simulation! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 21:05:07 -- XFMail --
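P.P.S. The hand enumeration of the eight orderings can be checked mechanically. A sketch: group 1 holds the six weight-1 items (total weight 6), group 2 the six weight-2 items (total weight 12); drawing an item removes its weight from the pool.

```r
combos <- expand.grid(a=1:2, b=1:2, c=1:2)  # all 8 orderings of group labels
p_ones <- 0                                 # expected number of weight-1 items
for (k in seq_len(nrow(combos))) {
  picks <- unlist(combos[k, ])
  rem <- c(6, 12); tot <- 18; p <- 1
  for (g in picks) {
    p <- p * rem[g]/tot
    # drawing from group g removes one item and its weight from the pool
    rem[g] <- rem[g] - ifelse(g == 1, 1, 2)
    tot <- tot - ifelse(g == 1, 1, 2)
  }
  p_ones <- p_ones + p * sum(picks == 1)
}
p_ones   # 1.046218, agreeing with the hand calculation above
```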
Re: [R] Random # generator accuracy
OOPS! The result of a calculation below somehow got omitted!

(325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
# [1] 1.867351

to be compared (as at the end) with the ratio 1.867471 of the expected number of weight=2 to the expected number of weight=1. Sorry. Ted.

On 23-Jul-09 20:05:09, Ted Harding wrote: On 23-Jul-09 17:59:56, Jim Bouldin wrote: Dan Nordlund wrote: It would be necessary to see the code for your 'brief test' before anyone could meaningfully comment on your results. But your results for a single test could have been a valid random result.

I've re-created what I did below. The problem appears to be with the weighting process: the unweighted sample came out much closer to the actual than the weighted sample (1% error) did. Comments? Jim

x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
weights
[1] 1 1 1 1 1 1 2 2 2 2 2 2
a = mean(replicate(1000000, sample(x, 3, prob = weights))); a
# (1 million samples from x, of size 3, weighted by weights;
#  the mean should be 7.50)
[1] 7.406977
7.406977/7.5
[1] 0.987597
b = mean(replicate(1000000, sample(x, 3))); b
# (1 million samples from x, of size 3, not weighted this time;
#  the mean should be 6.50)
[1] 6.501477
6.501477/6.5
[1] 1.000227
Jim Bouldin, PhD

To obtain the result you expected, you would need to explicitly specify replace=TRUE, since the default for replace is FALSE (though probably what you really intended was sampling without replacement). Read carefully what is said about prob in '?sample' -- when replace=FALSE, the probability of inclusion of an element is not proportional to its weight in 'prob'. The reason is that elements with higher weights are more likely to be chosen early on. This then knocks that higher weight out of the contest, making it more likely that elements with smaller weights will be sampled subsequently. Hence the mean of the sample will be biased slightly downwards, relative to the weighted mean of the values in x.

table(replicate(1000000, sample(x, 3)))
#      1      2      3      4      5      6
# 250235 250743 249603 250561 249828 249777
#      7      8      9     10     11     12
# 249780 250478 249591 249182 249625 250597
(so all nice equal frequencies)

table(replicate(1000000, sample(x, 3, prob=weights)))
#      1      2      3      4      5      6
# 174873 175398 174196 174445 173240 174110
#      7      8      9     10     11     12
# 325820 326140 325289 325098 325475 325916

Note that the frequencies of the values with weight=2 are a bit less than twice the frequencies of the values with weight=1:

(325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
# [1]

In fact this is fairly easily calculated. The possible combinations (in order of sampling) of the two weights, with their probabilities, are:

#                                     1s  2s
# 1 1 1  P =  6/18 *  5/17 *  4/16     3   0
# 1 1 2  P =  6/18 *  5/17 * 12/16     2   1
# 1 2 1  P =  6/18 * 12/17 *  5/15     2   1
# 1 2 2  P =  6/18 * 12/17 * 10/15     1   2
# 2 1 1  P = 12/18 *  6/16 *  5/15     2   1
# 2 1 2  P = 12/18 *  6/16 * 10/15     1   2
# 2 2 1  P = 12/18 * 10/16 *  6/14     1   2
# 2 2 2  P = 12/18 * 10/16 *  8/14     0   3

So the expected number of weight=1 in the sample is

3*(6/18 * 5/17 * 4/16)   + 2*(6/18 * 5/17 * 12/16) +
2*(6/18 * 12/17 * 5/15)  + 1*(6/18 * 12/17 * 10/15) +
2*(12/18 * 6/16 * 5/15)  + 1*(12/18 * 6/16 * 10/15) +
1*(12/18 * 10/16 * 6/14) + 0
= 1.046218

Hence the expected number of weight=2 in the sample is 3 - 1.046218 = 1.953782, and their ratio 1.953782/1.046218 = 1.867471. Compare this with the value 1.867351 (above) obtained by simulation! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 21:05:07 -- XFMail --
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 21:15:52 -- XFMail --
Re: [R] Random # generator accuracy
! Ted.

Jim Bouldin, PhD Research Ecologist Department of Plant Sciences, UC Davis Davis CA, 95616 530-554-1740

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 23:00:56 -- XFMail --
Re: [R] Random # generator accuracy
On 23-Jul-09 22:16:39, Thomas Lumley wrote: On Thu, 23 Jul 2009, ted.hard...@manchester.ac.uk wrote: The general problem, of sampling without replacement in such a way that for each item the probability that it is included in the sample is proportional to a pre-assigned weight (sampling with probability proportional to size), is quite tricky and, for certain choices of weights, impossible. For a glimpse of what's inside the can of worms, have a look at the reference manual for the 'sampfling' package, in particular the function samprop().

The 'sampling' package also provides a nice selection of ways to draw PPS samples without replacement. -thomas

Thanks, Thomas. There is of course also the package 'pps'. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-Jul-09 Time: 00:00:35 -- XFMail --
Re: [R] Correction.
On 21-Jul-09 01:24:04, Rolf Turner wrote: It has been pointed out to me that I erred in an earlier post. ``Go stick your head in a pig.'' is not the motto of the (entire) Sirius Cybernetics Corporation. It is the motto of the Sirius Cybernetics Corporation ***Complaints Division***. My apologies for the misinformation. cheers, Rolf Turner

Don't worry, Rolf. It wasn't your fault. You have provided a valid response to a Randomised Conceptualisation Trial (which involves applying Thought Interventions to participating subjects). If you had the impression that the Intervention was not random in your case, be assured that this was as intended. The Sirius Cybernetics Corporation (SCC) will record your response in advanced computers equipped with the SCC's groundbreaking WOM (Write-Only Memory). SCC take very seriously the problem of disposal of computers which cease to function after a certain amount of use, and are working to develop the enhanced EWOM (Erasable Write-Only Memory) module. SCC are proud to have been associated with the development of the unsinkable submarine.[*] Yours cosmically, Z.B.

[*] Such things are not fiction. They occur in real life: The prime minister responded: 'We have done everything that we can to increase the number of helicopters and there will be more Merlin helicopters on the ground.' http://news.bbc.co.uk/1/hi/uk/8151244.stm

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-Jul-09 Time: 10:41:38 -- XFMail --
Re: [R] Zinb for Non-integer data
On 18-Jul-09 17:26:36, JPS2009 wrote: Sorry, bit of a Newbie question, and I promise I have searched the forum already, but I'm getting a bit desperate! I have over-dispersed, zero-inflated data, with variance greater than the mean, suggesting Zero-Inflated Negative Binomial - which I attempted in R with the pscl package suggested on http://www.ats.ucla.edu/stat/R/dae/zinbreg.htm However my data is non-integer with some pesky decimals (e.g. 33.12) and zinb / pscl doesn't like that - not surprising, as zinb is for count data, normally whole integers etc. Does anyone know of a different zinb package that will allow non-integers, or an equivalent test/model to zinb for non-integer data? Or should I try something else like a quasi-Poisson GLM? Apologies for the Newbie question! Any help much appreciated! Thanks! The presence of decimals suggests that those data values are records of quantities which ought to be modelled as continuous variables. For instance, in answer to a survey question "How much did you spend on alcoholic drinks yesterday?", the answer would be either a positive sum of money (with decimals), or zero, depending on whether the person spent anything at all on alcohol. So: With probability p, the amount spent was positive and, conditional on being positive, has a distribution which can be modelled by a particular continuous distribution (maybe Log-normal?). With probability (1-p), the amount spent was zero. So a correct approach first requires you to face the question of how to model the positive part of the distribution. Once you have settled that question, it is then possible to see whether that particular class of problem is covered by some package in R, or whether you need to develop an approach yourself. In any case, if I am barking up the right tree above, neither negative binomial nor Poisson would, in principle, be correct for such data since, as you observe, these are intended for count data, not for data which are essentially continuous. 
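[Editorial sketch, not part of the original reply: the two-part structure described above - a probability of a zero, plus a continuous model for the positive amounts - can be fitted in base R. The data and variable names below are entirely made up, and a log-normal positive part is only one of several plausible choices.]

  ## Simulated semicontinuous data: zero with probability (1-p),
  ## log-normal when positive (hypothetical example data).
  set.seed(1)
  n   <- 200
  x   <- rnorm(n)
  p   <- plogis(0.5 + x)                  # P(amount > 0)
  amt <- ifelse(runif(n) < p, rlnorm(n, meanlog = 1 + 0.5*x), 0)

  pos <- amt > 0
  fit.zero <- glm(pos ~ x, family = binomial)   # models P(amount > 0)
  fit.pos  <- lm(log(amt[pos]) ~ x[pos])        # log-normal model, positives only
  summary(fit.zero)
  summary(fit.pos)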
Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 19-Jul-09 Time: 12:25:39 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (-8)^(1/3) == NaN?
On 18-Jul-09 22:04:57, Dave DeBarr wrote: Why does the expression (-8)^(1/3) return NaN, instead of -2? This is not answered by http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-are-powers-of-negative-numbers-wrong_003f Thanks, Dave Because R does not try to evaluate (-8)^(1/3), but (-8)^x, where x is a very close approximation to 1/3 but is not exactly 1/3 (which is impossible in a finite binary representation). But even if it could exactly represent 1/3, R would still need to have a special look-up for certain fractional powers (1/3, 1/5, ...) to enable it to recognise that these are odd-integer roots of negative numbers, and therefore can be evaluated as -(nth_root(abs(x))). It doesn't help, either, to try to do it in complex numbers, since (-8) will then be seen as 8*exp(i*pi), whose cube root will be found as 2*exp(i*pi/3) = 2*(cos(pi/3) + i*sin(pi/3)) = 2*(1/2 + i*sqrt(3)/2):

  (complex(1,-8,0))               # [1] -8+0i
  complex(1,-8,0)^(1/3)           # [1] 1+1.732051i
  (8*exp(complex(1,0,pi)))^(1/3)  # [1] 1+1.732051i
  sqrt(3)                         # [1] 1.732051

I'm not sure what best to suggest for your situation. Basically, if it is in a context where it can only be (negative number)^(1/(odd integer)), then you are better off modifying the logic of your program so as to ensure the result you want. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 18-Jul-09 Time: 23:54:11 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
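[Editorial sketch, not from the original message: one concrete way to "modify the logic of your program" as suggested above is a small helper that takes the real cube root by working with the absolute value and restoring the sign. The helper name `cbrt` is my own invention.]

  ## Real cube root for any real x: odd roots of negative numbers
  ## are -(root of abs(x)).
  cbrt <- function(x) sign(x) * abs(x)^(1/3)
  cbrt(-8)   # -2 (up to floating-point rounding)
  cbrt(27)   #  3 (up to floating-point rounding)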
Re: [R] Package norm has been removed. What to use for Maximum L
On 17-Jul-09 12:12:01, Uwe Ligges wrote: Achim Zeileis wrote: On Fri, 17 Jul 2009, Daniel Abbott wrote: Hello, I apologize if an answer to my questions is available, or if I submitted this question incorrectly. I have read the mailing lists, as well as the R Project and CRAN homepages. However, I may have missed something. I noticed the package 'norm' has been removed. Its page http://cran.r-project.org/web/packages/norm/index.html now reads: Package 'norm' was removed from the CRAN repository. Formerly available versions can be obtained from the archive. My questions are: 1. Why was norm removed? I used the package and found it to be very useful. Is there a serious error in the package or another problem? What do you consider to be a serious error? Typical reasons for removing a package from the active CRAN repository are that the package does not pass CRAN checks and/or that the package maintainer is unresponsive. (norm, specifically, hadn't been updated since 2002.) This is serious enough to make all automatic CRAN features such as daily package checks and building of binary packages too cumbersome. But if you do not consider this to be serious, you can keep on using norm; it is still in CRAN's package archives. Or, even better, you could adopt it, fix it, and release a new version to CRAN (although it is not quite clear to me whether norm's license allows this). Oh yes, indeed. Please ignore my former message; I hadn't looked at the license carefully enough. And since the license is rather special (not GPL'ed as I assumed in my message), you cannot take over maintainership without negotiation with the former maintainer, I fear. Best, Uwe Ligges On a point of information: The licence in question: License: The software may be distributed free of charge and used by anyone if credit is given. It has been tested fairly well, but it comes with no guarantees and the authors assume no liability for its use or misuse. 
is verbatim from Joe Shafer's original licence for his NORM. He used exactly the same wording for his packages CAT, MIX and PAN. These were originally written in S, with FORTRAN code for many of the functions. The various people who have ported these to R have simply copied these words into the R packages. See (if you have the packages installed) library(help=cat), library(help=norm), library(help=mix). I may have some comments about this "removed from CRAN" issue later, but I need to think about them first ... Best wishes to all, Ted. (Note that CRAN runs no checks whether the methods implemented are reasonable or well-suited for the problem they are trying to address etc.) 2. If norm should no longer be removed, what is another good package for ML (maximum likelihood) missing data imputation? I have seen several recommendations online, but I wonder which one is currently the reigning champ. The Multivariate and the SocialSciences task views have sections about missing data http://CRAN.R-project.org/view=Multivariate http://CRAN.R-project.org/view=SocialSciences Additionally, I can recall that there is the Amelia II package: http://CRAN.R-project.org/package=Amelia and potentially others. hth, Z Thank you very much. I appreciate your time! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Jul-09 Time: 13:38:23 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package norm has been removed. What to use for Maximum L
Follow-up: On 17-Jul-09 12:38:27, Ted Harding wrote: On a point of information: The licence in question: License: The software may be distributed free of charge and used by anyone if credit is given. It has been tested fairly well, but it comes with no guarantees and the authors assume no liability for its use or misuse. is verbatim from Joe Shafer's original licence for his NORM. He used exactly the same wording for his packages CAT, MIX and PAN. These were originally written in S, with FORTRAN code for many of the functions. The various people who have ported these to R have simply copied these words into the R packages. See (if you have the packages installed) library(help=cat), library(help=norm), library(help=mix). I may have some comments about this "removed from CRAN" issue later, but I need to think about them first ... Best wishes to all, Ted. While Joe Shafer's MI software web page, http://www.stat.psu.edu/~jls/misoftwa.html, as cited in the R help pages, still exists (though apparently still dating from 1999), only the Windows versions of NORM, CAT, MIX and PAN are still accessible. The link to the Unix versions: S-PLUS for Unix: http://www.stat.psu.edu/~jls/splunix.html no longer leads anywhere (though it still did only a few years ago). I don't know whether it has been moved, and can be found elsewhere, or whether Shafer has withdrawn it. The Windows versions would be compiled code. The Unix versions were source code and needed to be compiled. If Shafer has withdrawn the source-code versions, this might have implications for use (these underlie the S-Plus missing library). Does anyone have information about this? Ted. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Jul-09 Time: 13:54:45 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package norm has been removed. What to use for Maximum L
On 17-Jul-09 13:01:46, Gabor Grothendieck wrote: On Fri, Jul 17, 2009 at 8:54 AM, Ted Harding ted.hard...@manchester.ac.uk wrote: Follow-up: On 17-Jul-09 12:38:27, Ted Harding wrote: On a point of information: The licence in question: License: The software may be distributed free of charge and used by anyone if credit is given. It has been tested fairly well, but it comes with no guarantees and the authors assume no liability for its use or misuse. is verbatim from Joe Shafer's original licence for his NORM. He used exactly the same wording for his packages CAT, MIX and PAN. These were originally written in S, with FORTRAN code for many of the functions. The various people who have ported these to R have simply copied these words into the R packages. See (if you have the packages installed) library(help=cat), library(help=norm), library(help=mix). I may have some comments about this "removed from CRAN" issue later, but I need to think about them first ... Best wishes to all, Ted. While Joe Shafer's MI software web page http://www.stat.psu.edu/~jls/misoftwa.html as cited in the R help pages, still exists (though apparently still dating from 1999), only the Windows versions of NORM, CAT, MIX and PAN are still accessible. The link to the Unix versions: S-PLUS for Unix: http://www.stat.psu.edu/~jls/splunix.html no longer leads anywhere (though it still did only a few years ago). It's archived here: http://web.archive.org/web/20050212143644/http://www.stat.psu.edu/~jls/splunix.html Thanks, Gabor! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Jul-09 Time: 14:27:26 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with Conditional Logit
On 16-Jul-09 19:40:20, Noah Silverman wrote: Hello, I'm brand new to using R. (I've been using Rapid Miner, but would like to move over to R since it gives me much more functionality.) I'm trying to learn how to do a conditional logit model. My data has one dependent variable, 2 independent variables and a group variable. Example:

  class  v1   v2   group
  sick   .3   .7   1
  well   .2   .9   1
  sick   .25  .4   1
  sick   .32  .6   2
  well   .21  .81  2
  sick   .5   .3   2

Can someone help me formulate the proper command for the clogit() function? Thanks! Have a look at: http://finzi.psych.upenn.edu/R/library/survival/html/clogit.html You would need to install the 'survival' package. To search for this sort of thing in future, go to R Site Search at: http://finzi.psych.upenn.edu/nmz.html and search according to the available options. For your particular query, I unchecked "R-help 2008-" and "Task views", leaving only "Functions", and entered "clogit" into the search box. Up came various hits, amongst which the one suggested above. Welcome to R, and happy hunting in the future! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 16-Jul-09 Time: 20:54:18 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
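[Editorial sketch, not from the original reply: one possible shape of the clogit() call for the data layout in the question. Column names follow the example above; with a data set this tiny the fit may well produce convergence warnings.]

  library(survival)  # provides clogit()

  d <- data.frame(
    class = c("sick", "well", "sick", "sick", "well", "sick"),
    v1    = c(0.3, 0.2, 0.25, 0.32, 0.21, 0.5),
    v2    = c(0.7, 0.9, 0.4,  0.6,  0.81, 0.3),
    group = c(1, 1, 1, 2, 2, 2)
  )

  ## Case status as 0/1, one stratum per group:
  fit <- clogit(as.numeric(class == "sick") ~ v1 + v2 + strata(group),
                data = d)
  summary(fit)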
Re: [R] predictive punishment module (was: Re: Simple cat statem
On 15-Jul-09 14:45:26, S Ellison wrote: Duncan Murdoch murd...@stats.uwo.ca 15/07/2009 15:04:29 * R has a new predictive punishment module. It punishes you for things it knows you will do later. Dang! Does that mean it'll return errors for the statistical idiocy I haven't yet got around to committing? ;-) Steve E It is even more twisted and sadistic than that. Being an Expert System on Human Psychology (how else could it succeed so well in putting the user on the wrong foot), it knows that the user will interpret the punishment as being related to the correct thing that the user just did, even though the punishment is for the thing that will be done later. Users will therefore develop an inhibition in respect of correct actions, and will be driven into incorrect behaviour. The incorrect thing that will be done later will be punished anyway. Since the complement to {an incorrect thing} is {correct thing} union {all the other incorrect things} the user will end up a) Confused b) Increasingly likely to do incorrect things, therefore increasingly likely to be punished, finally being punished on almost all occasions of action (predictively or not). So, now we know ... :( Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 15-Jul-09 Time: 16:21:53 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mann-whitney U-test - U-value
On 11-Jul-09 18:13:29, Martin Batholdy wrote: Hi, I know that I can perform a Mann-Whitney U test with wilcox.test(x,y). But instead of a U-value I get a W-value. Is it possible to change this? The usual definition of the Mann-Whitney U test for two samples x (of size m) and y (of size n) is that

  U = number of pairs (x[i],y[j]) such that x[i] < y[j]

For R's wilcox.test(x,y), '?wilcox.test' says: R's value can also be computed as the number of all pairs '(x[i], y[j])' for which 'y[j]' is not greater than 'x[i]', the most common definition of the Mann-Whitney test. In other words, it is the complement of the definition of U above. Hence the standard U statistic can be obtained from R's W as

  U = m*n - W

since m*n is the total number of possible pairs. (In my view, R is being a bit idiosyncratic there). Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Jul-09 Time: 20:04:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
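[Editorial sketch, not part of the original reply: the relation between R's W and the U defined above can be checked numerically.]

  ## With continuous data there are no ties, so the m*n pairs split
  ## exactly into those with y[j] <= x[i] (R's W) and those with
  ## x[i] < y[j] (U as defined above).
  set.seed(42)
  m <- 8; n <- 5
  x <- rnorm(m); y <- rnorm(n)

  W <- wilcox.test(x, y)$statistic   # pairs with y[j] <= x[i]
  U <- sum(outer(x, y, "<"))         # pairs with x[i] <  y[j]
  U == m*n - W                       # TRUE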
Re: [R] linear regression and testing the slope
On 08-Jul-09 12:29:40, evrim akar wrote: Dear All, First of all I would like to say I do not have much knowledge about this subject, so most of you may find it really easy. I am doing a linear regression and I want to test if the slope of the curve is 0. R gives the summary statistics:

  Call:
  lm(formula = x ~ s)

  Residuals:
        Min        1Q    Median        3Q       Max
  -0.025096 -0.020316 -0.001203  0.011658  0.044970

  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept)  0.005567   0.016950   0.328    0.750
  s           -0.001599   0.002499  -0.640    0.538

  Residual standard error: 0.02621 on 9 degrees of freedom
  Multiple R-squared: 0.04352, Adjusted R-squared: -0.06276
  F-statistic: 0.4095 on 1 and 9 DF, p-value: 0.5382

What is this t-value for? The explanation in the help file was unfortunately not clear to me. How can I test my hypothesis that the slope is 0? Thank you in advance, regards, Evrim The quantity 't' is the estimated value (-0.001599 for the slope 's') divided by its estimated standard error (0.002499). Taking the values as reported by the summary: t = -0.001599/0.002499 = -0.639856, which R has reported (to 3 significant figures) as -0.640. The Pr(>|t|) is the probability, assuming the null hypothesis that the slope (coefficient of 's') is zero, that data could arise at random giving rise to a t-value which, in absolute value, would exceed the absolute value |t| = |-0.639856| = 0.639856 which you got from your data. The relevance of this for testing the hypothesis that the slope is 0 is that, if the slope really is 0, then large values (either way) of the coefficient of 's' (reported by R as "Estimate") are unlikely. So if you got a value of Pr(>|t|) which was small (conventionally less than 0.05, or 0.01, etc.), then a t-value at least as large in absolute value as yours would have been unlikely if the null hypothesis were true. Therefore it would be more plausible that the null hypothesis was false. 
In your case, the P-value Pr(>|t|) = 0.538, so you would be more likely than not to get an estimate at least as deviant from 0 as the one you did get, if the null hypothesis were true. Hence the data do not provide grounds for rejecting the null hypothesis. Note that not having grounds for rejection does not mean that you must accept it: a non-significant t-value is not proof that the null hypothesis is true. There is a good basic outline of the t-test in the Wikipedia article "Student's t-test": http://en.wikipedia.org/wiki/Student%27s_t-test Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Jul-09 Time: 14:17:52 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
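[Editorial sketch, not part of the original reply: the t value and Pr(>|t|) in the summary can be reproduced directly from the quoted numbers.]

  ## t is the estimate divided by its standard error; Pr(>|t|) is the
  ## two-sided tail probability of the t distribution on the residual
  ## degrees of freedom (9 here).
  est <- -0.001599
  se  <-  0.002499
  df  <-  9

  t.val <- est/se                 # about -0.640
  p.val <- 2*pt(-abs(t.val), df)  # about 0.538
  round(t.val, 3)
  round(p.val, 3)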
Re: [R] Uncorrelated random vectors
Be careful to be clear what you are referring to when you say the correlation is zero. The commands

  x <- rnorm(100)
  y <- rnorm(100)

will produce two vectors of given length (100) which (to within the effectively ignorable limitations of the random number generator) will have been produced independently. Hence the *theoretical* correlation is zero. If that is what you meant, then it is already answered. When you compute cor(x,y), however, the answer will in general be non-zero (though only rarely significantly so). This is because two independent runs of rnorm() have an almost zero probability of producing two vectors for which cor(x,y) is exactly 0. However, possibly you mean that you want two vectors for which the result of cor(x,y) = 0. One way to achieve this is along the following lines:

  set.seed(54321)
  x  <- rnorm(100)
  y0 <- rnorm(100)
  My <- mean(y0)
  Sy <- sd(y0)
  y1 <- lm(y0 ~ x)$res ; y1 <- y1/sd(y1)
  y  <- My + y1*Sy
  mean(y0)  # [1] 0.04497584
  mean(y)   # [1] 0.04497584
  sd(y0)    # [1] 0.848231
  sd(y)     # [1] 0.848231
  cor(x,y0) # [1] 0.05556468
  cor(x,y)  # [1] 6.451072e-18  [effectively 0, to within rounding error]

In this case, however, note that the 100 elements of y, given the values of x, are NOT independent of each other, since they satisfy the linear constraint

  x[1]*y[1] + x[2]*y[2] + ... + x[100]*y[100]
    - (sum(x))*(y[1] + y[2] + ... + y[100])/100 = 0

and therefore can vary only in 99 dimensions, not 100. Nor are they independent of the values of x (even though the numerical correlation is 0). On the other hand, the values of y0 are independent of the values of x, and of each other. You need to be very clear why you want to have two vectors x,y such that cor(x,y) = 0, since otherwise you are at risk of carrying out an invalid analysis. Hoping this helps, Ted. On 07-Jul-09 14:26:02, Stein, Luba (AIM SE) wrote: Thank you for your help! But is it possible to produce two vectors x and y with a given length such that their correlation is zero? 
For me it is not enough just to simulate two vectors and test their correlation. Thank you, Luba -----Original Message----- From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be] Sent: Tuesday, 7 July 2009 15:51 To: Stein, Luba (AIM SE); r-help@r-project.org Subject: RE: [R] Uncorrelated random vectors cor.test(rnorm(1), rnorm(1)) ir. Thierry Onkelinx ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Stein, Luba (AIM SE) Sent: Tuesday, 7 July 2009 15:46 To: r-help@r-project.org Subject: [R] Uncorrelated random vectors Hello, is it possible to create two uncorrelated random vectors for a given distribution? In fact, I would like to have something like the function rnorm or rlogis with the extra property that they are uncorrelated. Thanks for your help, Luba E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jul-09 Time: 16:48:06 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Test for X=1 fails, test for 0 works, data in text file
On 07-Jul-09 19:06:59, Mark Knecht wrote: Hi, I am apparently not understanding some nuance about either the use of subset or, more likely, my ability to test for a numerical match using '='. Which is it? Thanks in advance. It looks as though you have tripped over the distinction between = and ==. The first is (in effect) an assignment operator (like <-). The second is a comparison operator: X==Y is TRUE if X equals Y. So, for example, to select rows from MyResults according to your criteria below, you could do something like:

  MyResultsNeg <- MyResults[(MyResults$PosType==(-1)),]
  MyResultsPos <- MyResults[(MyResults$PosType==1),]

[Note: the parentheses (...) are in fact logically superfluous, but I like to use them (a) to make it absolutely clear, visually, how things are being evaluated; (b) (not relevant in this particular context) to avoid falling into traps like N <- 10 ; X <- 1:N-1, which results in X being 0:9, i.e. (1:10) - 1. The correct syntax for the latter would be X <- 1:(N-1).] Hoping this helps, Ted. I've read a data file, reshaped it and then created MyResults by keeping only lines where the value column is greater than 0. So far so good. The data in MyResults looks good to me by eye. The problem comes in when I try to further subset MyResults into two files, one with PosType=1 and the other with PosType=-1. Looking at the dimension of the results there's no change. It didn't work, which is verified by eye. However if I test for PosType=1 by actually testing for PosType>0 then it does work. Is this the general case in R, that if I've read data that was written into a csv file as "1" - it is 1 if I look into the file with a text editor - then I cannot test for that? Or is it some case where I need to use maybe as.numeric or something else first to ensure R sees the number the way I'm thinking about the number? 
Cheers, Mark

  dim(X)
  # [1] 25 425
  ReShapeX <- melt(X, id = c("Trade", "PosType", "EnDate", "EnTime",
                             "ExDate", "ExTime", "PL_Pos", "Costs", "Save2"))
  dim(ReShapeX)
  # [1] 10400 11
  MyResults <- subset(ReShapeX, value > 0)
  dim(MyResults)
  # [1] 1105 11
  MyResults.GroupA <- subset(MyResults, PosType = 1)
  dim(MyResults.GroupA)
  # [1] 1105 11
  MyResults.GroupB <- subset(MyResults, PosType = -1)
  dim(MyResults.GroupB)
  # [1] 1105 11
  MyResults.GroupA <- subset(MyResults, PosType > 0)
  dim(MyResults.GroupA)
  # [1] 432 11
  MyResults.GroupB <- subset(MyResults, PosType < 0)
  dim(MyResults.GroupB)
  # [1] 673 11

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jul-09 Time: 20:28:44 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
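[Editorial sketch, not from the thread: a minimal demonstration of why == matters in this kind of row selection.]

  ## subset() with a comparison keeps only the matching rows;
  ## 'PosType = 1' would not be treated as a comparison at all.
  d <- data.frame(PosType = c(1, -1, 1, -1, -1))
  nrow(subset(d, PosType == 1))   # 2
  nrow(subset(d, PosType < 0))    # 3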
Re: [R] r-project.org address blacklisted by anti-spam software
On 07-Jul-09 21:59:41, Hans W Borchers wrote: Dear List: An e-mail mentioning the r-project.org address and sent to a friend at a German university was considered spam by the local spam filter. Its reasoning: the URL r-project.org is blacklisted at uribl.swinog.ch resp. at antispam.imp.ch. I checked the list http://antispam.imp.ch/swinog-uri-rbl.txt [caution: long list] and indeed, there it was. Can anybody explain how or why the R project made its way onto the list? Is it reasonable to file a protest? --Hans Werner This is clearly a very undiscriminating blacklist. My own ISP (zen.co.uk, highly respected) is listed! I am reminded of the scandalous blacklist maintained by SORBS ( http://www.us.sorbs.net ) -- now happily, it seems, about to die -- whose motive seemed to be to blacklist on the slightest pretext in order to demand exSORBitant fees for removal ($25 per *reported* spam). At one time, the UK academic community subscribed to SORBS, with the effect that I was unable to email any UK academic institution because my IP address, dynamically allocated on dial-up by British Telecom, was blacklisted by SORBS. I found a work-round, but ... . It seems the UK academic community dropped SORBS, and now similar abandonment (see at above URL) may prompt the welcome closure of SORBS. General information about this blacklist can be found by going one up on your URL to: http://antispam.imp.ch/ Your German will be better than mine, so you will be better placed to try to find out what may be involved in getting r-project.org de-listed. Another suggestion is to ask your friend to try to get his University to abandon antispam.imp.ch altogether, on the grounds that it is preventing a perfectly honest domain (with importance for the work of the University) from communicating with people in the University. And the best of luck. 
In my experience, sometimes the administrators of computing resources can be less interested in supporting their users than in ticking the boxes handed down by their management -- who, I hope, may evolve to thinking outside the box, going forward.[*] Good luck! Ted. [*] With acknowledgements to Lucy Kellaway: http://www.ft.com/cms/s/0/3cb5f0fa-8965-11dc-b52e-779fd2ac.html See also: http://www.ft.com/cms/s/4238cace-b6f2-11dc-aa38-779fd2ac.html http://www.ft.com/cms/s/48e4569a-cb56-11dc-97ff-77b07658.html and, last but not least: http://www.ft.com/cms/s/7880e774-99f1-11dc-ad70-779fd2ac.html E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Jul-09 Time: 00:18:25 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hartigan's Dip test
On 06-Jul-09 12:39:29, James Allsopp wrote: Hi, I just got a value for the dip test out of my data of 0.074 for a sample size of 33. I'm trying to work out what this actually means though? Could someone help me relate this to a p-value? Thanks James Have a look at http://finzi.psych.upenn.edu/R/library/diptest/html/qDiptab.html Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Jul-09 Time: 13:50:25 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computer Modern
On 02-Jul-09 09:06:44, Mr Derik wrote: I am trying to use Computer Modern fonts in postscript files for a LaTeX document. Ultimately I want to automate this through Sweave. I've read the documentation and have tried the following code to use lattice to produce a graph using Computer Modern:

  library(lattice)
  library(grid)
  testPlot <- xyplot(seq(1:10) ~ seq(1:10),
                     main = "one to ten",
                     xlab = "the quick fox",
                     ylab = "jumped over the lazy brown dog",
                     xlim = c(0,1), ylim = c(0,1),
                     col = "black", type = "l", lwd = 2)
  setwd("C:\\R_folder\\CMtests")
  postscript("cm_test.eps", width = 4.0, height = 3.0,
             horizontal = FALSE, onefile = FALSE, paper = "special",
             family = "ComputerModern", encoding = "TeXtext.enc")
  print(testPlot)
  dev.off()

This produces a plot with Courier. I am using R 2.9.0 on a Windows XP machine. I did manage to produce one plot with CM as the font, so I know it's possible with my set up. I can't get back to that. Please help me with the code. Thank You I think you may need to also use the fonts parameter to postscript(). See in '?postscript': fonts: a character vector specifying additional R graphics font family names for font families whose declarations will be included in the PostScript file and are available for use with the device. See 'Families' below. Defaults to 'NULL'. Since the Computer Modern family is most probably not built in to your printer, the PostScript file will need to include font definitions for these fonts. If I understand aright, this is what would be achieved by appropriate use of the fonts parameter. If the font definitions are not included, the calls for them will not be recognised by the printer, which may then substitute a default (likely to be Courier). See also the section "TeX fonts" in '?postscript'. Ted. 
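[Editorial sketch, not from the original reply, of what "appropriate use of the fonts parameter" might look like; whether this is sufficient will depend on your R installation and viewer.]

  ## As in the original post, but also listing the family in 'fonts'
  ## so its declarations are included in the PostScript file.
  postscript("cm_test.eps", width = 4.0, height = 3.0,
             horizontal = FALSE, onefile = FALSE, paper = "special",
             family = "ComputerModern", fonts = "ComputerModern",
             encoding = "TeXtext.enc")
  print(testPlot)   # testPlot as defined in the original post
  dev.off()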
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Jul-09 Time: 11:59:29 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (no subject)^2
Putting your two queries together [see revised Subject ...]:

[R] (no subject)^1: Could you please help me? I am trying to load a csv-file in R, but it works wrong! My data is

  0,0127
  -0,0016
  0,0113
  0,0037
  -0,0025

  Ret
  #   X0 X0127
  # 1  0    16
  # 2  0   113
  # 3  0    37
  # 4  0    25

In this case, you need to use the header=FALSE option to read.csv():

  Ret <- read.csv("Ret.csv", header=FALSE)

since the default for read.csv() is header=TRUE, so it assigns names to the variables according to what is in the first line. Since your first line had numeric data, it prepends "X" to these to make the names. Read what you get from ?read.csv

[R] (no subject)^2: Hi Guys, It is a very simple question, but I can't find the answer! Please help me. I use R and such a simple function as length() doesn't work. The result is always 1 even if my data are more than 1 observations! Do I have to load any additional library?

  length(Ret_1)
  # [1] 1
  length
  # function (x)  .Primitive("length")

Thank you!!!

I surmise that Ret_1 is the result of a command similar to your Ret <- read.csv("Ret.csv"). Hence it is a dataframe. If I use your Ret <- read.csv("Ret.csv") command with your Ret.csv data, I get, as you did:

  Ret
  #   X0 X0127
  # 1  0    16
  # 2  0   113
  # 3  0    37
  # 4  0    25

Here, Ret is a dataframe with two columns. In fact, a dataframe is a special kind of list, one element for each column:

  str(Ret)
  # 'data.frame': 4 obs. of 2 variables:
  #  $ X0   : int 0 0 0 0
  #  $ X0127: int 16 113 37 25

and each $ is one element of the list: $X0 and $X0127. Each element is a vector of numbers in the case of your Ret. You can see what they are separately by using the $ operator to extract them:

  Ret$X0
  # [1] 0 0 0 0
  Ret$X0127
  # [1] 16 113 37 25

So Ret is a list with 2 elements. Hence:

  length(Ret)
  # [1] 2

Therefore, in the case of your Ret_1, I surmise that your Ret_1 is a list with one element (as Sarah Goslee surmised also). 
In other words, if you constructed Ret_1 by reading from a CSV file, as in

Ret <- read.csv("Ret.csv")

you will have a dataframe with 1 column, namely a list with one element. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Jul-09 Time: 15:49:56 -- XFMail --
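The diagnosis above is easy to reproduce without a file on disk, using textConnection() in place of "Ret.csv" (the data values are the ones from the question):

```r
# Sketch: comma-"decimal" data read as two columns; a data.frame is a list,
# so length() counts its columns, not its rows.
Ret <- read.csv(textConnection("0,0127
-0,0016
0,0113
0,0037
-0,0025"))

length(Ret)        # 2 -- number of columns (list elements)
nrow(Ret)          # 4 -- number of observations
length(Ret$X0127)  # 4 -- length of one column, which is what was wanted
```

To get the number of observations of a data frame, nrow() (or length() on a single extracted column) is the tool, not length() on the data frame itself.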
Re: [R] Name of data.frame as a text string?
On 02-Jul-09 19:00:44, Mark Knecht wrote: I've passed a data.frame as an input into a function which does some plotting on a 4x4 matrix of certain parameters within the data.frame. I'd like to add a small header on top of each plot with the name of the data.frame so that it's clear, as I compare these 16 things, where each one came from. So far I haven't found the right way to get the name of the data.frame as a string which I can use in something like mtext. Is there one? If I put dummy text in, or take the time to pass in the name by hand, then I do get titles just as I'd like, but I'd far rather let the name of the data.frame speak for itself. Thanks, Mark

One way to do this (which is how I usually do it) is to set up the dataframe name as a string variable, and then use this as required. For instance:

datafr <- "My1stDF"
DF <- read.csv(paste(datafr, ".csv", sep=""))
{do things}
plot(whatever, main=paste("Data from", datafr), ...)

which will read the CSV file whose name corresponds to what you have set 'datafr' to, and then put a corresponding header into the plot. This is a particularly useful technique if you want to loop through several datasets, on the lines of

DataFrs <- c("My1stDF", "My2ndDF", "My3rdDF", "My4thDF")
for (datafr in DataFrs) {
  {stuff like the above}
}

Several variants of this kind of approach are possible! Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 02-Jul-09 Time: 20:21:43 -- XFMail --
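Another common idiom, not mentioned above, recovers the name inside the function itself via deparse(substitute(...)). A minimal sketch (plot_with_title is a hypothetical helper, not from the thread):

```r
# Sketch: recover, inside a function, the name under which a data.frame
# was passed, using deparse(substitute(...)).
plot_with_title <- function(df) {
  df_name <- deparse(substitute(df))   # the caller's expression, as a string
  plot(df[[1]], df[[2]], main = df_name)
  invisible(df_name)
}

My1stDF <- data.frame(x = 1:10, y = (1:10)^2)
plot_with_title(My1stDF)   # plot is titled "My1stDF"
```

Note this captures whatever expression the caller supplied, so it works only when the data frame is passed directly under its own name, not through an intermediate variable.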
Re: [R] fitting in logistic model
yield identical results. In fact, a bit more probing into GLM shows that there is already a discrepancy at the score level:

S0 <- GLM$linear.predictors
max(abs(S0 - S))
# [1] 8.881784e-16

so S0 has not been calculated by applying GLM$coef to X. Also, if we apply the inverse link to S0, then:

max(abs(Fit0 - exp(S0)/(1+exp(S0))))
# [1] 1.110223e-16

which is the same discrepancy as between Fit1 and Fit0.

max(abs(Fit1 - exp(S0)/(1+exp(S0))))
# [1] 1.110223e-16

the same again!! Hence, if I have understood him aright, Fabrizio's question. Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Jun-09 Time: 16:44:56 -- XFMail --
Re: [R] How do I change which R Graphics Device is active?
On 30-Jun-09 16:48:15, Barry Rowlingson wrote: To keep track, call dev.cur() after creating a new plot and assign it to something memorable. Here's how to keep a histogram and an xy plot: dev.new();histPlot = dev.cur() dev.new();xyPlot = dev.cur() dev.set(histPlot);hist(runif(100)) X11cairo 2 dev.set(xyPlot);plot(1:10) X11cairo 3 You just have to remember to call dev.set before your particular plot. Barry Barry, spot on! If there were a prize for postings with simplicity, clarity, conciseness and usability, I would nominate yours (especially in the Why didn't *I* think of that? category). Cheers, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Jun-09 Time: 18:33:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitting in logistic model
On 30-Jun-09 17:41:20, Marc Schwartz wrote: On Jun 30, 2009, at 10:44 AM, Ted Harding wrote: On 30-Jun-09 14:52:20, Marc Schwartz wrote: On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote: I would like to know how R computes the probability of an event in a logistic model (P(y=1)) from the score s, a linear combination of x and beta. I noticed that there are differences (small, less than e-16) between the fitted values automatically computed in the glm procedure by R, and the values manually computed by me applying the reverse formula p = e^s/(1+e^s); moreover I noticed that the minimum of the fitted values in my estimation is 2.220446e-16, and there are many observations with this probability (whereas the minimum value obtained by manual estimation is 2.872636e-152).

It would be helpful to see at least a subset of the output from your model and your manual computations so that we can at least visually compare the results to see where the differences may be. The model object returned from using glm() will contain both the linear predictors on the link scale (model$linear.predictors) and the fitted values (model$fitted.values). The latter can be accessed using the fitted() extractor function. To use an example, let's create a simple LR model using the infert data set as referenced in ?glm.

model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())
model1
# Call: glm(formula = case ~ spontaneous + induced, family = binomial(), data = infert)
# Coefficients:
# (Intercept) spontaneous     induced
#     -1.7079      1.1972      0.4181
# Degrees of Freedom: 247 Total (i.e. 
Null); 245 Residual
# Null Deviance: 316.2
# Residual Deviance: 279.6   AIC: 285.6

# Get the coefficients
coef(model1)
# (Intercept) spontaneous     induced
#  -1.7078601   1.1972050   0.4181294

# Get the linear predictor values
# (log-odds scale for a binomial glm)
head(model1$linear.predictors)
#          1           2           3           4           5           6
# 1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375

You can also get the above by using the coefficients and the model matrix for comparison:

# the full set of 248:
# coef(model1) %*% t(model.matrix(model1))
head(as.vector(coef(model1) %*% t(model.matrix(model1))))
# [1] 1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564 0.32560375

# Get fitted response values (predicted probabilities)
head(fitted(model1))
#         1         2         3         4         5         6
# 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

We can also get the fitted values from the linear predictor values by using:

LP <- model1$linear.predictors
head(exp(LP) / (1 + exp(LP)))
#         1         2         3         4         5         6
# 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

You can also get the above by using the predict.glm() function with type = "response". The default type of "link" will get you the linear predictor values as above. predict.glm() would typically be used to generate predictions from the model on new data.

head(predict(model1, type = "response"))
#         1         2         3         4         5         6
# 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

In glm.fit(), which is the workhorse function in glm(), the fitted values returned in the model object are actually computed by using the inverse link function for the family passed to glm():

binomial()$linkinv
# function (eta) .Call("logit_linkinv", eta, PACKAGE = "stats")
# <environment: 0x171c8b10>

Thus:

head(binomial()$linkinv(model1$linear.predictors))
#         1         2         3         4         5         6
# 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

So those are the same values as we saw above using the other methods. So, all is consistent across the various methods. 
Perhaps the above provides some insights for you into how R does some of these things and also points out that, as is frequently the case with R, there is more than one way to get the same result. HTH, Marc Schwartz

An interesting and informative reply, Marc; but I think it misses the point of Fabrizio's query. I think Fabrizio's point is the following:

set.seed(54321)
X <- sort(rnorm(100))
a0 <- 1.0 ; b0 <- 0.5
Y <- 1*(runif(100) < a0 + b0*X)
GLM <- glm(Y ~ X, family=binomial)
C <- GLM$coef
C
# (Intercept)        X
#    2.726966 2.798429
a1 <- C[1] ; b1 <- C[2]
Fit0 <- GLM$fit
S <- a1 + b1*X
Fit1 <- exp(S)/(1+exp(S))
max(abs(Fit1 - Fit0))
# [1] 1.110223e-16

This discrepancy is of course, in magnitude, a typical rounding error. But the fact that it occurred indicates that when glm() computed the fitted values it did not do so by using the fitted coefficients GLM$coef, then creating the fitted score (linear predictor) S (as above), then applying to this the inverse link exp(S)/(1+exp(S)), since doing that would replicate
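A side note on the hand-rolled inverse link in the exchange above: exp(S)/(1+exp(S)) is also numerically fragile for extreme scores, which is one reason to prefer plogis() or binomial()$linkinv. A minimal sketch (the score values here are made up for illustration):

```r
# Sketch: three ways to map a linear predictor S to a probability.
S <- c(-5, 0, 1, 5)

p_manual  <- exp(S) / (1 + exp(S))     # fine for moderate S
p_plogis  <- plogis(S)                 # stats::plogis, numerically stable
p_linkinv <- binomial()$linkinv(S)     # what glm.fit() itself applies

max(abs(p_manual - p_plogis))          # rounding-error sized, as in the thread

exp(800) / (1 + exp(800))              # NaN: Inf/Inf -- the manual formula overflows
plogis(800)                            # 1, as it should be
```

This does not explain the e-16 discrepancy Ted demonstrates (that comes from glm.fit() applying the inverse link to its own working linear predictor rather than recomputing from the coefficients), but it is the safer idiom for manual back-transformation.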
Re: [R] read.csv, header=TRUE but skip the first line
On 28-Jun-09 21:05:59, Mark Knecht wrote: Hi, Complete newbie to R here. Just getting started reading manuals and playing with data. I've managed to successfully get my *.csv files into R, however I have to use header=FALSE because the real header starts in line #2. The file format looks like:

PORTFOLIO EQUITY TABLE
TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY
1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,10,124.00
2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,10,038.00
3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,9,812.00
4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,9,946.00

where the words PORTFOLIO EQUITY TABLE make up line 1, the rest of the text is on line 2, and then the lines starting with numbers are the real data. (Spaces added by me for email clarity only.) If I remove the first line by hand then I can use header=TRUE and things work correctly, but it's not practical for me to remove the first line by hand on all these files every day. I'd like to understand how I can do the read.csv but skip the first line. Possibly read the file, delete the first line and then send it to read.csv, or some other way? Thanks in advance, Mark

Simply use the option skip=1, as opposed to the default skip=0. This then skips the first line of the file and only starts reading at line 2. With header=TRUE (which is the default for read.csv() anyway), the first line read in (i.e. line 2 of the file) will be taken as the header, and the remainder as data. You should read what is output by ?read.csv. One thing that may be tricky for a beginner to get their head round is that this is *really* the help page for read.table(), and that read.csv() is in fact a front end for read.table() with different defaults. In particular, whereas read.table() has default header=FALSE, read.csv() has default header=TRUE. Also, of course, where read.table() has sep="" (i.e. 
white space), read.csv() has sep=",". Other options for read.csv() which are not mentioned specifically in the usage line for read.csv() (i.e. are subsumed in "...") are the same as the options mentioned in the usage line for read.table(), and have the same defaults. So, implicitly, skip is an option for read.csv() just as it is for read.table(), and it has the same default, namely skip=0. So you can set it to skip=1 just as you can for read.table() and it will work in the same way. This is stated in ?read.csv to be:

skip: integer: the number of lines of the data file to skip before beginning to read data.

This is potentially misleading because of the final word "data", since a beginner might think this referred to the real data part of the file (i.e. what follows the header) when header=TRUE, as in read.csv(). More explicitly, it could be written:

skip: integer: the number of lines of the data file to skip before beginning to read the lines in the file.

Hoping this helps! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Jun-09 Time: 22:37:46 -- XFMail --
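A minimal self-contained sketch of the suggestion above (the file is created here with a made-up two-column subset of the format in the question):

```r
# Sketch: skip a junk first line, then let line 2 be read as the header.
tmp <- tempfile(fileext = ".csv")
writeLines(c("PORTFOLIO EQUITY TABLE",   # line 1: title line to be skipped
             "TRADE,POS PL",             # line 2: the real header
             "1,124.00",
             "2,-86.00"), tmp)

d <- read.csv(tmp, skip = 1, header = TRUE)  # header=TRUE is the default anyway
names(d)   # "TRADE" "POS.PL"  (the space becomes a dot via make.names)
```

Note that read.csv() turns the header entry "POS PL" into the syntactic name "POS.PL"; check.names=FALSE would preserve the original spelling.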
Re: [R] ANOVA with means and SDs as input
On 25-Jun-09 10:15:30, Sebastian Stegmann wrote: Dear R-community, I'm struggling with a paper that reports only fragmented results of a 2by2by3 experimental design. However, Means and SDs for all cells are given. Does anyone know a package/function that helps computing an ANOVA with only Means and SDs as input (resulting in some sort of effect size and significance test)?

No hope of that, unless you also have the numbers (N) of observations in each cell (which of course may be equal across cells). You can get a mean and an SD with anything from N=2 upwards. If you do have the Ns, then one way is to construct an artificial sample for each cell, such that the mean and the SD of each is equal to the given mean and SD for the cell. Then you can submit the resulting reconstructed data to a standard lm() in R, and carry on from there. The basic construct is: Let a given cell have N data, mean=M, SD=S. Then

X0 <- rnorm(N)
X <- M + S*(X0 - mean(X0))/sd(X0)

Assuming you are going to do a standard normal ANOVA, the result of operating with the above artificial data is algebraically the same as if you had the true original data. Hoping this helps! Ted.

The reason why I'm interested is simple: I'm conducting a meta-analysis. If I included only what the authors would like to show the world, the results would be somewhat biased... I've combed through the web and my various books on R, but it's not that easy to find. Or is it? Thanks for your help! Sebastian

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Jun-09 Time: 11:51:38 -- XFMail --
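A quick check of the construction above (N, M and S chosen arbitrarily here): the rescaled sample reproduces the target mean and SD exactly, whatever rnorm() happened to draw.

```r
# Sketch: build an artificial cell sample with a given mean M and SD S.
N <- 12; M <- 3.7; S <- 1.4          # illustrative values for one cell
X0 <- rnorm(N)
X  <- M + S * (X0 - mean(X0)) / sd(X0)   # centre, scale to unit SD, then shift

c(mean(X), sd(X))   # equals c(M, S) up to rounding error
```

The key point is that standard ANOVA statistics depend on the data only through each cell's N, mean and SD, so any sample matching those three quantities gives the same F tests.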
Re: [R] How to draw a line in plot when I know the start point (x1,y1) and end point (x2,y2)?
On 25-Jun-09 18:38:37, Marc Schwartz wrote: On Jun 25, 2009, at 1:30 PM, Lesandro wrote: Hello all, How to draw a line in plot when I know the start point (x1,y1) and end point (x2,y2)? I need to add this information to the graph: plot(wl2[[1]], wl2[[2]]). I think it is possible to do this with the function abline(), is it possible? I looked at the function lines() too, but don't understand how to use it. Thanks! Lesandro

See ?segments, which does just what you are looking for. lines() is designed more for a series of connected lines (e.g. a polygon) than for a single line segment. abline() can draw a straight line at a given vertical or horizontal position or, if given a linear model object, the fitted line. HTH, Marc Schwartz

Hmm ... for this particular purpose I don't see what is wrong with

plot(wl2[[1]], wl2[[2]])
lines(c(x1,x2), c(y1,y2))

along with any additional parameters to lines() for line-type, colour, etc. -- I do this all the time ... Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Jun-09 Time: 19:51:20 -- XFMail --
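Both suggestions in one runnable sketch (the scatterplot data and endpoints are made up):

```r
# Sketch: draw one line segment on top of a scatterplot, two equivalent ways.
x1 <- 1; y1 <- 2; x2 <- 8; y2 <- 9        # illustrative endpoints

plot(1:10, 1:10)
segments(x1, y1, x2, y2, col = "red", lty = 2)  # Marc's suggestion: ?segments
lines(c(x1, x2), c(y1, y2), col = "blue")       # Ted's equivalent with lines()
```

segments() additionally takes vectors of endpoints, so many independent segments can be drawn in one call, which lines() cannot do without interleaving NAs.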
Re: [R] How to change ONLY the first character of each variable
On 18-Jun-09 23:24:39, Mark Na wrote: Dear R-helpers, I would like to adapt the following code

names(data) <- sub("M", "MOLE", names(data))

which changes any occurrence of M (in my variable names) to MOLE, such that it ONLY operates on the first character of each variable name, i.e. M will only be changed to MOLE if it's the first character of a variable. I would appreciate any help you might provide. Thanks! Mark Na

Anchor the pattern to the start of the string with "^":

M <- "MATMAN"
sub("^M", "MOLE", M)
# [1] "MOLEATMAN"
AM <- "AMATMAN"
sub("^M", "MOLE", AM)
# [1] "AMATMAN"

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 19-Jun-09 Time: 00:46:40 -- XFMail --
Re: [R] a proposal regarding documentation
On 14-Jun-09 22:23:24, Gabor Grothendieck wrote: In PHP and also in MySQL the manual has a wiki capability so that users can add notes at the end of each page, e.g. http://www.php.net/manual/en/functions.variable-functions.php http://dev.mysql.com/doc/refman/4.1/en/update.html That would combine documentation and wiki into one. Here it would involve copying the R help pages into the wiki in a readonly mode with the writeable wiki portion at the end of each such page. It would also be necessary to be able to update the help pages in the wiki when new versions became available. That is an interesting and potentially very fruitful idea. I'd like to propose a quasi-readonly mode in which wiki users could interpolate the comments at chosen points within the original text of a help-page (but without any alteration of the original), with the interpolation clearly marked by some kind of highlighting (e.g. change of font or colour). In that way, a user reading through the original help-page on the wiki would encounter, in context and in timely fashion, comments by other users who had themselves met difficulties at that point and who (hopefully) have provided more detailed guidance and/or links to other parts of the documentation which illuminate or fill-in what was not obvious in the first place. With, preferably, the option to back up to where they were before, if they are led to some other help page. Ted. No explicit email group or coordination would be needed. It would also address the organization problem as they could be organized as they are now, i.e. into packages: base, stats, utils, ... It would require the development of a program to initially copy the help pages and to update them while keeping the notes in place whenever a new version of R came out. 
On Sun, Jun 14, 2009 at 5:35 PM, Peter Flom peterflomconsult...@mindspring.com wrote: I certainly don't have anything against the WIKI, but I think that the documentation is where the action is, especially for newbies. It's the natural first step when you want to learn about a function or when you get an error message you don't understand. Peter

Peter L. Flom, PhD Statistical Consultant www DOT peterflomconsulting DOT com

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 14-Jun-09 Time: 23:41:39 -- XFMail --
Re: [R] Issues getting R to write image files
On 11-Jun-09 09:24:42, Tobias Verbeke wrote: Hi Kenny, Have spent the last couple of days learning R and shell scripting to do batch plotting jobs. I have had success getting R to complete a filled contour plot and output to a file (.jpg or .tiff etc). However, when I try to do the same thing with the simple plot command the script seems to execute correctly yet there is no output. Below is my R code:

file <- Sys.getenv("input_file")
tiff(paste(file, "tiff", sep="."))
z <- read.table(file)
plot(z, type="l", xlim=range(0.6,2), col="red",
     plot.title = title(main = file, xlab = "Wavelength (um)",
                        ylab = "Intensity (arb.)"))
dev.off()
q()

You need to close the tiff graphics device you opened using dev.off() before quitting. HTH, Tobias

I thought of that too -- since the graphics device needs to be closed before writing out to the file is completed and the file is closed. However, it occurred to me that possibly q() would also have that effect, since it closes down R, which should have the effect of closing devices, flushing buffers, and closing files (though I do not see this documented under ?q). So I experimented. 1. New R session. 2. Assign values to some variables. 3. Open a tiff() device, plot them, and quit R (no dev.off()):

tiff(file="temp.tif")
plot(X, P, type="l")
lines(X, I.b, col="blue")
lines(X, I.m, col="green")
lines(X, I.bm^2, col="red")
q()

4. End of R session, and temp.tif (which did not exist at the start) contains a good TIFF file with exactly what I expected to see. This confirmed my suspicions. So it would seem that dev.off() is not the answer. Probably something is wrong along the line of reading in the data, or in specifying what to plot. But I can't see anything obvious in the code, so it may depend on what sort of structure z is, for instance. Ted. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Jun-09 Time: 10:43:51 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Postcript font size
On 11-Jun-09 22:16:22, Andre Nathan wrote: On Thu, 2009-06-11 at 19:09 -0300, Andre Nathan wrote: Is there a way to specify a font size that is consistent among calls to plot()? To put it correctly: is there a way to specify a font size that is consistent among calls to postscript()? All the fonts in the same file have the same size, but comparing files, even when par(ps = ...) or postscript(pointsize = ...) are specified the same way, the fonts have different sizes. Best regards, Andre

How are you comparing? A: Looking into the PostScript code in the files, and identifying the font sizes where they are set in the code? B: Comparing apparent font sizes when looking at two plots on screen? C: Comparing apparent font sizes when looking at plots as printed on paper (and, in this case, have the plots been included in an encapsulated document as EPS graphics)? Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Jun-09 Time: 23:47:59 -- XFMail --
Re: [R] ridiculous behaviour printing to eps: labels all messed
to postscript() or to dev.copy2eps() (which accepts the same arguments as postscript()). 4. Zeljko's first suggestion (to use postscript() in the first place, rather than dev.copy2eps() after plotting) is probably not going to be useful, since you would have to set useKerning=FALSE in postscript() anyway, and you can equally well set it in dev.copy2eps(). 5. Zeljko's second suggestion (of using one-letter labels) is also unlikely to be useful, and could be dangerous (e.g. if you used "a" as a label, and it also occurred in a text string in the diagram, and the latter was set by "(in) show ... (a) show ..." or the like). Two-letter names might be safer, e.g. "AA", "BB", ..., since it would be unusual for such pairs to be kerned (though not impossible, since PostScript allows quite arbitrary things to be done with text, if the user wants it to happen, or the application thinks it would be a Good Thing). But, in any case, the desirable feedback from seeing "exp1" and "exp2" (or whatever) on the plot would then be missing. I would be strongly inclined to the useKerning=FALSE solution in this case. 6. I'm not a LaTeX user, so can't comment much on psfrag. However, I have just read its documentation "The PSFrag System" at http://www.tug.org/teTeX/tetex-texmfdist/doc/latex/psfrag/pfgguide.ps wherein Section 8 ("Common mistakes, known problems, and bugs") discusses some of the issues that can arise from the fact that psfrag looks in the EPS file for an exact match for the tags. Executive Summary: Beware! 7. This is an exercise for my hobby-horse. If the EPS graphic produced by R is, as it stands, adequate for your purposes then leave it alone and simply import it into your document. 
If it is not, then my approach (almost invariably) is to extract from R the numerical data which define the graphic, and use these data in an independent document-creation package (which could be LaTeX, perhaps, though not in my case) to re-create the graphic from scratch, and then include any embellishments (such as equations, etc.) which the independent software can do properly, when R can not do it properly or at all. An example of this can be seen at http://www.zen89632.zen.co.uk/R/EM_Aitken/em_aitken.pdf which is a little expository document about Aitken Acceleration of iterative processes, as applied to the EM algorithm. The computations were done in R, and the resulting data were used to draw the figures externally (in fact within the document itself). All three figures therein have some mathematical symbolism, the second one on Page 2 particularly. The full R code is also given. LaTeX-ers: Can you make such figures with the same details in them, without fiddling with an R-generated EPS (i.e. from scratch, using data from R)? I hope so! For what it's worth -- the software used was GNU groff (with the 'pic' preprocessor to create the figures). But that's just me. Dinosaurs do not easily digest organisms more recently evolved ... Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Jun-09 Time: 14:38:03 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reflect Back to Back Histograms in x-axis?
On 08-Jun-09 13:11:03, sedm1000 wrote: I've looked long and hard for this, but maybe I am missing something... There is a nice module that displays histograms reflected in the y axis, i.e. http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=136 but is it possible to reflect in the x-axis, so as to have two datasets, one pointing up and one down, rather than left and right? I haven't been able to find a way to plot this way using the guide, or elsewhere. Thanks...

Try along the lines of:

set.seed(54321)
X1 <- rnorm(500)
X2 <- rnorm(500)
H1 <- hist(X1, breaks=0.5*((-8):8), ylim=c(-100,100))
H1 <- hist(X1, breaks=0.5*((-8):8), ylim=c(-100,100), col="red")
H2 <- hist(X2, breaks=0.5*((-8):8), plot=FALSE)
H3 <- H2 ; H3$counts <- -H3$counts
plot(H3, add=TRUE, col="blue")

NB: The choice of breaks and ylim was made after a preliminary inspection of hist(X1) and hist(X2). Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Jun-09 Time: 15:50:48 -- XFMail --
Re: [R] Journal Articles that Have Used R
On 07-Jun-09 10:56:25, Gabor Grothendieck wrote: Try this: site:journal.sjdm.org filetype:R

When I enter that into Google, I get only the following two hits:

#!/usr/bin/Rscript --vanilla # input is a pre-made list of files ending in html called ../htmlist # (see below). This is easily modified. ... journal.sjdm.org/RePEc/rss/rss.R - Cached - Similar pages

#!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc-style rdf files (ReDIFF) to DOAJ-type xml files # usage: oai.R [file] # where [file] is a ... journal.sjdm.org/RePEc/rss/oai.R - Cached - Similar pages

neither of which is what Jonathan is looking for (and the "Similar pages" links are a waste of time). In regexp language, what he is looking for is http://journal.sjdm.org/[0-9]+/*.R of which there are several instances on the site. For example http://journal.sjdm.org/8210/ shows

jdm8210.html    13-Dec-2008 11:18  31K
jdm8210.pdf     13-Dec-2008 11:18 102K
jdm8210.tex     13-Dec-2008 11:18  27K
jdm8210001.gif  09-Dec-2008 14:38  11K
probs.R         09-Dec-2008 14:37 1.0K
test.R          23-May-2008 05:46  251
ttest.csv       22-May-2008 21:31 2.6K

so there are two .R files there (8210 is the number of an article in the Journal). Other similar directories may or may not have .R files -- for example http://journal.sjdm.org/8816/ has none. The problem is that utilities like wget won't work in this case, since HTTP doesn't accept wildcards, unlike FTP; but the journal site doesn't accept FTP ... !! It's an intriguing problem, and I'm seeking advice amongst my Linux acquaintances about it. I somehow doubt that there is a solution ... Ted.

On Sat, Jun 6, 2009 at 6:39 PM, Jonathan Baron ba...@psych.upenn.edu wrote: I also use R to redraw figures for the journal I edit (below), when the authors cannot produce usable graphics (about 50% of the authors who try). 
Unfortunately, I cannot find a way to search for just the R files. They are all http://journal.sjdm.org/*/*.R where * is the number of the article. But Google, to my knowledge, will not deal with wildcards like this. Jon -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron Editor: Judgment and Decision Making (http://journal.sjdm.org)

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jun-09 Time: 12:37:34 -- XFMail --
Re: [R] Journal Articles that Have Used R
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jun-09 Time: 15:01:10 -- XFMail --
Re: [R] Journal Articles that Have Used R
On 07-Jun-09 14:01:14, Ted Harding wrote: I think the reason Google will not find it is that, in the Journal website, the R files (and the names of the article directories that might contain them, such as journal.sjdm.org/8210/ -- see below) are not directly pointed to by any index.html or any href ... /href in the website, as far as I can see. This would be why 'wget' cannot find them in HTTP mode, and it would prevent Google being led to them. [... ... ...] This all seems to be overkill, however! Much easier if the site would accept FTP. Ted. Or, as Jonathan has now done, the R files were pointed to by links within the web-page! :) Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jun-09 Time: 15:41:49 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Journal Articles that Have Used R
On 06-Jun-09 18:34:21, Jakson Alves de Aquino wrote: Jason Rupert wrote: Is there a way to get a reference list of journal articles that have used R? I am just looking for some examples of R graphs and presentation of results where R was used to generate the results. Did you try Google Scholar: http://scholar.google.com.br/scholar?q=R%3A+A+Language+and+Environment+for+Statistical+Computing It has cited by links. A very good suggestion! An even better one [:)] is to take out the .br: http://scholar.google.com/scholar?q=R%3A+A+Language+and+Environment+for+Statistical+Computing (158,000 hits of which, on briefly scanning through the first 40, about half seem to be journal articles; draw your own conclusions). Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Jun-09 Time: 19:50:57 -- XFMail --
Re: [R] contrasts
On 03-Jun-09 17:50:38, Ricardo Arias Brito wrote: Hi all, I want to use contrasts in a regression fitted with lm() for a factor with four levels, comparing the coefficients: first with second, first with third, ..., third with fourth. Can someone help with a contrast matrix to be used? I have tried and not yet succeeded. Say: y <- rnorm(50) x <- cut(rnorm(50, mean=y, sd=0.25), c(-3,-1.5,0,1.5,3)) reg <- lm(y ~ x, contrasts = list(x=contr.sum)) Thank you, Ricardo I am not sure if you have written what you meant! If you mean to contrast first level with second, second with third, and third with fourth (which is not what you wrote), then what you want is contr.sdif in the MASS package. However, what you wrote suggests that you want to obtain all possible pairs of levels: 1st with 2nd, 1st with 3rd, 1st with 4th, 2nd with 3rd, 2nd with 4th, 3rd with 4th. This is not feasible at the level of model fitting (which is where any matrix such as you ask for would be used), since you can only have 3 linearly independent contrasts between 4 levels. In this case you would be asking for 6 contrasts, so 3 of them would be linear combinations of the 3 which were fitted in the model. If you use contr.sdif, you get 3 of them (1st with 2nd, 2nd with 3rd, 3rd with 4th) straight off. The other three can then be derived subsequently from these 3 via differences. The model fit should allow you to obtain the variance-covariance matrix of the three which were fitted. Then you can derive the variance-covariance matrix (and hence the SEs) of your derived contrasts. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-Jun-09 Time: 19:34:59 -- XFMail --
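Ted's suggestion can be sketched as follows. This is a minimal example on synthetic data (the group structure and effect sizes here are invented for illustration, since the original data are not shown): fit with contr.sdif, then derive a non-successive contrast and its SE as a linear combination of the fitted ones.

```r
library(MASS)  # for contr.sdif

set.seed(42)
x <- factor(rep(1:4, each = 25))
y <- rnorm(100, mean = as.numeric(x))    # group means roughly 1, 2, 3, 4

# successive-difference contrasts: coefficients estimate 2-1, 3-2, 4-3
reg <- lm(y ~ x, contrasts = list(x = contr.sdif))
b <- coef(reg)[-1]          # the three fitted contrasts
V <- vcov(reg)[-1, -1]      # their variance-covariance matrix

# derive e.g. the 3rd-vs-1st contrast: (2-1) + (3-2) = 3-1
a <- c(1, 1, 0)
est <- sum(a * b)
se  <- sqrt(drop(t(a) %*% V %*% a))
```

With these simulated group means the derived 3-vs-1 contrast should come out near 2, with its standard error obtained from the fitted variance-covariance matrix exactly as Ted describes.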
Re: [R] strange behavior when reading csv - line wraps
, the word kempten starts the next line: 39;167757703;12;10.309295;47.724545;21903...@n00 ;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio; What could be the reason? I was thinking about solving the issue by using a different separator, that I would use for the first 7 fields, and concatenating all of the remaining values into a single string value, but could not figure out how to do such a substitution in R. Unfortunately, on my system I cannot specify a range for sed... Thanks for any help/pointers Martin -- Martin Tomko Postdoctoral Research Assistant Geographic Information Systems Division Department of Geography University of Zurich - Irchel Winterthurerstr. 190 CH-8057 Zurich, Switzerland email: martin.to...@geo.uzh.ch site: http://www.geo.uzh.ch/~mtomko mob: +41-788 629 558 tel: +41-44-6355256 fax: +41-44-6356848 -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[[alternative HTML version deleted]] E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-May-09 Time: 16:24:27 -- XFMail --
Re: [R] strange behavior when reading csv - line wraps
;400d;rabbitriotnet; where the last value occurs on the next line in the data frame. It does not have to be the last value, as in the following example, the word kempten starts the next line: 39;167757703;12;10.309295;47.724545;21903...@n00 ;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio; What could be the reason? I was thinking about solving the issue by using a different separator, that I would use for the first 7 fields, and concatenating all of the remaining values into a single string value, but could not figure out how to do such a substitution in R. Unfortunately, on my system I cannot specify a range for sed... Thanks for any help/pointers Martin -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]]
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-May-09 Time: 21:15:13 -- XFMail --
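One possible work-round within R itself (rather than sed) is to read the file with readLines() and glue physical lines back together until each logical record contains the expected number of ';' separators. A rough sketch (the function names and the toy three-field records are invented for illustration; with the real file, nsep would be the separator count of a complete record):

```r
# count ";" separators in a string
count_sep <- function(s) lengths(regmatches(s, gregexpr(";", s, fixed = TRUE)))

# merge physical lines until each logical record has nsep separators;
# wrapped lines are concatenated with no padding, since the wrap can
# fall mid-word (as with "kup" / "pel" above)
fix_wraps <- function(lines, nsep) {
  out <- character(0); buf <- ""
  for (ln in lines) {
    buf <- paste0(buf, ln)
    if (count_sep(buf) >= nsep) { out <- c(out, buf); buf <- "" }
  }
  out
}

# toy example: the second record is wrapped across two physical lines
fix_wraps(c("a;b;c", "d;e", ";f"), 2)
```

The repaired character vector can then be handed to read.table(text = ..., sep = ";") or split with strsplit().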
Re: [R] Regular point pattern on multi-segment line
On 29-May-09 11:23:15, Pascal LAFFARGUE wrote: I need your help for the following purpose: I would like to create regularly spaced points on a multisegment line (a 'psp class' object). A function of the spatstat library (pointsOnLines()) is dedicated to that objective, but the problem is that the probability of falling on a particular segment is proportional to the length of the segment. What about a solution using the whole multi-segment line and not individual segments for points distribution? Many thanks in advance for your help! Plaff If I understand you right, one fairly obvious approach would be the following. 1. Let L be a vector of lengths of segments: L[i] is the length of segment i (say there are K segments). 2. Use X <- sort(sum(L)*runif(N)) to place N points uniformly on (0,sum(L)). 3. Let Lcum <- c(0,cumsum(L)). 4. whichsegs <- cut(X,breaks=Lcum,labels=FALSE) Then whichsegs is a vector of N integers, from 1:K, which give the segments into which the N points in X fall. So, for(i in (1:K)), extract the values of X[whichsegs==i], subtract Lcum[i] from each, and then place the points at the resulting distances along segment i. I think this should work (not tested in detail)! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-May-09 Time: 13:14:27 -- XFMail --
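Ted's four steps can be put together as a short function. This is a sketch only, not tested against spatstat's psp class: the endpoint matrix layout and the function name are invented for illustration (a psp object's $ends component has the same x0, y0, x1, y1 columns).

```r
# place N points uniformly at random along a multi-segment line, given a
# matrix of segment endpoints with columns x0, y0, x1, y1
points_on_line <- function(ends, N) {
  L <- sqrt((ends[,3] - ends[,1])^2 + (ends[,4] - ends[,2])^2)  # segment lengths
  X <- sort(sum(L) * runif(N))       # uniform positions on (0, sum(L))
  Lcum <- c(0, cumsum(L))
  seg <- cut(X, breaks = Lcum, labels = FALSE)   # which segment each point hits
  f <- (X - Lcum[seg]) / L[seg]      # fractional position along that segment
  cbind(x = ends[seg,1] + f * (ends[seg,3] - ends[seg,1]),
        y = ends[seg,2] + f * (ends[seg,4] - ends[seg,2]))
}

set.seed(1)
ends <- rbind(c(0,0, 1,0), c(1,0, 1,2))  # two segments, lengths 1 and 2
pts <- points_on_line(ends, 100)
```

Because positions are drawn on the whole line (0, sum(L)) before being assigned to segments, each segment still receives points in proportion to its length, but the points are placed as a single stream along the whole multi-segment line, which is what the poster asked for; for *regularly* spaced points, replace the runif() call with seq(0, sum(L), length.out = N).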
Re: [R] question about using a remote system
On 28-May-09 00:58:17, Erin Hodgess wrote: Dear R People: I would like to set up a plug-in for Rcmdr to do the following: I would start on a Linux laptop. Then I would log into another outside system and run some commands. Now, when I tried to do system("ssh e...@xxx.edu") password xx It goes to the remote system. How do I continue to issue commands from the Linux laptop please? (hope this makes sense) thanks, Erin Hi Erin, If I understand what you want, I think you may be asking a bit too much from R itself. When you, Erin, are interacting with your Linux laptop you can switch between two (or more) different X windows or consoles, on one of which you are logged in to the remote machine and on which you issue commands to the remote machine, and can read its output and, maybe, copy this to a terminal running commands on your Linux laptop. When you ask R to log in as you describe, you are tying that instance of R to the login. I don't know of any resources within R itself which can emulate your personal behaviour. Though maybe others do know ... However, there is in principle a work-round. You will need to get your toolbox out and get to grips with Unix plumbing; and, to make it accessible to R, you will need to create a Unix plumber. The basic component is the FIFO, or named pipe. To create one of these, use a Linux command like mkfifo myFIFO1 Then 'ls -l' in the directory you are working in will show this pipe as present with the name myFIFO1. For example, if in one terminal window [A] you start the command cat myFIFO1 then in another [B] you echo "Some text to test my FIFO" > myFIFO1 you will find that this text will be output in the [A] window by the 'cat' command. Similarly, you could 'mkfifo myFIFO2' and write to this while in the [A] window, and read from it while in the [B] window. 
So, if you can get R to write to myFIFO1 (which is being read from by another program which will send the output to a remote machine which that program is logged into), and read from myFIFO2 which is being written to by that other program (and that is easy to do in R, using connections), then you have the plumbing set up. But, to set it up, R needs to call in the plumber. In other words, R needs to execute a command like system("MakeMyFIFOs") where MakeMyFIFOs is a script that creates myFIFO1 and myFIFO2, logs in to the remote machine, watches myFIFO1 and sends anything it reads to the remote machine, watches for any feedback from the remote machine and then writes this into myFIFO2. Meanwhile, back in R, any time you want R to send something to the remote machine you write() it (or similar) to myFIFO1, and whenever you want to receive something from the remote machine you readLines() it (or similar) from myFIFO2. That's an outline of a possible work-round. Maybe others have solved your same problem in a better way (and maybe there's an R package which does just that ... ). If so, I hope they'll chip in! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-May-09 Time: 07:33:09 -- XFMail --
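The mkfifo round-trip described above can be tried in a few lines of shell before wiring it up to R (a sketch; the file and FIFO names are arbitrary):

```shell
cd "$(mktemp -d)"
mkfifo myFIFO1
# "window [A]": a background reader copying whatever arrives on the pipe
cat myFIFO1 > out.txt &
# "window [B]": the writer; opening the FIFO blocks until the reader end
# is open, then the text passes straight through
echo "Some text to test my FIFO" > myFIFO1
wait
cat out.txt
```

On the R side, the same FIFO can then be opened as a connection with fifo() and written to with writeLines(), which is the "plumbing" step Ted sketches.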
Re: [R] how do I get to be a member?
On 27-May-09 18:09:30, Michael Menke wrote: Information, please. By subscribing your email address to the list. Visit the R-help info page at: https://stat.ethz.ch/mailman/listinfo/r-help and read what it has to say in general about R-help. Near the bottom is a section Subscribing to R-help which tells you what to do. Please note that it is *email addresses*, not persons, which are subscribed, so (if you have more than one email address) you should use the one which you have subscribed when you want to post to the list. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 27-May-09 Time: 20:35:02 -- XFMail --
Re: [R] Animal Morphology: Deriving Classification Equation with
a reaction to what your data say. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-May-09 Time: 20:07:43 -- XFMail --
Re: [R] [correction] Animal Morphology: Deriving Classification Equation with
[Apologies -- I made an error (see at [***] near the end)] On 24-May-09 19:07:46, Ted Harding wrote: [Your data and output listings removed. For comments, see at end] On 24-May-09 13:01:26, cdm wrote: Fellow R Users: I'm not extremely familiar with lda or R programming, but a recent editorial review of a manuscript submission has prompted a crash course. I am on this forum hoping I could solicit some much needed advice for deriving a classification equation. I have used three basic measurements in lda to predict two groups: male and female. I have a working model, low Wilk's lambda, graphs, coefficients, eigenvalues, etc. (see below). I adjusted the sample analysis for Fisher's or Anderson's Iris data provided in the MASS library for my own data. My final and last step is simply to form the classification equation. The classification equation is simply using standardized coefficients to classify each group -- in this case male or female. A more thorough explanation is provided: For cases with an equal sample size for each group the classification function coefficient (Cj) is expressed by the following equation: Cj = cj0 + cj1x1 + cj2x2 + ... + cjpxp where Cj is the score for the jth group, j = 1, ..., k, cj0 is the constant for the jth group, and x = raw scores of each predictor. If W = within-group variance-covariance matrix, and M = column matrix of means for group j, then the constant cj0 = (-1/2)CjMj (Julia Barfield, John Poulsen, and Aaron French http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm). I am unable to navigate this last step based on the R output I have. I only have the linear discriminant coefficients for each predictor that would be needed to complete this equation. Please, if anybody is familiar or able to help please let me know. There is a spot in the acknowledgments for you. All the best, Chase Mendenhall The first thing I did was to plot your data. 
This indicates in the first place that a perfect discrimination can be obtained on the basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated to WG, WT, ID, SEX): D0 <- read.csv("horsesLDA.csv") # names(D0) # WRMA_WG WRMA_WT WRMA_ID WRMA_SEX WG <- D0$WRMA_WG; WT <- D0$WRMA_WT; ID <- D0$WRMA_ID; SEX <- D0$WRMA_SEX ix.M <- (SEX=="M"); ix.F <- (SEX=="F") ## Plot WT vs ID (M F) plot(ID,WT,xlim=c(0,12),ylim=c(8,15)) points(ID[ix.M],WT[ix.M],pch="+",col="blue") points(ID[ix.F],WT[ix.F],pch="+",col="red") lines(ID,15.5-1.0*(ID)) and that there is a lot of possible variation in the discriminating line WT = 15.5-1.0*(ID) Also, it is apparent that the covariance between WT and ID for Females is different from the covariance between WT and ID for Males. Hence the assumption (of common covariance matrix in the two groups) for standard LDA (which you have been applying) does not hold. Given that the sexes can be perfectly discriminated within the data on the basis of the linear discriminator (WT + ID) (and others), the variable WG is in effect a close approximation to noise. However, to the extent that there was a common covariance matrix to the two groups (in all three variables WG, WT, ID), and this was well estimated from the data, then inclusion of the third variable WG could yield a slightly improved discriminator in that the probability of misclassification (a rare event for such data) could be minimised. But it would not make much difference! However, since that assumption does not hold, this analysis would not be valid. 
If you plot WT vs WG, a common covariance is more plausible; but there is considerable overlap for these two variables: plot(WG,WT) points(WG[ix.M],WT[ix.M],pch="+",col="blue") points(WG[ix.F],WT[ix.F],pch="+",col="red") If you plot WG vs ID, there is perhaps not much overlap, but a considerable difference in covariance between the two groups: plot(ID,WG) points(ID[ix.M],WG[ix.M],pch="+",col="blue") points(ID[ix.F],WG[ix.F],pch="+",col="red") This looks better on a log scale, however: lWG <- log(WG); lWT <- log(WT); lID <- log(ID) ## Plot log(WG) vs log(ID) (M F) plot(lID,lWG) points(lID[ix.M],lWG[ix.M],pch="+",col="blue") points(lID[ix.F],lWG[ix.F],pch="+",col="red") and common covariance still looks good for WG vs WT: ## Plot log(WT) vs log(WG) (M F) plot(lWG,lWT) points(lWG[ix.M],lWT[ix.M],pch="+",col="blue") points(lWG[ix.F],lWT[ix.F],pch="+",col="red") but there is no improvement for WT vs ID: ## Plot log(WT) vs log(ID) (M F) plot(ID,WT,xlim=c(0,12),ylim=c(8,15)) points(ID[ix.M],WT[ix.M],pch="+",col="blue") points(ID[ix.F],WT[ix.F],pch="+",col="red") [***] The above is incorrect! Apologies. I plotted the raw WT and ID instead of their logs. In fact, if you do plot the logs: ## Plot log(WT) vs log(ID) (M F) plot(lID,lWT) points(lID[ix.M],lWT[ix.M],pch="+",col="blue") points(lID[ix.F],lWT[ix.F],pch="+",col="red") you now get what looks like much closer agreement between
Re: [R] Animal Morphology: Deriving Classification Equation with
on this forum hoping I could solicit some much needed advice for deriving a classification equation. I have used three basic measurements in lda to predict two groups: male and female. I have a working model, low Wilk's lambda, graphs, coefficients, eigenvalues, etc. (see below). I adjusted the sample analysis for Fisher's or Anderson's Iris data provided in the MASS library for my own data. My final and last step is simply to form the classification equation. The classification equation is simply using standardized coefficients to classify each group -- in this case male or female. A more thorough explanation is provided: For cases with an equal sample size for each group the classification function coefficient (Cj) is expressed by the following equation: Cj = cj0 + cj1x1 + cj2x2 + ... + cjpxp where Cj is the score for the jth group, j = 1, ..., k, cj0 is the constant for the jth group, and x = raw scores of each predictor. If W = within-group variance-covariance matrix, and M = column matrix of means for group j, then the constant cj0 = (-1/2)CjMj (Julia Barfield, John Poulsen, and Aaron French http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm). I am unable to navigate this last step based on the R output I have. I only have the linear discriminant coefficients for each predictor that would be needed to complete this equation. Please, if anybody is familiar or able to help please let me know. There is a spot in the acknowledgments for you. All the best, Chase Mendenhall The first thing I did was to plot your data. 
This indicates in the first place that a perfect discrimination can be obtained on the basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated to WG, WT, ID, SEX): D0 <- read.csv("horsesLDA.csv") # names(D0) # WRMA_WG WRMA_WT WRMA_ID WRMA_SEX WG <- D0$WRMA_WG; WT <- D0$WRMA_WT; ID <- D0$WRMA_ID; SEX <- D0$WRMA_SEX ix.M <- (SEX=="M"); ix.F <- (SEX=="F") ## Plot WT vs ID (M F) plot(ID,WT,xlim=c(0,12),ylim=c(8,15)) points(ID[ix.M],WT[ix.M],pch="+",col="blue") points(ID[ix.F],WT[ix.F],pch="+",col="red") lines(ID,15.5-1.0*(ID)) and that there is a lot of possible variation in the discriminating line WT = 15.5-1.0*(ID) Also, it is apparent that the covariance between WT and ID for Females is different from the covariance between WT and ID for Males. Hence the assumption (of common covariance matrix in the two groups) for standard LDA (which you have been applying) does not hold. Given that the sexes can be perfectly discriminated within the data on the basis of the linear discriminator (WT + ID) (and others), the variable WG is in effect a close approximation to noise. However, to the extent that there was a common covariance matrix to the two groups (in all three variables WG, WT, ID), and this was well estimated from the data, then inclusion of the third variable WG could yield a slightly improved discriminator in that the probability of misclassification (a rare event for such data) could be minimised. But it would not make much difference! However, since that assumption does not hold, this analysis would not be valid. 
If you plot WT vs WG, a common covariance is more plausible; but there is considerable overlap for these two variables: plot(WG,WT) points(WG[ix.M],WT[ix.M],pch="+",col="blue") points(WG[ix.F],WT[ix.F],pch="+",col="red") If you plot WG vs ID, there is perhaps not much overlap, but a considerable difference in covariance between the two groups: plot(ID,WG) points(ID[ix.M],WG[ix.M],pch="+",col="blue") points(ID[ix.F],WG[ix.F],pch="+",col="red") This looks better on a log scale, however: lWG <- log(WG); lWT <- log(WT); lID <- log(ID) ## Plot log(WG) vs log(ID) (M F) plot(lID,lWG) points(lID[ix.M],lWG[ix.M],pch="+",col="blue") points(lID[ix.F],lWG[ix.F],pch="+",col="red") and common covariance still looks good for WG vs WT: ## Plot log(WT) vs log(WG) (M F) plot(lWG,lWT) points(lWG[ix.M],lWT[ix.M],pch="+",col="blue") points(lWG[ix.F],lWT[ix.F],pch="+",col="red") but there is no improvement for WT vs ID: ## Plot log(WT) vs log(ID) (M F) plot(ID,WT,xlim=c(0,12),ylim=c(8,15)) points(ID[ix.M],WT[ix.M],pch="+",col="blue") points(ID[ix.F],WT[ix.F],pch="+",col="red") So there is no simple road to applying a routine LDA to your data. To take account of different covariances between the two groups, you would normally be looking at a quadratic discriminator. However, as indicated above, the fact that a linear discriminator using the variables ID and WT alone works so well would leave considerable imprecision in conclusions to be drawn from its results. Sorry this is not the straightforward answer you were hoping for (which I confess I have not sought); it is simply a reaction to what your data say. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax
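For completeness, the mechanical step the poster asked about (forming the classification functions from the fitted quantities) can be sketched directly from the formula quoted in the thread: Cj = W^{-1}Mj with constant cj0 = -(1/2)Mj'W^{-1}Mj. The horse data are not available here, so two well-separated synthetic groups stand in, and this sidesteps Ted's (valid) warning that the equal-covariance assumption may not hold for the real data:

```r
set.seed(1)
n <- 50
X.m <- cbind(rnorm(n, 5), rnorm(n, 3))   # stand-in "male" measurements
X.f <- cbind(rnorm(n, 7), rnorm(n, 6))   # stand-in "female" measurements
M.m <- colMeans(X.m); M.f <- colMeans(X.f)

# pooled within-group covariance W (equal group sizes)
W <- ((n - 1) * cov(X.m) + (n - 1) * cov(X.f)) / (2 * n - 2)
Winv <- solve(W)

# classification-function coefficients and constants per the quoted formula
C.m <- Winv %*% M.m; c0.m <- -0.5 * drop(t(M.m) %*% Winv %*% M.m)
C.f <- Winv %*% M.f; c0.f <- -0.5 * drop(t(M.f) %*% Winv %*% M.f)

# classify each case to the group with the larger score
X <- rbind(X.m, X.f)
pred <- ifelse(X %*% C.m + c0.m > X %*% C.f + c0.f, "M", "F")
mean(pred == rep(c("M", "F"), each = n))   # resubstitution accuracy
```

For two groups this decision rule is equivalent to Fisher's linear discriminant; with unequal covariances one would replace it by a quadratic rule, as Ted suggests.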
Re: [R] Product of 1 - probabilities
On 21-May-09 14:15:08, Mark Bilton wrote: I am having a slight problem with probabilities. To calculate the final probability of an event p(F), we can take the product of the chance that each independent event that makes p(F) will NOT occur. So... p(F) = 1 - ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) If the chance of an event within the product occurring remains the same, we can therefore raise this probability to a power of the number of times that event occurs. e.g. rolling a dice p(A) = 1/6 of getting a 1... p(F) = 1 - (1- (1/6))^z p(F) = 1 - (1-p(A))^z tells us the probability of rolling a 1 'at least once' in z number of rolls. So then to R... if p(A) = 0.01; z = 4; p(F) = 0.039; obviously p(F) > p(A). However the problem arises when we use very small numbers, e.g. p(B) = 1 * 10^-30. R understands this value. However, when you have 1-p(B) you get something very close to 1 as you expect... but R seems to think it is 1. So when I calculate p(F) = 1 - (1-p(B))^z, 1 to the power anything equals 1, so p(F) = 0, and not just close to zero BUT zero. It doesn't matter therefore if z = 1*10^1000, the answer is still zero!! Obviously therefore now p(F) < p(B). Is there any solution to my problem? e.g. - is it a problem with the subtraction (-)? i.e. could I change the number of bits the number understands (however it seems strange that it can hold it as a value close to 0 but not close to 1) - or should I maybe use a function to calculate the exact answer? - or something else Any help greatly appreciated Mark The problem is not with the subtraction, but with the fact that R adopts the exact value 1 for (1-x) if (on my machine) x < 10^(-16). This is to do with the number of binary digits used to store a number. Your 10^(-30) is much smaller than that! 
x <- 10^(-30); x # [1] 1e-30 y <- (1-x); 1-y # [1] 0 x <- 10^(-15); y <- (1-x); 1-y # [1] 9.992007e-16 x <- 10^(-16); y <- (1-x); 1-y # [1] 1.110223e-16 x <- 10^(-17); y <- (1-x); 1-y # [1] 0 Read ?.Machine and, in particular, about: double.eps: the smallest positive floating-point number 'x' such that '1 + x != 1'. It equals 'base^ulp.digits' if either 'base' is 2 or 'rounding' is 0; otherwise, it is '(base^ulp.digits) / 2'. Normally '2.220446e-16'. There is no cure for this in R, but you could try to work round it (depending on the details of your problem) by using an approximation to the value of the expression. For instance, if x is very small (say 10^(-20)), and n is not very large, then to first order (1 - x)^n = 1 - n*x or, if X is a vector of very small numbers, prod((1 - X)) = 1 - sum(X) Either of these may still result in 1 if what is being subtracted is less than .Machine$double.eps. However, in the particular type of application you describe, 1 - (1 - x)^n = n*x to first order 1 - prod((1 - X)) = sum(X) to first order You will just have to work out what will give you a sensible answer. Basically, you are trying to operate beyond the limits of precision of R, and so will need to re-cast your question in alternative terms which will yield an adequately precise result. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-May-09 Time: 15:59:47 -- XFMail --
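One standard re-casting of this particular expression uses base R's log1p() and expm1(), which are accurate for arguments near zero, so that 1 - (1-x)^n can be computed without ever forming the intermediate 1-x:

```r
# accurate version of 1 - (1 - x)^n for tiny x:
# (1-x)^n = exp(n * log1p(-x)), hence 1 - (1-x)^n = -expm1(n * log1p(-x))
p_at_least_once <- function(x, n) -expm1(n * log1p(-x))

p_at_least_once(0.01, 4)   # agrees with the naive 1 - 0.99^4 = 0.03940399
p_at_least_once(1e-30, 4)  # naive formula returns 0; this gives 4e-30
```

This goes further than the first-order approximation n*x in Ted's reply: it is accurate for any x in (0, 1) and any n, including the regime where n*x is not small.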
Re: [R] Product of 1 - probabilities
On 21-May-09 14:45:20, Gabor Grothendieck wrote: There are several arbitrary precision packages available: gmp (an interface to the GNU multi-precision library on CRAN) and bc (an R interface to the bc arbitrary precision calculator): http://r-bc.googlecode.com There are also packages providing R interfaces to two computer algebra systems and they both support not only arbitrary precision but also exact calculation: http://rsympy.googlecode.com http://ryacas.googlecode.com library(bc) 1 - (1-10^-75)^10 [1] 0 bc("1 - (1-10^-75)^10") [1] .00 000100 Thanks for pointing this out, Gabor! In particular, bc is something I often use (but on its own). It is good to know that R can connect with it. However, while your call above will indeed return an adequate answer for 1 - (1-10^-75)^10, and can be assigned numerically by, say, w <- as.numeric(bc("1 - (1-10^-75)^10")) this still does not let Mark off the hook of verifying that he will, in the end, get a sensible answer. If this calculation were part of a more elaborate computation in which, say, some formula called for (1-w), then that would still evaluate to 1, with possible dire results. While plain algebraic expressions can easily be evaluated in bc, its repertoire of mathematical functions is limited. No doubt the other mathematical packages you mention have much richer resources; but for anything complicated requiring their use I would probably go for working externally with such a package. However, that may be a naive view, since I have not tried the R interfaces! Nevertheless, I would still strongly recommend doing some analysis of the expressions in a computation when limits of precision are an issue, since often a simple expression can be obtained which, in R, would give adequate results. Ted. On Thu, May 21, 2009 at 10:15 AM, Mark Bilton mcbil...@hotmail.com wrote: I am having a slight problem with probabilities. 
To calculate the final probability of an event p(F), we can take the product of the chance that each independent event that makes p(F) will NOT occur. So... p(F) = 1 - ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) If the chance of an event within the product occurring remains the same, we can therefore raise this probability to a power of the number of times that event occurs. e.g. rolling a dice p(A) = 1/6 of getting a 1... p(F) = 1 - (1- (1/6))^z p(F) = 1 - (1-p(A))^z tells us the probability of rolling a 1 'at least once' in z number of rolls. So then to R... if p(A) = 0.01; z = 4; p(F) = 0.039; obviously p(F) > p(A). However the problem arises when we use very small numbers, e.g. p(B) = 1 * 10^-30. R understands this value. However, when you have 1-p(B) you get something very close to 1 as you expect... but R seems to think it is 1. So when I calculate p(F) = 1 - (1-p(B))^z, 1 to the power anything equals 1, so p(F) = 0, and not just close to zero BUT zero. It doesn't matter therefore if z = 1*10^1000, the answer is still zero!! Obviously therefore now p(F) < p(B). Is there any solution to my problem? e.g. - is it a problem with the subtraction (-)? i.e. could I change the number of bits the number understands (however it seems strange that it can hold it as a value close to 0 but not close to 1) - or should I maybe use a function to calculate the exact answer? - or something else Any help greatly appreciated Mark [[alternative HTML version deleted]]
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-May-09 Time: 16:39:16 -- XFMail --
Re: [R] Loop avoidance and logical subscripts
On 21-May-09 16:56:23, retama wrote: Patrick Burns kindly provided an article about this issue called 'The R Inferno'. However, I will expand a little bit my question because I think it is not clear and, if I could improve the code, it will be more understandable to other users reading these messages when I paste it :) In my example, I have a dataframe with several hundreds of DNA sequences in the column data$sequences (each value is a long string written in an alphabet of four characters, which are A, C, T and G). I'm trying to compute the parameter: number of Gs plus Cs over the total [(G+C)/(A+T+C+G)] in each sequence. For example, data$sequence[1] is something like AATTCCCGG but a little bit longer, and its G+C content is 0.69. I need to compute a vector with all G+C contents (in my example, in data$GCsequence, in which data$GCsequence[1] is 0.69). So the question was if making a loop and a combination of values with c() or cbind() or with logical subscripts is ok or not. And which approach should produce better results in terms of efficiency (my script goes really slow). Thank you, Retama Perhaps the following could be the basis of your code for the bigger problem: S <- unlist(strsplit("AATTCCCGGGGGG","")) S # [1] "A" "A" "T" "T" "C" "C" "C" "G" "G" "G" "G" "G" "G" sum((S=="C")|(S=="G")) # [1] 9 sum((S=="C")|(S=="G"))/length(S) # [1] 0.6923077 You could build a function on those lines, to evaluate what you want for any given string; and then sapply() it to the elements (which are the separate character strings) of data$sequences (which is presumably a vector of character strings). Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 21-May-09 Time: 18:18:24 -- XFMail --
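Ted's outline, packaged as a function and applied over a whole vector of sequences (a sketch; the short example sequences here are made up, standing in for the poster's data$sequences column):

```r
# G+C content of a single sequence string
gc_content <- function(seq) {
  S <- unlist(strsplit(seq, ""))
  sum(S == "C" | S == "G") / length(S)
}

# apply it over a vector of sequences, avoiding an explicit loop;
# with real data this would be data$GCsequence <- sapply(data$sequences, gc_content)
sequences <- c("AATTCCCGGGGGG", "ATAT", "GCGC")
sapply(sequences, gc_content)
```

Growing a result with c() or cbind() inside a loop is the slow pattern 'The R Inferno' warns about; sapply() (or vapply()) preallocates the result for you and keeps the code to one line.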
Re: [R] postscript problems (landscape orientation)
On 21-May-09 23:02:28, David Scott wrote:
> Well most people deal with that problem by not using Acrobat to read .pdf files. On linux you can use evince or xpdf. On windows just use gsview32. Those readers don't lock the .pdf. I am with Peter and generally go straight to pdf these days. The only reason for going through postscript is if you want to use psfrag. David Scott

Going off at a tangent to the original query, I would say that this is too limited a view! For one thing, PostScript (in its Encapsulated manifestation) is readily imported and re-scaled by software which does not import PDF. Also, PS is an editable plain-text file (even though there may be chunks in hexadecimal for some graphics -- but it's still ASCII). One thing which I quite often do is edit the %%BoundingBox: line to improve the framing of the graphic -- and viewing it in ghostscript with watch mode on as one edits, one can easily adjust things to a satisfactory visual effect. If you know what you are doing, you can if you wish move things around, or add or delete things (especially bits of text), by using any plain-text editor on the PostScript file. Finally (though this may be a symptom of serious masochism on my part), if I download a PDF in which the author has plotted the data, then after I print to file in PostScript from Acrobat Reader I can usually obtain a very close approximation to the original data values by extracting the PS coordinates of the plotted points (and axis tick-marks) from the PostScript file. The only reason for going through postscript is if you want to use psfrag -- or psnup and/or psbook or ... Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-May-09 Time: 00:31:32 -- XFMail --
Re: [R] round function seems to produce maximum 2 decimals
On 20-May-09 20:10:15, Glenn E Stauffer wrote:
> I am trying to use round() to force R to display a specific number of decimals, but it seems to display <=2 decimals no matter what I specify in the digits argument. As an alternative I tried signif(), but it also produces unexpected results. See example code and results below. format() works, but then the result no longer is numeric. Am I missing something simple? I am using R 2.9.0 on Windows XP. Thanks, Glenn
>
>   # code and results
>   h = 12345.16711
>   h                     # [1] 12345.17
>   round(h, digits=1)    # [1] 12345.2
>   round(h, digits=2)    # [1] 12345.17
>   round(h, digits=3)    # [1] 12345.17
>   round(h, digits=4)    # [1] 12345.17
>   round(h, digits=5)    # [1] 12345.17
>   signif(h, digits=9)   # [1] 12345.17
>   format(h, nsmall=4)   # [1] "12345.1671"

What you're missing is that when you do (e.g.)

  h <- 12345.16711
  round(h, digits=4)
  # [1] 12345.17

what is displayed ("[1] 12345.17") is not the result of round(), but what the result of round() is *displayed as*, given the option digits=7 (the default number of significant figures used when displaying stored values).
To see the result as it is stored, you should use print() with the appropriate number of digits specified:

  print(round(h, digits=5), 10)  # [1] 12345.16711
  print(round(h, digits=4), 10)  # [1] 12345.1671
  print(round(h, digits=3), 10)  # [1] 12345.167
  print(round(h, digits=2), 10)  # [1] 12345.17

Internally, round(h) is correctly stored:

  h4 <- round(h, 4)
  h - h4  # [1] 1e-05
  h3 <- round(h, 3)
  h - h3  # [1] 0.00011
  h2 <- round(h, 2)
  h - h2  # [1] -0.00289

To illustrate the influence of the display option digits=7:

  h <- 45.16711
  h                   # [1] 45.16711
  round(h, digits=4)  # [1] 45.1671
  round(h, digits=3)  # [1] 45.167
  round(h, digits=2)  # [1] 45.17
  h <- 345.16711
  h                   # [1] 345.1671
  round(h, digits=4)  # [1] 345.1671
  round(h, digits=3)  # [1] 345.167
  round(h, digits=2)  # [1] 345.17
  h <- 2345.16711
  h                   # [1] 2345.167
  round(h, digits=4)  # [1] 2345.167
  round(h, digits=3)  # [1] 2345.167
  round(h, digits=2)  # [1] 2345.17

Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 20-May-09 Time: 22:54:41 -- XFMail --
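[Editorial sketch: the same point can also be seen by changing the global display option rather than calling print() each time. The values below follow Ted's example.]

```r
## Sketch: the stored value is unchanged; only the display precision differs
h  <- 12345.16711
r4 <- round(h, digits = 4)   # stored as 12345.1671
print(r4, digits = 10)       # shows 12345.1671
options(digits = 10)         # raise the default display precision
r4                           # now auto-prints as 12345.1671
options(digits = 7)          # restore the default
```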
Re: [R] what is wrong with this code?
On 19-May-09 19:52:20, deanj2k wrote:
>   dlogl <- -(n/theta)-sum((y/(theta)^2)*((1-exp(y/theta))/(1+exp(y/theta)))
>   d2logl <- (n/theta^2) - sum((-2*y/theta^3)*(1-exp(y/theta))/(1+exp(y/theta))) - sum(((2*y/theta^4)*exp(y/theta))/((1+exp(y/theta))^2))
> returns the error message:
>   Error: unexpected symbol in:
>   "dlogl <- -(n/theta)-sum((y/(theta)^2)*((1-exp(y/theta))/(1+exp(y/theta)))
>   d2logl"
> Do you know what I have done wrong?

The error message strongly suggests that the line beginning "d2logl <-" is being seen as a continuation of the preceding line. Counting parentheses, I find that you are 1 short of what is required to complete the expression in the line beginning "dlogl <-". In that case, R will continue on to the next line seeking the completion, and will encounter "d2logl" non-syntactically. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 19-May-09 Time: 22:12:40 -- XFMail --
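[Editorial sketch: the paren-counting Ted did by hand can be mechanised. The helper below is illustrative, not from the original exchange, and ignores parentheses inside string literals.]

```r
## Illustrative helper: net count of '(' minus ')' in a line of code
count_parens <- function(code) {
  chars <- strsplit(code, "")[[1]]
  sum(chars == "(") - sum(chars == ")")
}

line1 <- "dlogl <- -(n/theta)-sum((y/(theta)^2)*((1-exp(y/theta))/(1+exp(y/theta)))"
count_parens(line1)   # 1, i.e. one closing ')' is missing
```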
Re: [R] Concatenating two vectors into one
On 18-May-09 11:09:45, Henning Wildhagen wrote:
> Dear users, a very simple question: given two vectors x and y,
>   x <- as.character(c("A","B","C","D","E","F"))
>   y <- as.factor(c(1,2,3,4,5,6))
> I want to combine them into a single vector z as "A1", "B2", "C3" and so on. z <- x*y is not working; I tried several other functions, but did not get to the solution. Thanks for your help, Henning

And a very simple solution! Use paste():

  x <- as.character(c("A","B","C","D","E","F"))
  y <- as.factor(c(1,2,3,4,5,6))
  paste(x, y)
  # [1] "A 1" "B 2" "C 3" "D 4" "E 5" "F 6"
  paste(x, y, sep="")
  # [1] "A1" "B2" "C3" "D4" "E5" "F6"

Ted. PS: 'x*y' will attempt to perform a numerical multiplication. This cannot work for character vectors.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 18-May-09 Time: 12:23:56 -- XFMail --
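[Editorial note: a point worth checking is that paste() coerces the factor y to its labels, not to its underlying integer codes, before joining. A minimal sketch:]

```r
## Sketch: paste() coerces a factor to its labels before concatenating
x <- c("A", "B", "C")
y <- as.factor(c(1, 2, 3))
paste(x, y, sep = "")   # "A1" "B2" "C3"
```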
[R] Output of binary representation
I am interested in studying the binary representation of numerics (doubles) in R, so am looking for ways of outputting the internal binary representations. sprintf() with format "a" or "A" is halfway there:

  sprintf("%A", pi)
  # [1] "0X1.921FB54442D18P+1"

but it is in hex. The following illustrate the sort of thing I want:

  1.1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000 times 2
  11.0010 0100 0011 1111 0110 1010 1000 1000 1000 0101 1010 0011 000
  0.1100 1001 0000 1111 1101 1010 1010 0010 0010 0001 0110 1000 1100 0 times 4

(without the spaces -- only put in above for clarity). While I could take the original output "0X1.921FB54442D18P+1" from sprintf() and parse it out into binary using gsub() or the like, or submit it to say an 'awk' script via an external file, this would be a tedious business! Is there some function already in R which outputs the bits in the binary representation directly? I see that David Hinds asked a similar question on 17 Aug 2005: "Raw data type transformations" http://finzi.psych.upenn.edu/R/Rhelp02/archive/59900.html (without, apparently, getting any response -- at any rate within the following 3 months). With thanks for any suggestions, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-May-09 Time: 18:23:49 -- XFMail --
Re: [R] Output of binary representation
Many thanks, Gabor! That looks both interesting and powerful. Indeed, it seems to implement with one stroke what I had been thinking of implementing piecemeal. Best wishes, Ted.

On 17-May-09 17:48:00, Gabor Grothendieck wrote:
> gsubfn() of the gsubfn package is like gsub() but can take a function, list or proto object as the replacement instead of a character string, and with a list it can be used to readily turn hex into binary:
>
>   library(gsubfn)
>   binary.digits <-
>     list("0"="0000", "1"="0001", "2"="0010", "3"="0011",
>          "4"="0100", "5"="0101", "6"="0110", "7"="0111",
>          "8"="1000", "9"="1001", "A"="1010", "B"="1011",
>          "C"="1100", "D"="1101", "E"="1110", "F"="1111")
>   gsubfn("[0-9A-F]", binary.digits, "0X1.921FB54442D18P+1")
>   # [1] "0000X0001.1001001000011111101101010100010001000010110100011000P+0001"
>
> On Sun, May 17, 2009 at 1:23 PM, Ted Harding ted.hard...@manchester.ac.uk wrote:
> [...]
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-May-09 Time: 21:09:12 -- XFMail --
Re: [R] Output of binary representation
Thanks, Jim. While that is still in hex, I find I can get the binary representation using Gabor's gsubfn() approach, provided the A-F is changed to a-f in setting up his 'binary.digits', and the output of writeBin() is explicitly cast to character:

  gsubfn("[0-9a-f]", binary.digits,
         as.character(writeBin(pi, raw(), endian='big')))

Ted.

On 17-May-09 20:04:58, jim holtman wrote:
> Are you looking for how the floating point number is represented in the IEEE-754 format? If so, you can use writeBin():
>
>   writeBin(pi, raw(), endian='big')
>   # [1] 40 09 21 fb 54 44 2d 18
>
> On Sun, May 17, 2009 at 1:23 PM, Ted Harding ted.hard...@manchester.ac.uk wrote:
> [...]
> -- Jim Holtman, Cincinnati, OH, +1 513 646 9390. "What is the problem that you are trying to solve?"

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-May-09 Time: 22:06:59 -- XFMail --
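[Editorial sketch: the hex-to-binary step in this thread can also be done in base R alone, without gsubfn, using rawToBits(). Note that rawToBits() reports each byte's bits low-order first, hence the row reversal below.]

```r
## Sketch using only base R: the 64 bits of the IEEE-754 representation of pi
bytes  <- writeBin(pi, raw(), endian = "big")   # 40 09 21 fb 54 44 2d 18
bits   <- as.integer(rawToBits(bytes))          # 64 bits, LSB-first per byte
bitmat <- matrix(bits, nrow = 8)                # one column per byte
## reverse each byte's bits so they read MSB-first, then collapse
paste(apply(bitmat[8:1, ], 2, paste, collapse = ""), collapse = " ")
```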
Re: [R] sprintf() question
On 17-May-09 22:03:19, Daniel Nordlund wrote:
> When I type the following, I get results different from what I expected.
>   sprintf('%a', 3)
>   # [1] "0x1.8"
> Shouldn't the result be
>   [1] "0x1.8p+2"

Well, not "p+2" but "p+1" (0x1.8 = 1.1000 in binary; times 2^1 = 11.000 in binary = 3 in decimal); however, I get:

  sprintf('%a', 3)
  # [1] "0x1.8p+1"

which is indeed correct.

  R version 2.9.0 (2009-04-17)  ## Same as yours
  platform i486-pc-linux-gnu    ## Different from yours ...

which perhaps suggests that there may be a mis-compilation in the Windows version. Ted.

> I read through the help ?sprintf and didn't find anything that changed my expectation. What am I misunderstanding? I am using the R-2.9.0 binary from CRAN on Windows XP Pro, and my session info is
>   sessionInfo()
>   R version 2.9.0 (2009-04-17)
>   i386-pc-mingw32
>   locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
> Thanks for any enlightenment. Dan
> Daniel Nordlund, Bothell, WA USA

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-May-09 Time: 23:32:19 -- XFMail --
Re: [R] replace % with \%
On 15-May-09 14:46:27, Liviu Andronic wrote:
> Dear all, I'm trying to gsub() "%" with "\%", with no obvious success.
>   temp1 <- c("mean", "sd", "0%", "25%", "50%", "75%", "100%")
>   temp1
>   # [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>   gsub("%", "\%", temp1, fixed=TRUE)
>   # [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
>   # Warning messages:
>   # 1: '\%' is an unrecognized escape in a character string
>   # 2: unrecognized escape removed from "\%"
> I am not quite sure how to deal with this error message. I tried the following:
>   gsub("%", "\\%", temp1, fixed=TRUE)
>   # [1] "mean"  "sd"    "0\\%"  "25\\%" "50\\%" "75\\%" "100\\%"
> Could anyone suggest how to obtain output similar to:
>   [1] "mean" "sd" "0\%" "25\%" "50\%" "75\%" "100\%"
> Thank you, Liviu

1: The double escape "\\" is the correct way to do it. If you give "\%" to gsub(), it will try to interpret "%" as a special character (like "\n" for newline), and there is no such escape (as it tells you). On the other hand, "\\" tells gsub() to interpret "\" (normally used as the escape character) in a special way, namely as a literal "\".

2: The output "mean" "sd" "0\\%" "25\\%" "50\\%" "75\\%" "100\\%" from gsub("%", "\\%", temp1, fixed=TRUE) is one of those cases where R displays something different from what is really there! In other words, "0\\%", for example, is the character string you would have to enter in order for R to store "0\%". You can see what is really there using cat():

  cat(gsub("%", "\\%", temp1, fixed=TRUE))
  # mean sd 0\% 25\% 50\% 75\% 100\%

which, of course, is what you wanted. You can see in other ways that what is stored is what you wanted -- for instance:

  temp2 <- gsub("%", "\\%", temp1, fixed=TRUE)
  write.csv(temp2, "gsub.csv")

and then, if you look into gsub.csv outside of R, you will see:

  "","x"
  "1","mean"
  "2","sd"
  "3","0\%"
  "4","25\%"
  "5","50\%"
  "6","75\%"
  "7","100\%"

which, again, is what you wanted. Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 15-May-09 Time: 16:32:13 -- XFMail --
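[Editorial sketch, not in the original reply: nchar() gives yet another confirmation that only a single backslash is stored, whatever the default print() display suggests.]

```r
## Sketch: the stored string holds one backslash, not two
s <- gsub("%", "\\%", "50%", fixed = TRUE)
nchar(s)      # 4 characters: '5', '0', '\', '%'
cat(s, "\n")  # prints 50\%
```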
Re: [R] Simulation
On 14-May-09 11:28:17, Wacek Kusnierczyk wrote:
> Barry Rowlingson wrote:
>> As a beginner, I agree the for loop is much clearer to me. [Warning: Contains mostly philosophy]
> maybe quasi ;)
>> To me, the world and how I interact with it is procedural. When I want to break six eggs I do 'get six eggs; repeat break egg until all eggs broken'.
> yeah, that's the implementation level. a typical recipe would not say 'for n from 1 to 6, break the nth egg'. it'd rather say 'break the eggs', which is closer to 'apply break to the eggs'. you do of course break the eggs sequentially (or?), but that's below the abstraction useful for the recipe's purpose.

But it does influence how you organise the subsequent garbage collection. :) Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 14-May-09 Time: 12:40:16 -- XFMail --
Re: [R] specify the number of decimal numbers
On 14-May-09 11:40:21, lehe wrote:
> Hi, I was wondering how to specify the number of decimal places in my computation using R? I have too many decimal places in my result; when I convert it to a string with as.character(), the string is too long. Thanks and regards!

Since you say you want them to end up as character strings, it would seem that sprintf() is the ideal tool for the job.

1. Read ?sprintf
2. Examples:

  pi                       # [1] 3.141593
  sprintf("%.3f", pi)      # [1] "3.142"
  sprintf("%10.3f", pi)    # [1] "     3.142"
  sprintf("%010.3f", pi)   # [1] "000003.142"
  sprintf("%025.17f", pi)  # [1] "0000003.14159265358979312"

So you can do pretty much what you want, in terms of output format. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 14-May-09 Time: 13:04:30 -- XFMail --
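[Editorial sketch: since sprintf() is vectorised over its value arguments, a whole numeric vector can be formatted to a fixed number of decimals in one call.]

```r
## Sketch: format every element of a numeric vector to 4 decimal places
x <- c(1/3, 2/3, pi)
sprintf("%.4f", x)   # "0.3333" "0.6667" "3.1416"
```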