Re: [R] symbols in a data frame
Hi Sam,

> But this may not be the important issue here at all. If "<k" means the value is left censored at k -- i.e. we know it's less than k but not how much less -- then Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context, and as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess.

I'd like to chime in with Bert's advice here. Unless the LOQs are very few*, they have the potential to seriously mess up any further data analysis. Actually, I'd recommend you go one step back and ask the analysis lab whether they can supply you with the uncensored data, specifying the LOQ separately.

A while ago I posted some illustrations of such censoring-at-LOQ situations on Cross Validated, which may help you decide how to go on: http://stats.stackexchange.com/a/30739/4598

Claudia (Analytical Chemist / Chemometrician)

* or you know that they'll not matter for the particular data analysis you want to do

-- Claudia Beleites, Chemist, Spectroscopy/Imaging, Leibniz Institute of Photonic Technology, Albert-Einstein-Str. 9, 07745 Jena, Germany, email: claudia.belei...@ipht-jena.de, phone: +49 3641 206-133, fax: +49 2641 206-399

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
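As a side note, if the censored values arrive in the data frame as strings like "<0.5", one way to separate them into a numeric value plus a censoring indicator before deciding on the statistical treatment is (a sketch with made-up example values):

```r
x <- c("1.2", "<0.5", "2.3", "<0.5")      # hypothetical measurement column

censored <- grepl("^<", x)                 # TRUE where the value is left censored
value    <- as.numeric(sub("^<", "", x))   # numeric part; equals the LOQ for censored entries

data.frame(value, censored)
```

This keeps the censoring information explicit instead of silently replacing "<LOQ" entries by some number.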
Re: [R] Needed: Beta Testers and Summer Camp Students
Hi Paul,

I skimmed over the pdf. I have comments on the discussion about centering. I'm from a completely different field (chemometrics). Of course, I also have to explain centering. However, the argumentation I use is somewhat different from the one you give in your pdf.

One argument I have in favour of (mean) centering is numerical stability, depending on the algorithm of course.

I generally recommend that if data is centered, there should be an argument why the *chosen* center is *meaningful*, emphasizing that centering actually involves decisions, and that the center can have a meaning. While I agree that a centered model with the center chosen without any thought about its meaning is exactly the same in every important way compared to not centering, I disagree with the generality of your claim. A natural center of the data may exist. And in this case, using this appropriate center will ease the interpretation. Examples:

- In analytical chemistry / chemometrics we can often use blanks (samples without analyte) as coordinate origin. Centering to the blank removes the influence of some parts of the instrumentation, like sample holders, cuvettes, etc.
- Many of our samples (sample in the meaning of physical specimen) have a so-called matrix (a common composition/substance in which different other substances/things are observed), or are measured in a solvent.
- I also work with biological specimens. There we often have controls (either control specimens/patients or, for example, normal tissue [vs. diseased tissue]) which are another type of natural coordinate origin.
- I can even imagine problems where mean centering is meaningful: if the problem involves modeling properties that are deviations from a mean (I'm thinking of process analytics). However, mean centering will always need careful attention to the sampling procedure.

Looking from the opposite point of view, some problems of *mean* centering become apparent.
If the data comes from different groups, the mean may not be meaningful (I once heard a biologist argue that the average human has one ovary and one testicle - this gets your audience awake and usually convinces immediately). And the mean may be influenced by the different proportions of the groups in your data. Which is what you do *not* want: what you want is a stable center.

Best, Claudia

-- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.belei...@ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399
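To make the point about *chosen* centers concrete in R: `scale()` accepts an arbitrary center vector, so centering to, say, a blank measurement is a one-liner (a sketch; the data and the choice of row 1 as the blank are made up):

```r
spc   <- matrix(rnorm(20), nrow = 4)   # toy data: 4 spectra x 5 channels
blank <- spc[1, ]                      # pretend row 1 is the blank measurement

centered.mean  <- scale(spc, center = TRUE,  scale = FALSE)  # mean centering
centered.blank <- scale(spc, center = blank, scale = FALSE)  # centering to the blank
```

Both models span the same space; only the interpretation of the origin differs.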
Re: [R] OrgMassSpecR peak area issue
Hi Chris,

> I am having an issue with the OrgMassSpecR package. I run my HPLC using a DAD detector.

You are on a statistics-related mailing list. Have mercy with people from other fields and tell them that you are using a diode array to measure UV/VIS absorption. (And possibly let them know that you expect the absorbance A = lg I_0 - lg I ~ c.)

> My raw data is exported from ChemStation as a csv file. I then upload the csv into RStudio no problem. Using the DrawChromatogram function, I get a nice chromatogram, and my retention time, peak area, and apex intensity values are given as well. The problem comes with the peak area value given. The peak area is much smaller than a value that would make sense.

How do you know that (see next comment)?

> My peak area value is actually less than my apex intensity value.

This is not a good criterion to determine what area value would actually make sense: area and intensity have different units!

Possible solution: a glance at the code in DrawChromatogram reveals that really the polygon area is calculated (as the manual specifies). Thus the area will be in counts*s or counts*min, and of course 1 count*min = 60 counts*s. How long does your analyte take to elute? Unless it is > 2 min (if time is in min) or > 2 s (for time scale in s), the numeric value of the area should be < A_max (approximating the peak as a triangle). Your apex (max) absorbance should ideally be a bit below 1, so a rough guesstimate for the peak area would be 1/2 A_max * Δt, which will be quite a bit below 1 if you measure time in minutes. If you detect by mass spec, you get ion counts, which are large numbers, so areas are likely to be >> 1 (regardless of min or s time scale).

> Is this because I am using a DAD detector rather than an MS? If so, is there a simple way to edit the peak area equation so that it will also work with absorbance values?

Most probably you just want to get your units right!
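The unit argument can be checked numerically; a rough sketch with a made-up Gaussian absorbance peak, integrated trapezoidally:

```r
t.min <- seq(0, 5, by = 1 / 600)                      # time axis in minutes (0.1 s steps)
A     <- 0.8 * exp(-(t.min - 2.5)^2 / (2 * 0.05^2))   # absorbance peak, A_max ~ 0.8 AU

## trapezoidal integration
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area.min <- trapz(t.min,      A)   # in AU * min: well below A_max
area.s   <- trapz(t.min * 60, A)   # in AU * s : exactly 60 times larger
```

The same peak thus yields numerically very different "areas" depending only on the time unit, which is the effect described above.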
Hope that helps, Claudia

PS: for future questions of this sort, you may want to consider asking on stackoverflow.com (or chemistry.stackexchange.com), where you can post nicely formatted code, calculation results, and images with your question.
Re: [R] OrgMassSpecR peak area issue
Hi Bryan,

Division by 2 is correct; it comes from the trapezoid calculation. The modulo line is a funny way of producing c(2:n, 1).

Best, Claudia

On Mon, 18 Mar 2013 15:00:06 -0400, Bryan Hanson han...@depauw.edu wrote:

> If you type DrawChromatogram you can see the method used to calculate the peak area. Looks to me like you could easily hack it if you wanted. The relevant part about peak areas is this:
>
> for (j in 1:n) {
>     k <- (j %% n) + 1
>     x[j] <- peakTime[j] * peakIntensity[k] - peakTime[k] * peakIntensity[j]
> }
> peakArea[i] <- abs(sum(x) / 2)
>
> which looks pretty standard to me, though I'm not clear right off the top of my head why they are dividing by 2. You can always contact the maintainer. Bryan

On Mar 18, 2013, at 1:34 PM, Christopher Beaver christopher.bea...@gmail.com wrote:

> [original question snipped]
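A quick check of both claims (the quoted loop is the shoelace formula for polygon area; the modulo produces the cyclically shifted successor index):

```r
n <- 5
k <- ((1:n) %% n) + 1L
identical(k, c(2:n, 1L))   # TRUE: each vertex j is paired with its successor k

## shoelace area of the unit square, vertices given in order:
px <- c(0, 1, 1, 0)
py <- c(0, 0, 1, 1)
j <- 1:4
k <- (j %% 4) + 1L
abs(sum(px[j] * py[k] - px[k] * py[j])) / 2   # 1
```

Each term of the sum is twice the signed area of the triangle spanned by the origin and one polygon edge, hence the final division by 2.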
Re: [R] Transpose a big data file and write to a new file
Hi Yao He,

this doesn't sound like R to me. I'd go for perl (or awk). See e.g. here: http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash

HTH Claudia

On Wed, 6 Mar 2013 22:37:14 +0800, Yao He yao.h.1...@gmail.com wrote:

> Dear all: I have a big data file of 6 columns and 6 rows like that:
>
> AA AC AA AA ... AT
> CC CC CT CT ... TC
> ...
>
> I want to transpose it and the output is a new file like that:
>
> AA CC ...
> AC CC ...
> AA CT ...
> AA CT ...
> AT TC ...
>
> The key point is I can't read it into R by read.table() because the data is too large, so I tried that:
>
> c <- file("silygenotype.txt", "r")
> geno_t <- list()
> repeat {
>     line <- readLines(c, n = 1)
>     if (length(line) == 0) break   # end of file
>     line <- unlist(strsplit(line, "\t"))
>     geno_t <- cbind(geno_t, line)
> }
> write.table(geno_t, "xxx.txt")
>
> It works but it is too slow; how to optimize it?
>
> Thank you, Yao He — Master candidate in 2nd year, Department of Animal Genetics & Breeding, Room 436, College of Animal Science & Technology, China Agriculture University, Beijing, 100193. E-mail: yao.h.1...@gmail.com
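If it has to stay in R: the main speed killer in the quoted loop is growing `geno_t` with cbind() one line at a time. Reading blocks of lines and binding once at the end is much faster; a sketch (the file name is the poster's, the chunk size is an arbitrary choice, and all rows are assumed to have the same number of fields):

```r
con  <- file("silygenotype.txt", "r")
rows <- list()
repeat {
  lines <- readLines(con, n = 1000)      # read a block of lines at a time
  if (length(lines) == 0) break          # end of file
  rows <- c(rows, strsplit(lines, "\t")) # one character vector per row
}
close(con)

geno_t <- t(do.call(rbind, rows))        # bind once, then transpose
write.table(geno_t, "xxx.txt")
```

Whether this fits in memory depends on the file size, of course; for files that don't fit, the awk/perl route above remains the better option.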
Re: [R] set working directory to current source directory
Hi Sachin,

> Is there a way to get CRAN R to set the working directory to be wherever the source file is? Each time I work on a project on different computers I keep having to set the working directory, which is getting quite annoying.

A while ago I asked a somewhat similar question on stackoverflow: http://stackoverflow.com/questions/8835426/get-filename-and-path-of-sourced-file

You may want to have a look at the suggestions I got.

Best, Claudia
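One frequently posted hack along those lines (a sketch; `ofile` is only set by source(), so this does not work when the code is pasted into an interactive session):

```r
## at the top of a script that is run via source("path/to/script.R"):
this.file <- sys.frame(1)$ofile          # path of the file being source()d
setwd(dirname(this.file))                # make its directory the working directory
```

The stackoverflow thread linked above discusses the limitations of this and several alternatives.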
[R] hyperSpec user survey
Dear all,

I'm looking for users of the hyperSpec package for handling (hyper)spectral or spectroscopic data in R, which I maintain.

First of all, I made a few announcements concerning the further development on the hyperSpec-help mailing list, on which I hope to get user feedback: see http://lists.r-forge.r-project.org/pipermail/hyperspec-help/2012-July/thread.html.

My second issue is: once in a while I have to convince people (administration of the institute) that hyperSpec now has a considerable user base. Obviously, this is important for getting funding to go on with the development. It would be extremely helpful if the hyperSpec users among you could drop me a short email saying

- what kind of spectroscopy you use hyperSpec for
- where you are (country, if possible city and institution/company)

Of course, I'll treat the answers confidentially: I won't use names etc. and I won't sell any information. The goal is to have a few slides (geographical distribution of users - always a nice and fancy thing to show, statistics on the kind of spectroscopy, etc.) which I'll also put on the hyperSpec homepage so you can use them as well.

Thanks a lot, Claudia Beleites hyperSpec.r-forge.r-project.org

PS: please excuse if you get this request multiple times, I try to reach my users in different ways...
Re: [R] how to know perfect execution of function ? if error occurred in execution, how to report it?
In addition, if you need to dig down into why the error occurs:

?traceback
?recover

HTH Claudia

On 23.03.2012 10:29, Jim Holtman wrote:

> ?try
>
> Sent from my iPad

On Mar 23, 2012, at 3:32, sagarnikam123 sagarnikam...@gmail.com wrote:

> I have one for loop in which I am dealing with the time series arima function. While iterating, at some stage there is an error, like
>
> Error in arima(x, c(p, 0, q)) : non-stationary AR part from CSS
>
> I want to know at which step this error occurred and print that iteration number, e.g.
>
> x <- c(1:10)
> for (i in 1:5) {
>     z <- arima(x[i])
>     print(z)
> }
>
> If the error occurred in the arima function at the i=3 step, it should report it and execute the complete loop until i=5.
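A minimal sketch of wrapping such a loop with tryCatch() so that the failing iteration is reported and the loop continues (adapting the poster's example; the model order used here is made up):

```r
x <- c(1:10)
for (i in 1:5) {
  z <- tryCatch(
    arima(x, order = c(i, 0, 0)),          # the call that may fail
    error = function(e) {
      message("error at i = ", i, ": ", conditionMessage(e))
      NULL                                 # placeholder so the loop goes on
    })
  if (!is.null(z)) print(z)
}
```

Unlike try(), tryCatch() lets you attach the loop index to the error message directly in the handler.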
Re: [R] remoting ESS/R with tramp
Tom,

what happens with:

(Emacs) M-x ssh <hostname>
(you should have the remote shell buffer now)
R
(once R is started)
M-x ess-remote
r

?

Claudia
Re: [R] Base function for flipping matrices
Hadley,

I started to throw some functions that I needed to be extended to arrays together as package arrayhelpers. If you consider that a good home for the new functions, they would be more than welcome. Currently I have the package at r-forge, but I wouldn't mind github, either (so far I just use git-svn). Unit tests use svUnit, not testthat, though.

Happy new year to everyone, Claudia

On 02.01.2012 18:38, Richard M. Heiberger wrote:

> Hadley, your request is reminding me of the analysis of array functions in Philip S. Abrams' dissertation "AN APL MACHINE", http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-r-114.pdf. The section that starts on page 17 with this paragraph is the one that immediately applies:
>
> "C. The Standard Form for Select Expressions. In this section the selection operators considered are take, drop, reversal, transpose, and subscripting by scalars or J-vectors. Because of the similarity among the selection operators, we might expect that an expression consisting only of selection operators applied to a single array could be expressed equivalently in terms of some simpler set of operators. This expectation is fulfilled in the standard form for select expressions, to be discussed below."
>
> I look forward to seeing where you take this in R. Rich

On Mon, Jan 2, 2012 at 8:38 AM, Hadley Wickham had...@rice.edu wrote:

> > But if not, it seems to me that it should be added as an array method to ?rev with an argument specifying which indices to rev() over.
>
> Yes, agreed. Sometimes arrays seem like something bolted onto R that is missing a lot of functionality. Hadley -- Assistant Professor / Dobelman Family Junior Chair, Department of Statistics / Rice University, http://had.co.nz/
Re: [R] How to get intercept standard error in PLS
On 24.10.2011 09:07, Jeff Newmiller wrote:

> Insufficient problem specification. Read the posting guide and try again with reproducible code and platform identification.
> --- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
> Sent from my phone. Please excuse my brevity.

arunkumar akpbond...@gmail.com wrote:

> Hi, how do we get the intercept's standard error? I'm using the package pls. I got the coefficient but am not able to get the standard error.

I think the answer is just along the lines of Bjørn-Helge Mevik's answer to your previous question. That being said, maybe you could report the variation (std. dev., IQR, ...) of the intercept observed during bootstrap or iterated (repeated) cross validation/jackknife instead of the standard error.

Claudia
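A generic sketch of that second suggestion, reporting the spread of the intercept over bootstrap resamples (the data are made up, and lm() stands in for whatever model is actually fitted, e.g. a pls model):

```r
set.seed(42)
d <- data.frame(y = rnorm(30), X = I(matrix(rnorm(30 * 5), 30)))  # toy data

boot.icpt <- replicate(200, {
  i   <- sample(nrow(d), replace = TRUE)   # bootstrap resample of the rows
  fit <- lm(y ~ X, data = d[i, ])          # stand-in for the real model fit
  coef(fit)[1]                             # intercept of this resample
})

sd(boot.icpt)   # bootstrap standard deviation of the intercept
```

The same pattern works with any fitting function whose coefficients can be extracted, at the price of refitting the model for every resample.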
Re: [R] Package snow: is there any way to check if a cluster is active
Sören,

have a look at package snowfall, which provides sfIsRunning.

HTH Claudia

On 13.10.2011 06:34, Søren Højsgaard wrote:

> Is there a 'proper' way of checking if a cluster is active? For example, I create a cluster called .PBcluster:
>
> str(.PBcluster)
> List of 4
>  $ :List of 3
>   ..$ con : Classes 'sockconn', 'connection'  atomic [1:1] 3
>   .. .. ..- attr(*, "conn_id")=<externalptr>
>   ..$ host: chr "localhost"
>   ..$ rank: int 1
>   ..- attr(*, "class")= chr "SOCKnode"
>  $ :List of 3
>  ...
>
> Then I stop it with stopCluster(.PBcluster):
>
> .PBcluster
> [[1]]
> $con
> Error in summary.connection(x) : invalid connection
>
> str(.PBcluster)
> List of 4
>  $ :List of 3
>   ..$ con : Classes 'sockconn', 'connection'  atomic [1:1] 3
>   .. .. ..- attr(*, "conn_id")=<externalptr>
>   ..$ host: chr "localhost"
>   ..$ rank: int 1
>   ..- attr(*, "class")= chr "SOCKnode"
>
> - but is there a way in which I can check if the cluster is active?
>
> Regards, Søren
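With plain snow, a pragmatic check is to wrap a trivial clusterCall() in try(): the call errors on a stopped cluster's dead connections. A sketch (the helper name is made up):

```r
library(snow)

## TRUE if a trivial call succeeds on all nodes, FALSE otherwise
clusterIsRunning <- function(cl)
  !inherits(try(clusterCall(cl, function() TRUE), silent = TRUE), "try-error")

cl <- makeCluster(2, type = "SOCK")
clusterIsRunning(cl)   # TRUE while the cluster is up
stopCluster(cl)
clusterIsRunning(cl)   # FALSE: the call fails on the invalid connections
```

This probes the cluster rather than inspecting its structure, so it also catches nodes that died for other reasons.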
Re: [R] speed up this algorithm (apply-function / 4D array)
here's another one - which is easier to generalize:

x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim = c(50, 50, 50, 91))
y <- x[, , , 1:90]   # decide yourself what to do with slice 91,
                     # but 91 is not divisible by 3

system.time({
  dim(y) <- c(50, 50, 50, 3, 90 %/% 3)
  y <- aperm(y, c(4, 1:3, 5))
  v2 <- colMeans(y)
})
   user  system elapsed
   0.32    0.08    0.40

(my computer is a bit slower than Bill's:)

system.time(v1 <- f1(x))
   user  system elapsed
  0.360   0.030   0.396

Claudia

On 05.10.2011 20:24, William Dunlap wrote:

> I corrected your code a bit and put it into a function, f0, to make testing easier. I also made a small dataset to make testing easier. Then I made a new function f1 which does what f0 does in a vectorized manner:
>
> x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim = c(50, 50, 50, 91))
> xsmall <- array(log(seq_len(2 * 2 * 2 * 91)), dim = c(2, 2, 2, 91))
>
> f0 <- function(x) {
>     data_reduced <- array(0, dim = c(dim(x)[1:3], trunc(dim(x)[4] / 3)))
>     reduce <- seq(1, dim(x)[4] - 1, by = 3)
>     for (i in 1:length(reduce)) {
>         data_reduced[, , , i] <- apply(x[, , , reduce[i]:(reduce[i] + 2)], 1:3, mean)
>     }
>     data_reduced
> }
>
> f1 <- function(x) {
>     reduce <- seq(1, dim(x)[4] - 1, by = 3)
>     data_reduced <- (x[, , , reduce] + x[, , , reduce + 1] + x[, , , reduce + 2]) / 3
>     data_reduced
> }
>
> The results were:
>
> system.time(v1 <- f1(x))
>    user  system elapsed
>   0.280   0.040   0.323
> system.time(v0 <- f0(x))
>    user  system elapsed
>  73.760   0.060  73.867
> all.equal(v0, v1)
> [1] TRUE
>
> > I thought apply would already vectorize, rather than loop over every coordinate.
>
> No, you have that backwards. Use *apply functions when you cannot figure out how to vectorize.
>
> Bill Dunlap, Spotfire, TIBCO Software, wdunlap tibco.com

On Wednesday, October 05, 2011 10:40 AM, Martin Batholdy wrote (Subject: [R] speed up this algorithm (apply-function / 4D array)):

> Hi, I have this sample code (see below) and I was wondering whether it is possible to speed things up.
>
> What this code does is the following: x is a 4D array (you can imagine it as x, y, z coordinates and a time coordinate). So x contains 50x50x50 data arrays for 91 time points. Now I want to reduce the 91 time points: I want to merge three consecutive time points into one by calculating the mean of these three time points for every x, y, z coordinate. The reduce sequence defines which time points should get merged, and the apply function in the for loop calculates the mean of the three 3D arrays and puts them into a new 4D array (data_reduced).
>
> The problem is that even in this example it takes really long. I thought apply would already vectorize, rather than loop over every coordinate. But for my actual data set it takes a really long time ... So I would be really grateful for any suggestions how to speed this up.
>
> x <- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim = c(50, 50, 50, 91))
> data_reduced <- array(0, dim = c(50, 50, 50, 90 / 3))
> reduce <- seq(1, 90, 3)
> for (i in 1:length(reduce)) {
>     data_reduced[, , , i] <- apply(x[, , , reduce[i]:(reduce[i] + 3)], 1:3, mean)
> }
Re: [R] Package for Neural Network Classification Quality
Alejandro,

> Hi, does somebody know about an R package with which one can evaluate the quality (recall, precision, accuracy, etc.) of a neural network classification with more than 2 classes? I found the ROCR package, http://cran.r-project.org/web/packages/ROCR/index.html, but it only works with binary classifications.

I guess that is because, strictly speaking, these measures are defined for binary problems (though I expand them to multi-class situations by using class-A vs. not-class-A binary measures, which comes quite naturally for my classes).

In case you need something that takes soft or fuzzy class memberships: I put my ideas about that into package softclassval and would much appreciate feedback.

Best, Claudia
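For hard labels, the class-A vs. not-class-A measures mentioned above fall out directly from the confusion matrix; a minimal sketch with made-up reference and predicted labels:

```r
ref  <- factor(c("a", "a", "b", "c", "b", "c"))
pred <- factor(c("a", "b", "b", "c", "b", "a"), levels = levels(ref))

cm <- table(pred, ref)               # confusion matrix: rows = predicted, cols = reference

precision <- diag(cm) / rowSums(cm)  # per class: TP / (TP + FP)
recall    <- diag(cm) / colSums(cm)  # per class: TP / (TP + FN), aka sensitivity
accuracy  <- sum(diag(cm)) / sum(cm) # overall fraction correct
```

Averaging the per-class values (macro-averaging) or weighting them by class size are then separate, deliberate choices.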
Re: [R] predictive accuracy
Al,

I'd redo everything and report in the paper that your peculiar predictor was contributing strongly to models that were built without excluding this predictor. This is important information: your models get confused by the predictor (I'd consider this a lack of a certain kind of robustness, but I'm not a statistician).

HTH Claudia

On 26.05.2011 14:42, El-Tahtawy, Ahmed wrote:

> I am trying to develop a prognostic model using logistic regression. I built full and approximate models with the use of penalization - Design package. Also, I tried chi-square criteria and step-down techniques, and used BS for model validation. The main purpose is to develop a predictive model for a future patient population. One of the strong predictors pertains to the study design, would not mean much for a clinician/investigator in a real clinical situation, and I have been asked to remove it.
>
> Can I propose a model and nomogram without that strong, irrelevant predictor? If yes, do I need to redo model calibration, discrimination, validation, etc.? Or just have 5 predictors instead of 6 in the prognostic model?
>
> Thanks for your help, Al
Re: [R] Help with 2-D plot of k-mean clustering analysis
Hi Meng, I would like to use R to perform k-means clustering on my data, which includes 33 samples measured on ~1000 variables. I have already used the kmeans function for this analysis, and showed that there are 4 clusters in my data. However, it's really difficult to plot these clusters in 2-D given the huge number of variables. One possible way is to project the multidimensional space onto a 2-D plane, but I could not find any good way to do that. Any suggestions or comments would be really helpful! For suggestions it would be extremely helpful to tell us what kind of variables your 1000 variables are. Parallel coordinate plots plot values over (many) variables. Whether this is useful depends very much on your variables: e.g., my variables are spectral channels; they have an intrinsic order and the values have physically the same meaning (and almost the same range), so the parallel coordinate plot comes naturally (it in fact reproduces the spectra). Claudia Thanks, Meng -- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.belei...@ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
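One common concrete realisation of the projection idea is a PCA score plot coloured by cluster. The sketch below uses simulated data as a stand-in for the 33 x ~1000 matrix (fewer variables here, purely for illustration):

```r
## Simulated stand-in for the real data: 33 samples, 100 variables
set.seed(1)
X <- matrix(rnorm(33 * 100), nrow = 33)

km  <- kmeans(X, centers = 4, nstart = 10)  # the clustering itself
pca <- prcomp(X)                            # PCA scores live in pca$x

## 2-D view: first two principal components, coloured by cluster
plot(pca$x[, 1], pca$x[, 2], col = km$cluster, pch = 19,
     xlab = "PC 1", ylab = "PC 2")
```

The first two PCs capture only part of the between-cluster separation, so checking further score pairs (e.g. with `pairs()`) is often worthwhile.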
Re: [R] Outlier removal by Principal Component Analysis : error message
Dear Boule, thank you for your interest in hyperSpec. In order to look into your *problem* I need some more information. I suggest that we solve the error off-list. Please note also that hyperSpec has its own help mailing list: hyperspec-h...@lists.r-forge.r-project.org (due to the amount of spam I have to moderate, you need to subscribe first here: https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/hyperspec-help) - Which version of hyperSpec do you use? If it is the version from CRAN, could you please update to the development version at r-forge with install.packages("hyperSpec", repos = "http://R-Forge.R-project.org")? - Next, if the problem persists with the latest build, could you send me the raw data file so that I can exactly reproduce your problem? - Also, for tracking down the exact source of the error, please execute traceback() after you get the error and email me its output. It is basically impossible to give general recommendations about *outlier detection*: a few spectra that are very different from all other spectra may be outliers, or they may be the target of a study... This is also why the example in the vignette uses a two-step procedure: PCA only identifies suspects, i.e. spectra that have very different scores from all others for some principal components. The second step is a manually supervised decision whether the spectrum really is an outlier. The first step could be replaced by other measures, which however depend on your data. E.g. if you expect/know your data to consist of different clusters, suspects could be spectra that are too far away from any cluster. If your data comes from a mixture of a few components, spectra that cannot be modeled decently by a few PLS components could be suspicious. Or spectra that require a component of their own, ... Some kinds of outliers are actually well-defined in a spectroscopic sense, e.g. contamination by fluorescent lamp light. The second step could be replaced by an automatic decision, e.g.
with a distance threshold. Personally, I'd rather use the term *filtering* for such automatic rules. And there you can think of any number of rules your spectra must comply with in order to be acceptable: signal-to-noise ratio, minimal and maximal intensity, original offset (baseline) below some threshold, ... Hope that helps, Claudia I am currently analysing Raman spectroscopic data with the hyperSpec package. I consulted the documentation of this package and found an example workflow dedicated to Raman spectroscopy (see: http://hyperspec.r-forge.r-project.org/chondro.pdf). I am currently trying to remove outliers by PCA just as they do in the documentation, but I get an error message I can't explain. Here is my code:
#import the data :
T = read.table('bladder bis concatenation colonne.txt', header = TRUE)
spec = new('hyperSpec', wavelength = T[,1], spc = t(T[,-1]),
           data = data.frame(sample = colnames(T[,-1])),
           label = list(.wavelength = 'Raman shift (cm-1)', spc = 'Intensity (a.u.)'))
#baseline correction of the spectra
spec = spec[,,500~1800]
bl = spc.fit.poly.below(spec)
spec = spec - bl
#normalization of the spectra
spec = sweep(spec, 1, apply(spec, 1, mean), '/')
#PCA
pca = prcomp(~ spc, data = spec$., center = TRUE)
scores = decomposition(spec, pca$x, label.wavelength = 'PC', label.spc = 'score / a.u.')
loadings = decomposition(spec, t(pca$rotation), scores = FALSE, label.spc = 'loading I / a.u.')
#plot the scores of the first 20 PCs against each other to get an idea where to find the outliers
pairs(scores[[,,1:20]], pch = 19, cex = 0.5)
#identify the outliers thanks to map.identify
out = map.identify(scores[,,5])
Erreur dans `[.data.frame`(x@data, , j, drop = FALSE) : undefined columns selected
Does anybody understand where the problem comes from? And does anybody know another means to find spectral outliers? Thank you in advance. Boule -- View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-by-Principal-Component-Analysis-error-message-tp3496023p3496023.html Sent from the R help mailing list archive at Nabble.com.
-- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.belei...@ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
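The filtering rules mentioned in the reply (intensity window, signal-to-noise, ...) might look like the sketch below. The matrix, the artificial outlier, and the thresholds are all made up for illustration; real thresholds depend entirely on the instrument and data:

```r
## Made-up spectra matrix: 50 spectra (rows) x 200 channels,
## plus one artificial outlier spectrum
set.seed(42)
spc <- matrix(rnorm(50 * 200, mean = 100, sd = 5), nrow = 50)
spc[3, ] <- spc[3, ] + 500      # spectrum 3 becomes the "outlier"

## Filtering rule: accept spectra whose intensities stay in a window
max.int <- apply(spc, 1, max)
min.int <- apply(spc, 1, min)
keep <- max.int < 300 & min.int > 0

spc.clean <- spc[keep, , drop = FALSE]   # spectrum 3 is filtered out
```

Each additional rule (baseline offset, signal-to-noise, ...) just contributes another logical vector that is combined into `keep` with `&`.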
Re: [R] ROCR - best sensitivity/specificity tradeoff?
Christian, My question concerns the ROCR package and I hope somebody here on the list can help - or point me to some better place. When evaluating a model's performance, like this:
pred1 <- predict(model, ..., type = "response")
pred2 <- prediction(pred1, binary_classifier_vector)
perf <- performance(pred2, "sens", "spec")
(Where prediction and performance are ROCR functions.) How can I then retrieve the cutoff value for the sensitivity/specificity tradeoff with regard to the data in the model (e.g. model = glm(binary_classifier_vector ~ data, family = binomial, data = some_dataset))? Perhaps I missed something in the manual? Or do I need an entirely different approach for this? Or is there an alternative solution? a) look into the performance object, you find all values there b) have a look at this thread: https://stat.ethz.ch/pipermail/r-help/attachments/20100523/51ec813f/attachment.pl http://finzi.psych.upenn.edu/Rhelp10/2010-May/240021.html http://finzi.psych.upenn.edu/Rhelp10/2010-May/240043.html Claudia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
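Independent of ROCR's internals, the underlying computation can be sketched in base R: evaluate sensitivity and specificity at every observed cutoff and pick, for example, the cutoff maximising Youden's J (one common, but by no means the only, notion of "best tradeoff"). The data below is simulated:

```r
## Simulated scores: 100 negatives around 0, 100 positives around 1.5
set.seed(7)
y     <- rep(0:1, each = 100)
score <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))

## Sensitivity and specificity at every observed cutoff
cutoffs <- sort(unique(score))
sens <- sapply(cutoffs, function(k) mean(score[y == 1] >= k))
spec <- sapply(cutoffs, function(k) mean(score[y == 0] <  k))

## Youden's J = sens + spec - 1; its maximum marks one "best" cutoff
best <- cutoffs[which.max(sens + spec - 1)]
```

In ROCR itself, the cutoffs corresponding to each sensitivity/specificity pair are stored inside the performance object (point a in the reply above), so the same selection can be done on its slots.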
Re: [R] Element by element mean of a list of matrices
Peter, as the matrices in the list have the same shape, you can unlist them into an array and then use rowMeans. HTH Claudia Am 15.03.2011 21:17, schrieb hihi: Hi All, is there any effective and dense/compact method to calculate the mean of a list of - of course coincident - matrices on an element-by-element basis? The resulting matrix' [i, j]-th element is the mean of the list's matrices' [i, j]-th elements respectively... Iterating with a for statement is quite straightforward, but I am looking for a more elegant solution, and my attempt with the apply family of functions did not work. Thank you, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
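The suggestion above as runnable code: stack the (same-shaped) matrices into a 3-d array and average over the third dimension with rowMeans. The three small matrices are made up for illustration:

```r
## Three matrices of the same shape
mats <- list(matrix(1:4, 2), matrix(5:8, 2), matrix(9:12, 2))

## Stack them into a 2 x 2 x 3 array ...
a <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))

## ... and average over the 3rd dimension: rowMeans with dims = 2
## treats the first two dimensions jointly as "rows"
m <- rowMeans(a, dims = 2)
m   # element [i, j] is the mean of the three [i, j] entries
```

This avoids the explicit loop entirely and is typically much faster than iterating over list elements.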
Re: [R] Large dataset operations
Haakon, as replicates imply that they all have the same data type, you can put them into a matrix, which is often faster and needs less memory (though whether that really matters depends on the number of replicates you have: for a small number of replicates you won't see much effect anyway). But I find it handy to have the matrix of replicates accessible as data$rep.
data <- data.frame (plateNo = a, Well = b, rep = I (cbind (c, d, e)))
data
   plateNo Well rep.c rep.d rep.e
1        1  A01  1312   963  1172
2        1  A02 10464  6715  5628
3        1  A03  3301  3257  3281
4        1  A04  3895  3350  3496
5        1  A05  8731  7389  5701
6        2  A01  7893  6748  5920
7        2  A02  2912  2385  2586
8        2  A03   985   785   809
9        2  A04  1346  1018  1001
10       2  A05   794   314   486
dim (data)
[1] 10 3
Then: data$norm <- data$rep / apply (data$rep, 2, ave, plateNo = data$plateNo). You can also move the division inside the apply: data$norm <- apply (data$rep, 2, function (x) x / ave (x, plateNo = data$plateNo)). If you always have the same number of wells per plate, you could also fold the data$rep matrix into an array (wells x plates x replicates):
arep <- array (data$rep, dim = c (5, 2, 3))
anorm <- arep / rep (colMeans (arep), each = 5)
dim (anorm) <- dim (data$rep)
data$norm <- anorm
Here are some microbenchmark results:
Unit: nanoseconds
         min      lq  median      uq     max
[1,] 1525160 1561280 1627620 1685020 3575719
[2,] 1505641 1539500 1560301 1649081 3538001
[3,]  113321  115041  115821  116881  155681
[4,] 2589800 2627280 2662540 2794920 4646399
1 and 2 are the two apply versions above, 3 is the array version, and 4 is your loop. HTH Claudia Am 11.03.2011 18:38, schrieb hi Berven: Hello all, I'm new to R and trying to figure out how to perform calculations on a large dataset (300 000 data points). I have already written some code to do this but it is awfully slow. What I want to do is add a new column for each rep_ column in which each value is divided by the mean of all values with the same PlateNo.
My data is in the following format:
data
   PlateNo Well rep_1 rep_2 rep_3
1        1  A01  1312   963  1172
2        1  A02 10464  6715  5628
3        1  A03  3301  3257  3281
4        1  A04  3895  3350  3496
5        1  A05  8731  7389  5701
6        2  A01  7893  6748  5920
7        2  A02  2912  2385  2586
8        2  A03   985   785   809
9        2  A04  1346  1018  1001
10       2  A05   794   314   486
To generate it copy:
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c("A01", "A02", "A03", "A04", "A05", "A01", "A02", "A03", "A04", "A05")
c <- c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794)
d <- c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314)
e <- c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486)
data <- data.frame(plateNo = a, Well = b, rep_1 = c, rep_2 = d, rep_3 = e)
Here is the code I have come up with:
rows <- length(data$plateNo)
reps <- 3
norm <- list()
for (rep in 1:reps) {
  x <- paste("rep_", rep, sep = "")
  normx <- paste("normalised_", rep, sep = "")
  for (row in 1:rows) {
    plateMean <- mean(data[[x]][data$plateNo == data$plateNo[row]])
    wellData <- data[[x]][row]
    norm[[normx]][row] <- wellData / plateMean
  }
}
Any help or tips would be greatly appreciated! Thanks, Haakon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
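The core of the normalization discussed in this thread, in compact form. The miniature data frame below is made up; `ave()` returns, for each row, the mean of its group:

```r
## Made-up miniature: one replicate column, two plates of three wells
data <- data.frame(plateNo = rep(1:2, each = 3),
                   rep_1   = c(2, 4, 6, 10, 20, 30))

## ave() gives each row the mean of its plateNo group, so the
## division normalises every value by its per-plate mean
data$norm_1 <- data$rep_1 / ave(data$rep_1, data$plateNo)
data$norm_1   # 0.5 1.0 1.5 0.5 1.0 1.5
```

Because `ave()` is vectorised over the whole column, this replaces the inner row loop entirely; wrapping it in `apply()` (or a small loop over the few replicate columns) handles all replicates.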
Re: [R] Plot with same font like in LaTeX
Jonas, have a look at the tikzDevice package. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Incorrectness of mean()
On 02/28/2011 11:07 AM, zbynek.jano...@gmail.com wrote: I have found the following problem: I have a vector: a <- c(1.04, 1.04, 1.05, 1.04, 1.04) I want a mean of this vector: mean(a) [1] 1.042 which is correct, but: mean(1.04, 1.04, 1.05, 1.04, 1.04) [1] 1.04 gives an incorrect value. How is this possible? The x that is averaged is only the first 1.04; the remaining numbers are matched to mean's other arguments (trim, na.rm) and to its ... argument, and are silently ignored. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
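A short demonstration of the argument matching involved (base R only):

```r
# The intended call: one numeric vector as x
a <- c(1.04, 1.04, 1.05, 1.04, 1.04)
mean(a)                               # 1.042

# Here only the first 1.04 is x; the 2nd and 3rd numbers are matched
# positionally to mean's trim and na.rm arguments, the rest go to ...
mean(1.04, 1.04, 1.05, 1.04, 1.04)    # 1.04
```

Passing the values as a single vector, as in `mean(a)`, is always the safe form.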
Re: [R] re-arranging data to create an image (or heatmap)
Pierz, - easy approximation: use colours with an alpha value for plotting points
x <- runif (1000)
y <- runif (1000) * x + x
plot (x, y)
plot (x, y, pch = 20, col = "#FF000020")
- more sophisticated: have a look at hexbin (package hexbin), levelplot (package lattice), package ggplot2 with stat_sum, stat_binhex, or stat_density2d (e.g. http://had.co.nz/ggplot2/stat_sum.html) HTH Claudia On 02/28/2011 03:21 PM, pierz wrote: Let me start by introducing myself as a biologist with only a little knowledge about programming in Matlab and R. In the past I have successfully created my figures in Matlab using the hist3d command, but I have no access to Matlab right now and would like to switch to R. I have used the plot command to create a figure of my data and it does almost what I want it to do. My data matrix looks like this (these are the first few lines from it, copied from the R console):
            Time Abs
 [1,] 0.09714286  24
 [2,] 0.19428571  24
 [3,] 0.19428571  24
 [4,] 0.29142857  24
 [5,] 0.38857143  23
 [6,] 0.38857143  22
 [7,] 0.48571429  23
 [8,] 0.58285714  21
 [9,] 0.58285714  21
[10,] 0.68000000  23
[11,] 0.68000000  25
[12,] 0.68000000  23
[13,] 0.77714286  23
[14,] 0.77714286  23
[15,] 0.87428571  21
[16,] 0.87428571  20
[17,] 0.87428571  22
[18,] 1.06857143  23
[19,] 1.06857143  25
The example shows that some of the plotted points appear more than once. I would like to use a heatmap to show that these points have more weight, but I have difficulty arranging the data to be plotted correctly using the image() or heatmap() command. So what I would want to do is to get the same figure as when I use the plot command, but have colors representing the weight of the plotted points (whether they occur once, twice or multiple times). I have tried searching this forum and also used Google, but I seem to keep going in circles. I think that the image() command fits my needs, but that my input data is not in the correct format. Attached I have an image example from R and an image example from Matlab.
This is how far I got using R: http://r.789695.n4.nabble.com/file/n3327986/example_R.jpg This is the result I am aiming for: http://r.789695.n4.nabble.com/file/n3327986/example_matlab.jpg -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems using unique function and !duplicated
Jon, you need to combine the conditions into one logical value, e.g. cond1 & cond2: !duplicated(test$date) & !duplicated(test$var2). However, I doubt that this is what you want: it removes too many rows (rows whose individual values have appeared already, even if the combination is unique). Have a look at the wiki, though: http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows Claudia On 02/28/2011 04:51 PM, JonC wrote: Hi, I am trying to simultaneously remove duplicates based on two or more variables in a small R data.frame. I am trying to reproduce the SAS statements of a Proc Sort with Nodupkey, for those familiar with SAS. Here's my example data:
test <- read.csv("test.csv", sep = ",", as.is = TRUE)
test
      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96
I am trying to obtain the following, where duplicates of date AND var2 are removed from the above data.frame:
      date var1 var2 num1 num2
28/01/2011    a    1  213   71
28/01/2011    c    2  867  289
29/01/2011    a    2  234   78
30/01/2011    c    2  288   96
30/01/2011    a    3  417  139
If I use the !duplicated function with one variable everything works fine. However I wish to remove duplicates of both date and var2.
test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139
test2 <- test[!duplicated(test$date), !duplicated(test$var2), ]
Error in `[.data.frame`(test, !duplicated(test$date), !duplicated(test$var2), : undefined columns selected
I get an error? I got different errors when using the unique() function. Can anybody solve this? Thanks in advance.
Jon -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
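For completeness, the approach described behind the wiki link boils down to calling `duplicated()` on a subset of columns, since the data frame method considers whole rows. The data frame below is a made-up miniature:

```r
## Made-up miniature of the problem: keep rows whose (date, var2)
## combination occurs for the first time
test <- data.frame(date = c("28/01/11", "28/01/11", "29/01/11", "29/01/11"),
                   var2 = c(1, 1, 1, 2))

## duplicated() on a data frame marks repeated whole rows, so
## subsetting the columns first means "duplicate of date AND var2"
test[!duplicated(test[c("date", "var2")]), ]
```

Here rows 1, 3, and 4 are kept: only row 2 repeats an already-seen (date, var2) combination.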
Re: [R] The L Word
On 02/24/2011 11:20 AM, Prof Brian Ripley wrote: On Thu, 24 Feb 2011, Tal Galili wrote: Thank you all for the answers. So if I may extend on the question - When is it important to use 'literal integer'? Under what situations could not using it cause problems? Is it a matter of efficiency or precision or both? Efficiency: it avoids unnecessary type conversions. For example length(x) > 1 has to coerce the lhs to double. We have converted the base code to use integer constants because such small efficiency gains can add up. Integer vectors can be stored more compactly than doubles, but that is not going to help for length 1: object.size(1) 48 bytes object.size(1L) 48 bytes (32-bit system). see:
n <- 0L:100L
szi <- sapply (n, function (n) object.size (integer (n)))
szd <- sapply (n, function (n) object.size (double (n)))
plot (n, szd)
points (n, szi, col = "red")
Thanks, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Wed, Feb 23, 2011 at 6:15 PM, Tsjerk Wassenaar tsje...@gmail.com wrote: Hi Gene, It means 'Literal integer'. So 1L is a proper integer 1, and 0L is a proper integer 0. Hope it helps, Tsjerk On Wed, Feb 23, 2011 at 5:08 PM, Gene Leynes gleyne...@gmail.com wrote: I've been wondering what L means in the R computing context, and was wondering if someone could point me to a reference where I could read about it, or tell me what it's called so that I can search for it myself. (L by itself is a little too general for a search term). I encounter it in strange places, most recently in the save documentation.
save(..., list = character(0L), file = stop("'file' must be specified"), ascii = FALSE, version = NULL, envir = parent.frame(), compress = !ascii, compression_level, eval.promises = TRUE, precheck = TRUE) I remember that you can also find it when you step inside an apply function: sapply(1:10, function(x) browser()) Called from: FUN(1:10[[1L]], ...) I apologize for being vague, it's just something that I would like to understand about the R language (the R word). Thank you! Gene -- Tsjerk A. Wassenaar, Ph.D. post-doctoral researcher Molecular Dynamics Group * Groningen Institute for Biomolecular Research and Biotechnology * Zernike Institute for Advanced Materials University of Groningen The Netherlands -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The L Word
On 02/24/2011 05:14 PM, Hadley Wickham wrote: Note however that I've never seen evidence for a *practical* difference in simple cases, and also of such cases as part of a larger computation. But I'm happy to see one if anyone has an interesting example. E.g., I would typically never use 0L:100L instead of 0:100 in an R script because I think code readability (and self-explainability) is of considerable importance too. But : casts to integer anyway: I know - I just thought that on _this_ thread I ought to write it with L ;-) and I don't think I write 1L:100L in real life. I use the L far more often as a reminder than for performance, particularly in function definitions. str(0:100) int [1:101] 0 1 2 3 4 5 6 7 8 9 ... And performance in this case is (obviously) negligible:
library(microbenchmark)
microbenchmark(as.integer(c(0, 100)), times = 1000)
Unit: nanoseconds
                       min  lq median  uq   max
as.integer(c(0, 100)) 712 791    813 896 15840
(mainly included as an opportunity to try out microbenchmark) So you save ~800 ns, but typing two letters probably takes 0.2 s (100 wpm, ~5 letters per word + space = 0.1 s per letter), so it only saves you time if you're going to be calling it more than 125000 times ;) Calling 125000 times happens in my real life. I have e.g. one data set with 2e5 spectra (and another batch of that size waiting for me), so anything done for each spectrum reaches this number each time the function is needed. Also, of course, the conversion time grows with the length of the vector. On the other hand, in 95 % of the cases taking an hour to think about the algorithm will have much larger effects ;-). Also, I notice that the first few measurements of microbenchmark are often much longer (for fast operations), which may just indicate that the total speed depends much more on whether the code allows caching or not. And that may mean that any such coding detail may or may not help at all: a single such conversion may take disproportionately more time.
I just (yesterday) came across a situation where the difference between numeric and integer does matter (considering that I do that with an ≈ 3e4 x 125 x 6 array size):
microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3) + 0.0))
Unit: nanoseconds
      min      lq  median      uq     max
i  884039  891106  895847  901630 2524877
d 2698637 2770936 2778271 2807572 4266197
but then:
microbenchmark (
  sd = structure ((1:1e3) + 0.0, .Label = 1:100, class = "factor"),
  si = structure ((1:1e3) + 0L,  .Label = 1:100, class = "factor"))
Unit: nanoseconds
     min    lq median    uq     max
sd 52875 53615  54040 54448 1385422
si 45904 46936  47332 47778   65360
Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The L Word
On 02/23/2011 05:08 PM, Gene Leynes wrote: I've been wondering what L means in the R computing context, and was wondering if someone could point me to a reference where I could read about it, or tell me what it's called so that I can search for it myself. (L by itself is a little too general for a search term). It means that the number is an integer (a _L_ong integer of 32 bit, actually). I encounter it in strange places, most recently in the save documentation. save(..., list = character(0L), file = stop("'file' must be specified"), ascii = FALSE, version = NULL, envir = parent.frame(), compress = !ascii, compression_level, eval.promises = TRUE, precheck = TRUE) I remember that you can also find it when you step inside an apply function: sapply(1:10, function(x) browser()) Called from: FUN(1:10[[1L]], ...) I apologize for being vague, it's just something that I would like to understand about the R language (the R word). Thank you! Gene -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
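A quick check of the point above in base R:

```r
# 1L is stored as an integer, plain 1 as a double
typeof(1L)            # "integer"
typeof(1)             # "double"
is.integer(0L:100L)   # TRUE -- the colon operator yields integers, too
.Machine$integer.max  # largest representable R integer, 2^31 - 1
```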
Re: [R] sort a 3 dimensional array across third dimension ?
Dear James, this is what I understood your sorting along the third dimension to be:
x <- array(c(9, 9, 7, 9, 6, 5, 4, 6, 2, 1, 3, 2), dim = c(2, 2, 3))
y <- apply (x, 1:2, sort)
y
, , 1
     [,1] [,2]
[1,]    2    1
[2,]    6    5
[3,]    9    9
, , 2
     [,1] [,2]
[1,]    3    2
[2,]    4    6
[3,]    7    9
The result of apply has dimensions c(length of the function's result, the dimensions of x that you hand to apply as MARGIN). Thus, your specified result needs rearranging of the dimensions:
y <- aperm (y, c(2, 3, 1))
y
, , 1
     [,1] [,2]
[1,]    2    3
[2,]    1    2
, , 2
     [,1] [,2]
[1,]    6    4
[2,]    5    6
, , 3
     [,1] [,2]
[1,]    9    7
[2,]    9    9
HTH Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] {Spam?} Re: sort a 3 dimensional array across third dimension ?
On 02/18/2011 04:11 PM, Maas James Dr (MED) wrote: Hi Claudia, It does help a lot, but not quite there yet ... I'm sure you are correct and it is much appreciated. I need some sort of generalized form; actual arrays in my case are 3x3x1000. Do you suspect it could be done in one step with sapply? Why sapply? Sure, you can do it in one step:
y <- aperm (apply (x, 1:2, sort), c(2, 3, 1))
I just think two lines are more readable. Note that all these numbers refer to the dimensions of the array and don't have anything to do with the actual size. Just try it out with different array sizes.
a <- array (runif (9000), c (3, 3, 1000))
a [,,1:2]
, , 1
       [,1]   [,2]    [,3]
[1,] 0.8721 0.5102 0.47370
[2,] 0.7721 0.5744 0.98281
[3,] 0.9357 0.1969 0.08784
, , 2
       [,1]   [,2]   [,3]
[1,] 0.1485 0.6878 0.1018
[2,] 0.3784 0.3864 0.9814
[3,] 0.9219 0.5664 0.4565
y <- aperm (apply (a, 1:2, sort), c(2, 3, 1))
y [,,1:2]
, , 1
          [,1]      [,2]      [,3]
[1,] 1.121e-03 1.517e-03 0.0008285
[2,] 7.118e-05 3.303e-04 0.0003870
[3,] 7.445e-04 2.461e-05 0.0005980
, , 2
         [,1]      [,2]     [,3]
[1,] 0.001375 0.0049272 0.004581
[2,] 0.002204 0.0004947 0.001148
[3,] 0.004214 0.0006355 0.001610
y [,,999:1000]
, , 1
       [,1]   [,2]   [,3]
[1,] 0.9989 0.9980 0.9998
[2,] 0.9982 0.9973 0.9994
[3,] 0.9994 0.9978 0.9993
, , 2
       [,1]   [,2]   [,3]
[1,] 0.9997 0.9992 0.9999
[2,] 0.9986 0.9981 0.9997
[3,] 0.9998 0.9988 0.9996
BTW: as your MARGIN dimensions are short, only 3 x 3 = 9 calls to FUN are necessary. I don't think you can gain much time here. The calculation with 3 x 3 x 1000 on my computer took 3 ms elapsed, and increasing every dimension by a factor of 10 still needs only 1/3 s. Claudia Regards J === Dr.
Jim Maas Research Associate in Network Meta-Analysis School of Medicine, Health Policy and Practice CD Annex, Room 1.04 University of East Anglia Norwich, UK NR4 7TJ +44 (0) 1603 591412 From: Claudia Beleites [mailto:cbelei...@units.it] Dear James, this is what I understood your sorting along the third dimension to be:
x <- array(c(9, 9, 7, 9, 6, 5, 4, 6, 2, 1, 3, 2), dim = c(2, 2, 3))
y <- apply (x, 1:2, sort)
y
, , 1
     [,1] [,2]
[1,]    2    1
[2,]    6    5
[3,]    9    9
, , 2
     [,1] [,2]
[1,]    3    2
[2,]    4    6
[3,]    7    9
The result of apply has dimensions c(length of the function's result, the dimensions of x given in MARGIN). Thus, your specified result needs rearranging of the dimensions:
y <- aperm (y, c(2, 3, 1))
y
, , 1
     [,1] [,2]
[1,]    2    3
[2,]    1    2
, , 2
     [,1] [,2]
[1,]    6    4
[2,]    5    6
, , 3
     [,1] [,2]
[1,]    9    7
[2,]    9    9
HTH Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
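The one-liner discussed in this thread, wrapped as a small reusable function (the function name is mine, not from the thread). It sorts any n1 x n2 x n3 array along its third dimension:

```r
## Sort an array along its 3rd dimension: apply() sorts each [i, j, ]
## vector (result: n3 x n1 x n2), aperm() restores the original layout
sort_along3 <- function (a)
  aperm (apply (a, 1:2, sort), c (2, 3, 1))

a <- array (runif (3 * 3 * 1000), c (3, 3, 1000))
y <- sort_along3 (a)
```

After the call, `y[i, j, ]` is `sort(a[i, j, ])` for every i, j, and `dim(y)` equals `dim(a)`.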
Re: [R] segfault during example(svm)
Dear Jürgen, did you update.packages (checkBuilt = TRUE)? I recently had segfaults, too, on 64-bit Linux (with rgl, though) and they disappeared only after updating with checkBuilt (including also the packages originally installed via Dirk's .deb packages). HTH, Claudia On 02/18/2011 09:32 PM, Juergen Rose wrote: Am Freitag, den 18.02.2011, 11:53 -0800 schrieb Peter Ehlers: On 2011-02-18 11:16, Juergen Rose wrote: If I do:
library(e1071)
example(svm)
I get:
svm> data(iris)
svm> attach(iris)
svm> ## classification mode
svm> # default with factor response:
svm> model <- svm(Species ~ ., data = iris)
svm> # alternatively the traditional interface:
svm> x <- subset(iris, select = -Species)
svm> y <- Species
svm> model <- svm(x, y)
svm> print(model)
Call: svm.default(x = x, y = y)
Parameters: SVM-Type: C-classification SVM-Kernel: radial cost: 1 gamma: 0.25
Number of Support Vectors: 51
svm> summary(model)
Call: svm.default(x = x, y = y)
Parameters: SVM-Type: C-classification SVM-Kernel: radial cost: 1 gamma: 0.25
Number of Support Vectors: 51 ( 8 22 21 )
Number of Classes: 3
Levels: setosa versicolor virginica
svm> # test with train data
svm> pred <- predict(model, x)
svm> # (same as:)
svm> pred <- fitted(model)
svm> # Check accuracy:
svm> table(pred, y)
            y
pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          2        48
svm> # compute decision values and probabilities:
svm> pred <- predict(model, x, decision.values = TRUE)
svm> attr(pred, "decision.values")[1:4,]
  setosa/versicolor setosa/virginica versicolor/virginica
1          1.196152         1.091460            0.6705626
2          1.064621         1.056332            0.8479934
3          1.180842         1.074534            0.6436474
4          1.110699         1.053143            0.6778595
svm> # visualize (classes by color, SV by crosses):
svm> plot(cmdscale(dist(iris[,-5])),
svm+      col = as.integer(iris[,5]),
svm+      pch = c("o","+")[1:150 %in% model$index + 1])
*** caught segfault *** address (nil), cause 'unknown'
Traceback:
1: .Call("La_rs", x, only.values, PACKAGE = "base")
2: eigen(-x/2, symmetric = TRUE)
3: cmdscale(dist(iris[, -5]))
4:
plot(cmdscale(dist(iris[, -5])), col = as.integer(iris[, 5]), pch = c("o", "+")[1:150 %in% model$index + 1])
5: eval.with.vis(expr, envir, enclos)
6: eval.with.vis(ei, envir)
7: source(tf, local, echo = echo, prompt.echo = paste(prompt.prefix, getOption("prompt"), sep = ""), continue.echo = paste(prompt.prefix, getOption("continue"), sep = ""), verbose = verbose, max.deparse.length = Inf, encoding = "UTF-8", skip.echo = skips, keep.source = TRUE)
8: example(svm)

Possible actions: 1: abort (with core dump, if enabled) .. I did already update.packages(), what can I still do?

Works just fine for me. What's your sessionInfo()? Here's mine:

sessionInfo()
R version 2.12.1 Patched (2010-12-27 r53883)
Platform: i386-pc-mingw32/i386 (32-bit)
locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C [5] LC_TIME=English_Canada.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] e1071_1.5-24 class_7.3-3
loaded via a namespace (and not attached): [1] tools_2.12.1

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base

It is working on some of my systems and failing on most. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] segfault during example(svm)
... ** testing if installed package can be loaded * DONE (e1071) The downloaded packages are in '/tmp/RtmpRJM5aT/downloaded_packages' Updating HTML index of packages in '.Library' Making packages.html ... done -- Claudia
Re: [R] amount of data R can handle in a single file
On 02/17/2011 10:16 AM, Nasila, Mark wrote: Dear Sir/Madam, I would like to know the maximum number of observations a single file may have when using R. I am asking this because I am trying to do research on banking transactions and I have around 49 million records. Can R handle this? Please advise. Dear Mark, I think R can address up to a length of 2^31 - 1 ≈ 2.1e9 elements per vector; 2^31 elements (numeric, 8 bytes each) would be ≈ 16 GB for a single vector (matrix, array). For me, the available RAM is the more important limit: I work without problem with (numeric) matrices of size 2e5 x 250 = 5e7 elements (380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1 GB) of raw data. The raw data is the practical limit on my 8 GB (64 bit linux) machine: during the processing it becomes complex, thus ≈ 2 GB, and with that I had to be very careful not to copy the matrix too often. This and a bunch of gc() calls let me process the data without swapping. :-) Note that 2 GB corresponds quite nicely to the rule of thumb that the end of the fun is reached with variable sizes of about 1/3 of the RAM. If you are concerned about your data set, I'd recommend reading a fraction of it, looking at its object.size(), and watching the RAM use during data analysis of that partial data set. Then extrapolate to the complete data set. HTH Claudia Mark Nasila Quantitative Analyst CBS Risk Management Personal Banking 7th Floor, 2 First Place, Cnr Jeppe and Simmonds Street, Johannesburg, 2000 Tel (011) 371-2406, Fax (011) 352-9812, Cell 083 317 0118 e-mail mnas...@fnb.co.za www.fnb.co.za www.howcanwehelpyou.co.za First National Bank - a division of FirstRand Bank Limited. An Authorised Financial Services and Credit Provider (NCRCP20). 'Consider the effect on the environment before printing this email.'
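The read-a-fraction-and-extrapolate approach suggested above can be sketched like this (a minimal sketch; simulated data stands in for a 1 % sample of the real file, and the column names are made up):

```r
## stand-in for reading a 1 % sample of the ~49e6 records
n_sample <- 49e4
chunk <- data.frame(amount  = rnorm(n_sample),
                    account = sample.int(1e5, n_sample, replace = TRUE))

## memory footprint of the sample ...
sz <- object.size(chunk)
print(sz, units = "MB")

## ... extrapolated to the full data set (x100)
full_gb <- as.numeric(sz) * 100 / 1024^3
cat(sprintf("estimated full data set: %.1f GB\n", full_gb))
```

Compare the extrapolated size against roughly 1/3 of the available RAM, as per the rule of thumb above.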
Re: [R] When is *interactive* data visualization useful to use?
(which could easily lead to data dredging, where the scope of the multiple comparisons needed for correction is not even clear). Sure, yet: - Isn't that what validation was invented for (I mean with a proper, new, [double] blind test set after you decided your parameters)? - Summarizing a whole data set into a few numbers without having looked at the data itself may not be safe, either: the few saved comparisons shouldn't come at the cost of risking a bad modelling strategy and fitting parameters because the data was not properly examined. My 2 ct, Claudia (who in practice warns far more frequently of multiple comparisons and of validation sets being compromised (not independent) than of too little data exploration ;-) )
Re: [R] spline interpolation
Hi, Just pure curiosity: may I ask why you want to do spline interpolation on fluorescence intensity as a function of concentration? Particularly as it looks quite typical for a calibration plot for an unknown problem? Claudia On 02/05/2011 03:29 PM, Asan Ramzan wrote: Hello R-help, I have the following data for a standard curve: concentration (nM), fluorescence: 0, 48.34; 2, 58.69; 5, 70.83; 10, 94.73; 20, 190.8; 50, 436.0; 100, 957.9. (1) Is there a function in R to plot a spline? (2) How can I interpolate, say, 1000 points from 0 nM to 100 nM and store this as a data frame of concentration, fluorescence? (3) How can I modify the code below so that instead of retrieving a concentration with the exact value of fluorescence, it gives me the concentration for the value that is closest to that fluorescence? subset(df, fluorescence == 200.3456, select = concentration)
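For reference, the interpolation the question describes could be sketched as follows (a hedged sketch using base R's `splinefun()`; whether spline interpolation is the appropriate treatment of a calibration curve is exactly the caveat raised above):

```r
# standard-curve data from the post
std <- data.frame(
  concentration = c(0, 2, 5, 10, 20, 50, 100),
  fluorescence  = c(48.34, 58.69, 70.83, 94.73, 190.8, 436.0, 957.9))

# (1) interpolating spline through the points
f <- splinefun(std$concentration, std$fluorescence)

# (2) 1000 interpolated points as a data frame
conc     <- seq(0, 100, length.out = 1000)
curve_df <- data.frame(concentration = conc, fluorescence = f(conc))

# (3) concentration whose interpolated fluorescence is closest to a target
target  <- 200.3456
closest <- curve_df$concentration[which.min(abs(curve_df$fluorescence - target))]
```

`plot(std); lines(curve_df)` would show the spline over the raw points.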
Re: [R] Matrix' with Functions
Seems funny to me:

f <- list (mean, sd, median, sum)
dim (f) <- c (2, 2)

or in one line:

f <- structure (.Data = list (mean, sd, median, sum), dim = c (2, 2))

f
     [,1] [,2]
[1,] ?    ?
[2,] ?    ?

f [1, 1]
[[1]]
function (x, ...) UseMethod("mean")
<environment: namespace:base>

f [[1, 1]] (1:3)
[1] 2
f [[2, 1]] (1:3)
[1] 1
f [[1, 2]] (1:3)
[1] 2
f [[2, 2]] (1:3)
[1] 6

HTH Claudia On 02/03/2011 05:33 PM, Alaios wrote: Dear R members, I have quite a large number of functions that are named like f11, f12, ... f15; f21, f22, ... f25; ...; f51, f52, ... f55. These are static (hard-coded) functions whose only commonality is that they take the same number and type of inputs: fij(a, b, c, d). As you might understand, this is really close to the notion of a matrix, only that my 'matrix' contains functions. It would be great if I could address all these functions using a numbering scheme like F(i, j), where for example F(1, 1) will return f11(a, b, c, d). I am sure that this might be quite complex to implement, so could you please refer me to some book/tutorial that addresses this kind of topic? I would like to thank you in advance for your help. Best Regards Alex
Re: [R] slightly off topic...
The fact that the attachment to someone with a .de domain has them labeled with the word 'Teil' suggests that the labeling is being done at the final destination, since the server is set up with English messages. ... as he astutely observed[1] :-) Looking at the ASCII version of one such email reveals: - the 'part 1.2' or 'Teil 1.2' is how my Thunderbird (English at work, German at home) announces the parts of multipart emails. Neither 'part 1.2' nor 'Teil 1.2' is written anywhere in the source. - the r-help list footer somehow ended up as a separate part of the email. Cheers, Claudia
Re: [R] pdf greek letter typos
Eduardo, On 01/27/2011 12:53 PM, Philipp Pagel wrote: ... caused by a problem with font substitution in some versions of the poppler library, which is used by many Linux PDF viewers. Try to view the file in Acrobat Reader and possibly other viewers. I'm running Ubuntu, and uninstalling the package ttf-symbols-replacement did the trick for evince & Co. on my system (Acrobat Reader was never affected, but used to render PDFs with transparency quite badly - there was a discussion with solutions to both problems on the ggplot2 list last fall). HTH, Claudia
[R] puzzled with plotmath
Dear all, I'm puzzled with matrix indices in plotmath. I'm plotting matrix elements Z [i, i], and I'd like to put that as the label. I'll describe what I want and what I get in LaTeX notation. The output should look like Z_{i, i}, and my first try was

plot (1, 1, ylab = expression (Z[i, i]))

That, however, gives me Z_{i} (no comma, no second i), although the expression looks OK to me:

a <- expression (Z[i, i])
a [[1]]
Z[i, i]
str (as.list (a [[1]]))
List of 4
 $ : symbol [
 $ : symbol Z
 $ : symbol i
 $ : symbol i

I'm able to tweak the output to look as I want:

plot (1, 1, ylab = expression (Z[i][", "][i]))

which is, however, logically very far from what I want to express. What am I missing? I'm almost sure this has been discussed before, but I can't find it: can anyone point me to good search terms? Is it possible to search for terms being close to each other in RSiteSearch and/or RSeek? I get lots of introductory documents, as they point to plotmath and discuss matrices... Thanks a lot for your help, Claudia
[R] puzzled with plotmath II
sorry, I forgot my sessionInfo: please see below. Original Message Subject: puzzled with plotmath Date: Thu, 20 Jan 2011 12:48:18 +0100 From: Claudia Beleites cbelei...@units.it To: R Help r-help@r-project.org [original message as above]
Thanks a lot for your help, Claudia

sessionInfo ()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C LC_TIME=en_US.utf8 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8 [7] LC_PAPER=en_US.utf8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages: [1] grid stats graphics grDevices utils datasets methods base
other attached packages: [1] ggplot2_0.8.8 proto_0.3-8 reshape_0.8.3 plyr_1.2.1
loaded via a namespace (and not attached): [1] digest_0.4.2 tools_2.12.1
Re: [R] puzzled with plotmath for matrix indices
Gerrit, thanks - and greetings to Oberhessen :-)

plot (1, 1, ylab = expression (Z[list(i,i)]))

though that cannot be evaluated, either (due to [ not knowing what to do with an index list). For future searches: probably the easiest cheat is, of course,

plot (1, 1, ylab = expression (Z["i, i"]))

Anyways, I put the how-to into the R Wiki page on plotmath. And I suggest that it should be mentioned in the plotmath help -> email to r-devel. Claudia
Re: [R] puzzled with plotmath for matrix indices
On 01/20/2011 02:11 PM, Uwe Ligges wrote: On 20.01.2011 14:08, Claudia Beleites wrote: Gerrit, thanks - greetings to Oberhessen :-) plot (1, 1, ylab = expression (Z[list(i,i)])) though that cannot be evaluated, either (due to [ not knowing what to do with an index list). Works for me with a recent R version. Sorry, my comment wasn't clear: sure it produces the desired output; what I meant is:

Z
     [,1] [,2]
[1,]    1    3
[2,]    2    4
i <- 2
eval (expression (Z[list(i,i)]))
Error in Z[list(i, i)] : invalid subscript type 'list'

whereas:

eval (expression (Z[i,i]))
[1] 4

(and of course all the text-based solutions also lack the beauty of the expression actually meaning in R what the output looks like). For future searches: probably the easiest cheat is, of course, plot (1, 1, ylab = expression (Z["i, i"])), which is less convenient since you could not replace i by a dynamically calculated number, for example. Good point. Thanks, I learn a lot here :-) Claudia
Re: [R] puzzled with plotmath II
Peter, Look for 'comma-separated list' on the help page! Yes, seeing the solution I also understand why list is the solution. The special meaning of list () in plotmath was only in my passive vocabulary - and after this discussion I think it is upgraded to active ;-) I have to admit that my coming from matlab (as opposed to lisp) still catches me once in a while: though I was aware that I would somehow need to change the tree of the expression, I went astray because c () still feels to me like the more basic function for putting things together than list (). A second aspect that put me a bit off track is that both create expressions that do have a meaning, but don't mean in R what I want to express: Z [c (a, b)] is meaningful, but not the same as Z [a, b]; Z [list (a, b)] is syntactically correct, but `[` for matrices doesn't accept lists for parameter i. Anyways, thanks a lot for the patience, everyone: the problem is solved, solutions (including bquote) are to be found in the Wiki, and instead of creating more fuss by unclear emails I'll fetch a coffee before I go on plotting my confusion matrix elements... Claudia
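For completeness, the bquote() route mentioned above also covers the dynamically-computed-index case that a literal string label cannot (a minimal sketch in base R; `lab` is a made-up name):

```r
# dynamic matrix index in a plotmath label:
# list() makes plotmath typeset comma-separated subscripts,
# .( ) splices in the current value of i
i <- 2
lab <- bquote(Z[list(.(i), .(i))])

# the unevaluated call that plotmath renders as Z with subscript "2, 2"
deparse(lab)
```

Usage would be e.g. `plot(1, 1, ylab = lab)`.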
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research. Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a funding bias* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot. * I think of bias on both sides: the funding agency selecting the studies to support, and the researcher subconsciously complying with the expectations of the funding agency. On 01/07/2011 08:06 AM, Peter Langfelder wrote: From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question. I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the prevalence of good ideas of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing as the scientist generates and tests more and more ideas.
I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist, with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first about which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware of and concerned about optimistically biased validation results. Those were of course very valid questions (and easy to comply with), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples, of course I run the risk of my ideas being overfit to our kind of samples - this is a cost I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm an analytical chemist. We have huge amounts of work going into round-robin trials in order to measure the natural variability of different labs on very defined systems. - We also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some fields of problems it is at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me...
;-) - Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard-deviation error bar. This is all about perception; of course I can reflect on the meaning. Such a reflection also tells me that one student having a really unlikely number of right guesses is unlikely but not impossible. There is no statistical law stating that unlikely events happen only with large sample sizes/numbers of tests. Yet the immediate perception is completely different. - I happily agree with the idea of publishing findings (conclusions) as well as the data and data analysis code used to arrive there. But I'm aware that part of this agreement is due to the fact that I'm quite interested in the data analytical methods (I'd say as well as in the particular chemical-analytical problem at hand, but rather
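The sensitivity-vs-positive-predictive-value point above can be made concrete with Bayes' rule (a numerical sketch; the prevalence, sensitivity, and specificity values are made up for illustration):

```r
# PPV = P(conclusion true | "significant" result), via Bayes' rule
ppv <- function(prevalence, sensitivity, specificity) {
  true_pos  <- prevalence * sensitivity          # true ideas confirmed
  false_pos <- (1 - prevalence) * (1 - specificity)  # false ideas "confirmed"
  true_pos / (true_pos + false_pos)
}

# even a good test (80 % sensitivity, 95 % specificity) yields a modest
# PPV when only 10 % of the tested hypotheses are true:
ppv(prevalence = 0.1, sensitivity = 0.8, specificity = 0.95)   # 0.64
```

The p-value plays the role of 1 - specificity here; the prevalence of good ideas is the missing ingredient the text refers to.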
Re: [R] Removing Corrupt file
Vikrant, if you execute the code inside a function like

jpegplotfun <- function (a, b) {
  jpeg("mygraph.jpeg")
  plot(a, b)
  dev.off()
}

the dev.off () is not executed if an error occurs before it. So the problem is basically that the jpeg file is still open (you may have noticed open devices in R as leftovers of these errors). See ? try and ? on.exit for ways to deal with situations where you need to clean up after errors. HTH, Claudia On 12/17/2010 08:24 AM, vikrant wrote: Hi, I am generating a graph jpeg file using a function in R. I'm using this script:

a <- 1:10
b <- 1:10
jpeg("mygraph.jpeg")
plot(a, b)
dev.off()

If by some chance I miss some values, say for a, the file gets created initially and then nothing is plotted into it. This file now becomes corrupted and we cannot delete it from the current R session. I have tried commands like file.remove() and unlink() to remove the corrupt file from the current R session. Is there any other way to remove such files? Thanks Regards, Vikrant
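The on.exit() route mentioned above could look like this (a sketch; the function name and the `dev` argument are made up so the cleanup logic can be exercised with any device, e.g. pdf instead of jpeg):

```r
# open a graphics device and guarantee it is closed again - even on error
safe_plot <- function(a, b, file, dev = jpeg) {
  dev(file)
  on.exit(dev.off())   # registered immediately, so it runs on error too
  plot(a, b)
}

# the error fires inside plot(), yet the device is closed afterwards,
# so no half-written file is left open in the session
f <- tempfile(fileext = ".pdf")
try(safe_plot(1:10, stop("no data"), f, dev = pdf), silent = TRUE)
```

With the device closed, the broken file can then be removed with file.remove() as usual.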
Re: [R] Summary (Re: (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?)
Dear Emmanuel and dear list, Therefore, I let this problem sleep. However, I Cc this answer (with the original question below) to Max Kuhn and Friedrich Leisch, in the (faint) hope that this feature, which does not seem to have been missed by anybody in 8 years, ... I've been missing it every once in a while, but till now I could always rephrase the problem with expand = FALSE or functions, with the chunk that does the actual calculation at the end. Most often, however, I'm just lazy and use R comments. If math should go in there, I use listings instead of fancyvrb with the modified Sweave.sty that hopefully is attached (if not, see below). Here's an example chunk:

<<keep.source=TRUE>>=
1 / 2 # $\frac{1}{x}$
4 + 4 # Here may come lots of explanations, that are in a \LaTeX\ paragraph\footnote{blabla}: even long lines are properly broken.\\ Though the new lines start at the beginning of the line. \\[6pt] And a line break in the chunk source will of course be interpreted as R again: so no new paragraphs inside the same comment.
# But there can be new commented lines.
3 + 6 # Note that comment-only lines at the end of a code chunk seem to be lost.
# Not only one but all that aren't followed by R code
@

(the second line should be very long; I somehow can't keep thunderbird from inserting line breaks) Hope that helps a bit, Claudia

=== modified Sweave.sty ===
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{Sweave}{}
\RequirePackage{ifthen}
\newboolean{Sweave@gin}
\setboolean{Sweave@gin}{true}
\newboolean{Sweave@ae}
\setboolean{Sweave@ae}{true}
\DeclareOption{nogin}{\setboolean{Sweave@gin}{false}}
\DeclareOption{noae}{\setboolean{Sweave@ae}{false}}
\ProcessOptions
\RequirePackage{graphicx,listings}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\ifthenelse{\boolean{Sweave@gin}}{\setkeys{Gin}{width=0.8\textwidth}}{}%
\ifthenelse{\boolean{Sweave@ae}}{%
  \RequirePackage[T1]{fontenc}
  \RequirePackage{ae}
}{}%
\lstnewenvironment{Sinput}{\lstset{language=R,basicstyle=\sl,texcl,
    commentstyle=\upshape}}{}
\lstnewenvironment{Soutput}{\lstset{language=R}}{}
\lstnewenvironment{Scode}{\lstset{language=R,basicstyle=\sl}}{}
\newenvironment{Schunk}{}{}
\newcommand{\Sconcordance}[1]{%
  \ifx\pdfoutput\undefined%
  \csname newcount\endcsname\pdfoutput\fi%
  \ifcase\pdfoutput\special{#1}%
  \else\immediate\pdfobj{#1}\fi}

=== minimal test document ===
\documentclass{article}
\begin{document}
<<keep.source=TRUE>>=
1 / 2 # $\frac{1}{x}$
4 + 4 # Here may come lots of explanations, that are in a \LaTeX\ paragraph\footnote{blabla}: even long lines are properly broken.\\ Though the new lines start at the beginning of the line. \\[6pt] And a line break in the chunk source will of course be interpreted as R again: so no new paragraphs inside the same comment.
# But there can be new commented lines.
3 + 6 # Note that comment-only lines at the end of a code chunk seem to be lost.
# Not only one but all that aren't followed by R code
@
\end{document}
Re: [R] using ``<-'' in function argument
On 12/03/2010 06:54 AM, Berwin A Turlach wrote: On Thu, 2 Dec 2010 23:34:02 -0500 David Winsemius dwinsem...@comcast.net wrote: [...] Erik is telling you that your use of ncol <- 4 got evaluated to 4, and that the name of the resulting object was ignored; however, the value of the operation was passed on to matrix(), which used positional matching since = was not used. Sounds like a fair summary of what Erik said, but it is subtly wrong. R has lazy evaluation of its arguments. There is nothing that forces the assignment to be evaluated and the result to be passed into the function. On the contrary, the assignment takes place when the function evaluates the argument. Let's say: as no argument name was given, positional matching applied. And evaluation took place when argument no. 2 was required. Of course, you could give an argument name:

matrix(ncol <- 4)
     [,1]
[1,]    4
matrix(nrow = ncol <- 4)
     [,1]
[1,]   NA
[2,]   NA
[3,]   NA
[4,]   NA

Claudia
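The lazy-evaluation point can be seen directly (a minimal sketch in plain R; the function names are made up):

```r
f <- function(x, y) x            # y is never touched ...
r1 <- f(1, stop("boom"))         # ... so the error never fires: r1 is 1

g <- function(x, y) x + y        # y is needed here ...
r2 <- g(1, z <- 41)              # ... so the assignment runs when y is
                                 # first used: z appears in the caller's
                                 # environment, and r2 is 42
```

This is exactly why `matrix(ncol <- 4)` above only performs the assignment at the moment matrix() evaluates its first (positional) argument.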
Re: [R] Performance tuning tips when working with wide datasets
Dear Richard, Does anyone have any performance tuning tips when working with datasets that are extremely wide (e.g. 20,000 columns)? The obvious one is: use matrices – and take care that they don't get converted back to data.frames. In particular, I am trying to perform a merge like below:

merged_data <- merge(data1, data2, by.x = "date", by.y = "date", all = TRUE, sort = TRUE)

This statement takes about 8 hours to execute on a pretty fast machine. The dataset data1 contains daily data going back to 1950 (20,000 rows) and has 25 columns. The dataset data2 contains annual data (only 60 observations), however there are lots of columns (20,000 of them). I have to do a lot of these kinds of merges so need to figure out a way to speed it up. I have tried a number of different things to speed things up to no avail. I've noticed that rbinds execute much faster using matrices than data frames. However the performance improvement when using matrices (vs. data frames) on merges was negligible (8 hours down to 7). which is astonishing, as merge (matrix) uses merge.default, which boils down to merge(as.data.frame(x), as.data.frame(y), ...) I tried casting my merge field (date) into various different data types (character, factor, date). This didn't seem to have any effect. I tried the hash package; however, merge couldn't coerce the class into a data.frame. I've tried various ways to parallelize computation in the past, and found that to be problematic for a variety of reasons (runaway forked processes, doesn't run in a GUI environment, doesn't run on Macs, etc.). I'm starting to run out of ideas, anyone? Merging a 60 row dataset shouldn't take that long. Do I understand correctly that the result should be a 20,000 x 20,025 matrix, where the additional 20,000 columns are from data2 and end up in the rows of e.g. every 1st of January? In that case, you may be much faster producing tmp <- matrix(NA, 20000, 20000), filling the values of data2 into the correct rows, and then cbinding data1 and tmp.
Make sure you have enough RAM available: tmp is about 1.5 GB. If you manage to do this without swapping, it should be reasonably fast. If you end up writing a proper merge function for matrices, please let me know: I'd be interested in using it... Claudia Thanks, Richard -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
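The fill-and-cbind idea sketched at toy size (column names here are made up; the real data would use 20,000 rows and 20,000 columns):

```r
# toy stand-ins for the real data: daily data1, annual data2
data1 <- data.frame(date = as.Date("2000-01-01") + 0:9, a = 1:10, b = 11:20)
data2 <- data.frame(date = as.Date(c("2000-01-01", "2000-01-06")),
                    x1 = c(100, 200), x2 = c(0.1, 0.2))

# pre-allocate the wide part, filled with NA
tmp <- matrix(NA_real_, nrow = nrow(data1), ncol = ncol(data2) - 1,
              dimnames = list(NULL, names(data2)[-1]))

# fill data2's values into the rows whose dates match
rows <- match(data2$date, data1$date)
tmp[rows, ] <- as.matrix(data2[, -1])

# bind the numeric part of data1 to the filled block: a 10 x 4 matrix
merged <- cbind(as.matrix(data1[, -1]), tmp)
```

match() does the alignment that merge() would otherwise do; since data2 has few rows, this is cheap, and the expensive part is a single pre-allocated fill.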
Re: [R] plot inside function does not work
Alex, this may be FAQ 7.22 Claudia On 11/22/2010 02:19 PM, Alaios wrote: Hello everyone, when I make a plot from the console (command line?) plot works fine. I have created a function that plots based on the input. This function is called plot_shad. When I call this function alone in the command line I get my plot. Then I tried to use another function, as depicted below, to do some calculation before calling the function that does the plotting.

plot_shad_map <- function(f, CRagent, agentid) {
  for (i in c(1:nrow(shad_map))) {
    for (j in c(1:ncol(shad_map))) {
      # Do something
    }
  }
  plot_shad_f(shad_map) # This plots fine when used in command line.
                        # But inside this function it does not.
  return(shad_map)
}

Unfortunately I get no plot. What might be the problem? One more question: how do I get more plots at the same time? It seems that when I issue a new plot it replaces the old plot. I would like to thank you in advance for your help Regards Alex -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
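FAQ 7.22 is about auto-printing: at the top level the value of an expression is printed automatically, but inside a function or loop it is not, so lattice/grid graphics objects silently vanish unless you print() them. A minimal sketch (function names invented here; this applies if plot_shad_f builds on lattice-style graphics):

```r
library(lattice)

f_silent <- function() {
  xyplot(1:10 ~ 1:10)          # object is created and returned, but NOT drawn
}

f_drawn <- function() {
  print(xyplot(1:10 ~ 1:10))   # explicit print() draws the plot
}

p <- f_silent()                # nothing appears on screen
inherits(p, "trellis")         # we only got the object back
```

For the second question: opening a new device (e.g. dev.new()) before each plot keeps the previous one on screen instead of replacing it.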
Re: [R] a rounding option for \Sexpr{}?
Dear all, I don't like the current behaviour, but that change could break a lot of existing documents. Since you can easily wrap your Sexpr arguments in a call to whatever formatting function you want, why force all of those users to change their documents? I'm someone who would change a whole lot of \Sexpr{}s: I could get rid of all those round()s and format()s... Currently I almost don't use \Sexpr, as I find the advantage of just having a tiny little R expression in the text is lost if half a line of formatting code is required. Particularly as one has to be careful not to have a line break inside the \Sexpr{}, as Sweave doesn't recognize those. At the moment, I tend to use chunks with results=tex instead – which is not the nicest thing to read in the source, as it breaks the flow of a sentence quite badly. But currently it is much faster for me to type. On the other hand, maybe it's just about time to write a template/snippet for \Sexpr{format (, digits = 3)}... An alternative of course would be introducing a new kind of those commands. If that's going to happen, I'd vote for something really short like the brew syntax. But maybe I just didn't understand the advantage of \Sexpr{} and \VignetteXXX{} looking like LaTeX commands although they aren't (particularly as LaTeX source code highlighting without taking Sweave syntax into account is messed up anyway by the $ in the \Sexpr{}). Also, very subjectively, I'd find a syntax with angle brackets more consistent, as the code chunks start with angle brackets anyway. My 2 ct, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] stacking consecutive columns
Dear Gregory, Is there an easier, cleaner way to do this? Thanks. There are of course several ways... (assuming yearmonth to be a data.frame)

--- 1 ---
year <- colnames(yearmonth)[-1]
year <- gsub("^[^[:digit:]]*([[:digit:]]*)[^[:digit:]]*$", "\\1", year)
year <- as.numeric(year)
month <- yearmonth$month
precip <- as.matrix(yearmonth[, -1])
long.df <- data.frame(month = rep(month, length(year)),
                      year  = rep(year, each = nrow(yearmonth)),
                      precipitation = as.numeric(precip))

If you're about to do this more often:

--- 2 ---
package hyperSpec (outdated on CRAN; if you want to install it, use the version on r-forge) has a function array2df which helps with this transformation:

long.df <- array2df(precip, label.x = "precipitation",
                    levels = list(month = month, year = year))

--- 3 ---
depending on your file (are the column names numbers without the Xs?) you may be able to abuse a hyperSpec object to read your data easily:

x <- read.txt.wide(filename, ...more options...)

then as.long.df(x) is about what you want. (You'd probably want to rename the columns.)

HTH Claudia Gregory A. Graves, Lead Scientist Everglades REstoration COoordination and VERification (RECOVER) Restoration Sciences Department South Florida Water Management District Phones: DESK: 561 / 682 - 2429 CELL: 561 / 719 - 8157 -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
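For completeness, base R's reshape() can also do this in one call — a sketch on a tiny made-up yearmonth data frame (column names X1950, X1951 assumed, as read.table would create them from numeric headers):

```r
# wide layout: one row per month, one column per year
yearmonth <- data.frame(month = 1:3,
                        X1950 = c(10, 20, 30),
                        X1951 = c( 5, 15, 25))

long.df <- reshape(yearmonth, direction = "long",
                   varying = list(2:3),         # the year columns, stacked
                   v.names = "precipitation",
                   times   = c(1950, 1951),     # value of 'year' for each column
                   timevar = "year",
                   idvar   = "month")
```

The result has one row per month/year combination with columns month, year, and precipitation.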
[R] Xapply question
Dear list, I'm stuck with looking for a function of the *apply family, which I suppose exists already – just I can't find it: What I'm looking for is somewhere between sweep and mapply, doing a calculation vectorized over a matrix and a vector: It should work complementary to sweep: for each row of the matrix, a different value of the vector should be handed over. Close to mapply because I need to go through different variables in a parallel fashion (at the moment a matrix and a vector). Kind of a mapply that hands over array slices. Maybe it is easiest with an example. This loop does what I want:

> A <- matrix(rnorm(12), 3)
> A
        [,1]    [,2]    [,3]     [,4]
[1,]  0.1286  0.2888 -0.4435 -0.90966
[2,] -1.6000 -1.0884  1.3736  0.07754
[3,]  0.4581  1.5413  0.6133 -0.12131
> v <- 1:3
> f <- function (x, y) { # some function depending on vector x and scalar y
+   c(sum(x^2), y)
+ }
> result <- matrix(NA, nrow = nrow(A), ncol = 2)
> for (r in 1:nrow(A))
+   result[r, ] <- f(A[r, ], v[r])
> result
       [,1] [,2]
[1,] 1.1241    1
[2,] 5.6372    2
[3,] 2.9763    3

The matrix will easily be in the range of 1e4 - 1e5 rows x 1e2 - 1e3 columns, so I do not want to split it into a list and combine it afterwards. The reason why I ask for a function is partly also because I want to overload the functionality for a specific class, and I don't think it's a good idea to invent a name for something that probably already exists. If this function does not exist, any ideas what I should call it? Thanks a lot, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
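For the record, one loop-free way to get the same result is to iterate over row indices rather than over the matrix itself — a sketch assuming f as defined above (this avoids splitting A into a list; whether it is faster than the explicit loop would need benchmarking):

```r
A <- matrix(rnorm(12), 3)
v <- 1:3
f <- function(x, y) c(sum(x^2), y)   # vector x, scalar y

# vapply over the row indices; t() because vapply binds results as columns
result <- t(vapply(seq_len(nrow(A)),
                   function(r) f(A[r, ], v[r]),
                   numeric(2)))
```

mapply(function(r) ..., seq_len(nrow(A))) would work the same way; vapply just adds a type/shape check via the numeric(2) template.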
Re: [R] arrays of arrays
Hi Sachin, I guess there are several different possibilities that are more or less handy depending on your data:

- lists were mentioned already, and I think they are the most natural representation of ragged arrays. Also very flexible, e.g. you can introduce more dimensions. But they can get terribly slow and very memory consuming if you have many rows.

- If you have many rows and they have almost the same number of elements, you may be better off using a normal matrix and setting the unused elements to NA.

- There are also sparse matrices in package Matrix. I've never used them, but I guess they may be what you are after. This here:

new("dgCMatrix"
    , i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
    , p = c(0L, 4L, 7L, 9L, 15L)
    , Dim = c(6L, 4L)
    , Dimnames = list(NULL, NULL)
    , x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
    , factors = list()
)

is the transpose of your example:

6 x 4 sparse Matrix of class "dgCMatrix"
[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6

The numeric versions do not store the zeros, and will return 0 for the elements marked with '.' in the print. You won't get any benefit from this representation in terms of memory (over a normal matrix) unless the total number of elements is smaller than nrow * max (elements per row) / 2 - nrow - some more overhead. The Matrix() function will give you a hint: check whether it produces a dense or a sparse matrix.

- if you are terribly tight with memory you'll program your own representation that just stores a vector of your values and the start indices for each row.
You index then with rowstart[i] + j. Here's a comparison:

# list
l <- structure(list(V1 = c(0, 0, 1, 1), V2 = c(1, 3, 5), V3 = c(4, 4),
                    V4 = c(7, -1, 8, 9, 10, 6)),
               .Names = c("V1", "V2", "V3", "V4"))
str(l)
List of 4
 $ V1: num [1:4] 0 0 1 1
 $ V2: num [1:3] 1 3 5
 $ V3: num [1:2] 4 4
 $ V4: num [1:6] 7 -1 8 9 10 6
object.size(l)
736 bytes

# sparse matrix
s <- new("dgCMatrix"
         , i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
         , p = c(0L, 4L, 7L, 9L, 15L)
         , Dim = c(6L, 4L)
         , Dimnames = list(NULL, NULL)
         , x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
         , factors = list()
)
s
6 x 4 sparse Matrix of class "dgCMatrix"
[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6
object.size(s)
1640 bytes
# there's a lot of overhead for the sparse matrix

# matrix
m <- structure(c(0, 1, 4, 7, 0, 3, 4, -1, 1, 5, NA, 8, 1, NA, NA, 9,
                 NA, NA, NA, 10, NA, NA, NA, 6), .Dim = c(4L, 6L))
m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    1    1   NA   NA
[2,]    1    3    5   NA   NA   NA
[3,]    4    4   NA   NA   NA   NA
[4,]    7   -1    8    9   10    6
object.size(m)
392 bytes

# own representation
o <- structure(c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6),
               rowstart = c(0, 4, 7, 9))
# storing the index of the end of the preceding row saves subtracting 1 all the time
o
 [1]  0  0  1  1  1  3  5  4  4  7 -1  8  9 10  6
attr(,"rowstart")
[1] 0 4 7 9
object.size(o)
352 bytes
o[attr(o, "rowstart")[2] + 3]
[1] 5

Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] force apply not to drop the dimensions of FUN results ?
Dear Yves, You may not need to do more than set the dim attribute correctly:

dim(test) <- dim(myArray)[c(3:4, 1:2)]

or

dim(test) <- dim(myArray)[c(4:3, 1:2)]

Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
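To illustrate with a toy array (the names myArray and test follow the thread; the margins 1:2 are an assumption about the original apply() call): apply() flattens the dimensions of FUN's result into a single first dimension, and assigning dim restores them.

```r
myArray <- array(seq_len(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))

# FUN returns the 4 x 5 slice itself; apply() flattens it to length 20
test <- apply(myArray, 1:2, function(x) x)
dim(test)                                  # 20 2 3

# restore the slice dimensions in front of the margin dimensions
dim(test) <- dim(myArray)[c(3, 4, 1, 2)]   # now 4 5 2 3
```

This works because assigning dim only relabels the existing column-major layout; no data is moved.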
Re: [R] prcomp function
I think PCA decomposes matrix A according to A'A, not to COV(A). But if A is centered, then A'A = (n - 1) COV(A). So for non-centered A, you want to look at A'A instead:

crossprod(A) %*% evec[, 1] / (nrow(A) - 1) - eval[1] * evec[, 1]
          [,1]
[1,] 0.000e+00
[2,] 0.000e+00
[3,] 1.066e-14

If I'm telling crap, someone please correct me! Hope that helps, Claudia On 11/10/2010 02:41 PM, kicker wrote: Hello, I have a short question about the prcomp function. First I cite the associated help page (help(prcomp)): Value: ... sdev: the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix). rotation: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function princomp returns this in the element loadings. ... Now please take a look at the following easy example: first I define a matrix A

A <- matrix(c(0, 1, 4, 1, 0, 3, 4, 3, 0), 3, 3)

then I apply PCA on A

trans <- prcomp(A, retx = TRUE, center = FALSE, scale. = FALSE, tol = NULL)
eval <- trans$sdev * trans$sdev # eval is the vector of the eigenvalues of cov(A) (according to the cited help text above)
evec <- trans$rotation # evec is the matrix with the eigenvectors of cov(A) as columns (according to the cited help text above)

now the eigenvalue equation should be valid, i.e. it should hold cov(A) %*% evec[,1] = eval[1] * evec[,1]. But it doesn't; my result:

cov(A) %*% evec[,1] = t(-0.8244927, -0.8325664, 0.8244927)
eval[1] * evec[,1] = t(-8.695427, -7.129314, -10.194816)

So my question is: why does the eigenvalue equation not hold? The eigenvalue equation holds when I set center=TRUE in the options of the prcomp function. But as far as I know, and as I understand the help text, it should have no influence on the eigenvalue equation whether the data are centered or not. I know about the advantages of centered data, but I want to understand how the prcomp function works in the case of uncentered data.
Thank you very much for your efforts. -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
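For the record, the relation above can be checked numerically — a sketch with random data (variable names chosen here to mirror the thread): for uncentered data, prcomp's sdev/rotation satisfy the eigenvalue equation of A'A/(n-1), not of cov(A).

```r
set.seed(1)
A <- matrix(rnorm(30), 10, 3)

p    <- prcomp(A, center = FALSE)
eval <- p$sdev^2
evec <- p$rotation

# the matrix that prcomp(A, center = FALSE) actually diagonalizes:
M <- crossprod(A) / (nrow(A) - 1)

max(abs(M %*% evec[, 1] - eval[1] * evec[, 1]))   # ~ 0, up to rounding
```

With center = TRUE (the default), crossprod of the centered A equals (n-1) cov(A), which is why the original poster's check only worked in that case.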
Re: [R] R and Matlab
Dear Henrik, sorry for bothering you with a report hastily pasted together and not particularly nice for you, as I used my toy data flu from a non-standard package. I should better have used e.g. the iris. I'm aware that writeMat doesn't deal with S4 objects. In fact, even if I'd overlooked the error message, there's a 2nd chance to see that the file size is 0B. The attempt to save flu directly was a classical autopilot error; that's why I tried to save the x afterwards. So the problem here was the unnamed storing of x. I intentionally do not try to infer the name x from writeMat("flu.mat", x), basically because I think using substitute() should be avoided as far as possible, but also because it is unclear what the name should be in cases such as writeMat("flu.mat", 1:10). I was just going to suggest a patch that assigns names of the type V<number> to the unnamed objects - but when I wanted to get the source I realized your version with the warning is already out. I think, however, you may have forgotten a nchar?: any(nchar(names) == 0) So here's my suggestion for l. 775-777 of writeMat.R:

if (is.null(names) || any(nchar(names) == 0L)) {
  names[nchar(names) == 0L] <- paste("V", which(nchar(names) == 0L), sep = "")
  names(args) <- names
  warning("All objects written have to be named, e.g. use writeMat(..., x=a, y=y) and not writeMat(..., x=a, y): ",
          deparse(sys.call()), "\nDummy names have been assigned.")
}

After all, e.g. data.frame() will also rather create dummy names for unnamed columns. And, I think, a warning should make the user aware that he's doing something that _may_ not work out as intended. But here I think it is _most likely_ not working as intended. MISCELLANEOUS: Note that writeMat() cannot write compressed MAT files. It is documented in help(readMat), and will be so in help(writeMat) in the next release. Package Rcompression, loaded or not, has no effect on writeMat(). It is only readMat() that can read them, if Rcompression is installed.
You do not have to load it explicitly/yourself - if readMat() detects a compressed MAT file, it will automatically try to load it; OK, good to know. Thanks a lot for your explanation in spite of my bad report. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] R and Matlab
I am looking for ways to use R and Matlab. Doing the data transformations in R and using the data in Matlab to analyze with some pre-defined scripts. Any good ways to transfer the data into matlab in its most recent version? I tried using R.matlab but the writeMat output is not readable by Matlab. It used to work, but I didn't need it for quite a while (a year or so ago, and with Matlab either 2007 or 2008a). I just tried, and neither does it work for me. You should notify the maintainer of R.matlab and include an example (code and data, e.g. with dput). I noticed that library (R.matlab) does not load the Rcompression package, but also after library (Rcompression), the resulting file was not read by Matlab. I tried loading a saved data.frame in Matlab 2008b on an Win XP computer: it doesn't find any variables inside the .mat file (and whos -file ...) doesn't show a variable. The other way round with a stupid little vector it worked. An R session (with only the 2nd try, after library (Rcompression)) is attached below. I just need to output a data.frame and read it as is into matlab where I can do any needed transformations on the variables. If you need to transfer the data right NOW, there's always csv. Claudia library (hyperSpec) Loading required package: lattice Package hyperSpec, version 0.95 To get started, try vignette (introduction, package = hyperSpec) package?hyperSpec vignette (package = hyperSpec) If you use this package please cite it appropriately. citation(hyperSpec) will give you the correct reference. 
The project is hosted on http://r-forge.r-project.org/projects/hyperspec/ sessionInfo () R version 2.12.0 (2010-10-15) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C LC_TIME=en_US.utf8 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8 [7] LC_PAPER=en_US.utf8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=CLC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] hyperSpec_0.95lattice_0.19-13 R.matlab_1.3.3R.oo_1.7.4 R.methodsS3_1.2.1 loaded via a namespace (and not attached): [1] grid_2.12.0 library (Rcompression) x = flu[[]] writeMat (flu.mat, flu) Error in dim(x) - length(x) : invalid first argument writeMat (flu.mat, x) sessionInfo () R version 2.12.0 (2010-10-15) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C LC_TIME=en_US.utf8 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8 [7] LC_PAPER=en_US.utf8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=CLC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rcompression_0.8-0 hyperSpec_0.95 lattice_0.19-13R.matlab_1.3.3 R.oo_1.7.4 [6] R.methodsS3_1.2.1 loaded via a namespace (and not attached): [1] grid_2.12.0 -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R and Matlab
On 10/28/2010 03:16 PM, Thomas Levine wrote: Is there a particular reason you can't use csv? (Not sure whether I'm meant - as I also suggested csv to Santosh) But: - It used to work, so there may be code existing that is broken now (e.g. I do have such code, but at least for the moment it doesn't matter). Thus the information may very well be of interest for the maintainer. - csv is fine for a matrix (or a vector) or a data.frame. How about arrays, lists, more than one variable? I think the default file format changed to v7.3 (though I'm not sure whether that is just for large variables). Unfortunately -v switch of load (that used e.g. to allow reading of V4 files) is gone, and I can't see anything to specify the .mat file format version. The curious thing is that readMat does accept the file produced by Matlab 2008b. If it is a matter of writeMat writing an old file format, I'd have expected that rather load should still be able to read the writeMat generated file than readMat being able to read Matlab's .mat file. my 2 ct Claudia write.csv() in R It seems that you can read csv in Matlab with this http://www.mathworks.com/help/techdoc/ref/importdata.html Tom 2010/10/28 Claudia Beleitescbelei...@units.it: I am looking for ways to use R and Matlab. Doing the data transformations in R and using the data in Matlab to analyze with some pre-defined scripts. Any good ways to transfer the data into matlab in its most recent version? I tried using R.matlab but the writeMat output is not readable by Matlab. It used to work, but I didn't need it for quite a while (a year or so ago, and with Matlab either 2007 or 2008a). I just tried, and neither does it work for me. You should notify the maintainer of R.matlab and include an example (code and data, e.g. with dput). I noticed that library (R.matlab) does not load the Rcompression package, but also after library (Rcompression), the resulting file was not read by Matlab. 
-- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] clustering on scaled dataset or not?
John, Hi, just a general question: when we do hierarchical clustering, should we compute the dissimilarity matrix based on the scaled dataset or the non-scaled dataset? daisy() in the cluster package allows standardizing the variables before calculating the dissimilarity matrix; I'd say that should depend on your data. - if your data is all (physically) different kinds of things (and thus different orders of magnitude), then you should probably scale. - On the other hand, I cluster spectra. Thus my variates are all in the same unit, and moreover I'd be afraid that scaling would blow up noise-only variates (i.e. the spectra do have regions of low or no intensity), thus I usually don't scale. - It also depends on your distance. E.g. Mahalanobis should do the scaling by itself, if I think correctly at this time of the day... What I do frequently, though, is subtracting something like the minimum spectrum (in practice, I calculate the 5th percentile for each variate - it's less noisy). You can also center, but I'm strongly for having a physical meaning, and for my samples the minimum spectrum is better interpretable (it represents the matrix composition). but dist() doesn't have that option at all. Appreciate if you can share your thoughts? But you could call scale() and then dist(). Claudia Thanks John
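The last suggestion in one line, on a toy matrix (scale() standardizes the columns, dist() then computes the row-wise distances that hclust() would consume):

```r
X <- matrix(c(  1,   2,   3,   4,      # variable 1
              100, 200, 300, 400),     # variable 2: same pattern, 100x the scale
            ncol = 2)

d_raw    <- dist(X)         # dominated by the large-scale second column
d_scaled <- dist(scale(X))  # both columns now contribute equally
# hclust(d_scaled) would then cluster the standardized data
```

Here the second column is an exact multiple of the first, so after scaling both columns are identical and each pairwise distance is simply sqrt(2) times the one-column distance.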
Re: [R] Random Forest AUC
Dear List, Just curiosity (disclaimer: I never used random forests till now for more than a little playing around): Is there no out-of-bag estimate available? I mean, there are already ca. 1/e trees where a (one) given sample is out-of-bag, as Andy explained. If now the voting is done only over the oob trees, I should get a classical oob performance measure. Or is the oob estimate internally used up by some kind of optimization (what would that be, given that the trees are grown till the end?)? Hoping that I do not spoil the pedagogic efforts of the list in teaching Ravishankar to reason it out himself... Claudia On 23.10.2010 20:49, Changbin Du wrote: I think you should use 10-fold cross validation to judge your performance on the validation parts. What you did will be overfitted for sure; you test on the same training set used for your model building. On Sat, Oct 23, 2010 at 6:39 AM, mxkuhn <mxk...@gmail.com> wrote: I think the issue is that you really can't use the training set to judge this (without resampling). For example, k nearest neighbors are not known to over fit, but a 1nn model will always perfectly predict the training data. Max On Oct 23, 2010, at 9:05 AM, Liaw, Andy <andy_l...@merck.com> wrote: What Breiman meant is that as the model gets more complex (i.e., as the number of trees tends to infinity) the generalization error (test set error) does not increase. This does not hold for boosting, for example; i.e., you can't boost forever, which necessitates finding the optimal number of iterations. You don't need that with RF. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of vioravis Sent: Saturday, October 23, 2010 12:15 AM To: r-help@r-project.org Subject: Re: [R] Random Forest AUC Thanks Max and Andy. If the Random Forest is always giving an AUC of 1, isn't it over fitting? If not, how do you differentiate this from over fitting?
I believe Random forests are claimed to never over fit (from the following link). http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.hthttp://www.stat.berkeley.edu/%7Ebreiman/RandomForests/cc_home.ht m#features Ravishankar R -- View this message in context: http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachme...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
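To make the out-of-bag idea concrete: the randomForest package does expose exactly this — predict() without newdata aggregates, for each sample, only the votes of the trees that did not see that sample. A sketch (assumes the randomForest package is installed; iris is used as stand-in data, not the poster's):

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

oob   <- predict(rf)                  # out-of-bag predictions (honest estimate)
resub <- predict(rf, newdata = iris)  # resubstitution predictions (overoptimistic)

mean(oob   == iris$Species)           # realistic accuracy
mean(resub == iris$Species)           # near 1 -- the "AUC of 1" effect in the thread
```

Computing the AUC on the resubstitution predictions reproduces the overfitting-like symptom discussed above; on the OOB predictions it gives the classical honest estimate.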
Re: [R] superscript characters in title with '+'
On 10/22/2010 03:15 PM, DrCJones wrote: Hi, Thanks for all of your replies! David, a slightly modified version of what you gave did the trick: hist(X, main = expression([*Ca**^paste(2,+)*]i~'onsets')) here you put the 2+ into the superscript of a superscript. compare these four: hist(X, main = expression(Ca**2~Ca^2~Ca**^2~Ca^^2)) ** and ^ both mean power. But I prefer the way '2+' is italicized in the solution Dennis gave: hist(X, main = bquote('[Ca'^'2+'*']i'~'onsets'), xlab = 'sec') I think I understand it now - the '^' symbol must be followed by the '*' symbol to signify the end of font italicization; no, the ^ is the power function infix operator, as in x^2 = x². * is multiplication, but the multiplication is written without the dot: x * y = x ⋅ y = xy. It is abused here to connect terms into one expression. and '~' must be used to signify spaces. yes The only thing I still don't get is why square brackets rather than quotation marks surround the 'i' in the solution Claudia gave: hist(X, main = expression([ * Ca^2+ * ] [i]~'onsets'), xlab = 'sec') the square bracket operator marks indices, which are written as subscripts. So x[i] produces what in LaTeX would be x_i. Being a chemist, it seemed natural to me to put the i after the concentration brackets into a subscript - though you didn't say you want that. A more correct expression would be: group ('[', Ca^'2+', ']') [i]~onsets where you can see more easily that the [ and ] are special left and right delimiters. Note that the only term that needs to be hidden as a character string is the charge, as R doesn't know this way of writing ion charges and supposes + to be an infix operator.
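For reference, the pieces discussed above can be put together as follows (X here is just made-up data so there is something to plot; the plotmath expression itself is the point):

```r
## [Ca^2+]_i as a histogram title:
## group() supplies the bracket delimiters, the "2+" charge is hidden as a
## string, [i] sets the subscript, ~ inserts the space before 'onsets'
X <- rnorm(100)
hist(X, main = expression(group("[", Ca^"2+", "]")[i] ~ onsets), xlab = "sec")
```

The same expression works anywhere R accepts plotmath (axis labels, legends, text()).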
Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
Re: [R] Accuracy/Goodness of fit of nnet
Raji, you first need to tell us what kind of accuracy you mean. The term accuracy has different meanings in different areas of science. However, in classification it usually refers to something along the lines of: number of correctly predicted samples / total number of samples (possibly weighted according to the number of samples per class). Procedures: You can calculate that for different types of test samples: - prediction of the training samples gives you a goodness of fit. If you have (too) many variates in your model, this measure is close to useless. Useless, because most people are not interested in the goodness of fit anyways but want to know the performance for new samples. - prediction of unknown (statistically independent) samples: this is usually what is of interest. You may use resampling schemes (out-of-bootstrap & Co., (iterated) cross validation). There's package boot (though I never used it as it does not properly fit my data). - Resampling schemes usually cannot tell you the performance for /future/ samples: for that you need a test set that is acquired later (and as close as possible to the real data to predict). You need to do this if you want to take into account things like instrument drift etc. There's tons of literature around; what to do depends somewhat on your field. I can point you to chemometric literature. Calculating: - package ROCR calculates all sorts of classifier performance measures for binary classification. - I'm developing a package that gives performance measures directly for continuous predictions (such as predict.multinom with type = probs). You are welcome to be a test user: just let me know if you want to try it out. Hope that helps, Claudia On 10/21/2010 05:37 AM, Raji wrote: Hi R-Helpers, I am working on the nnet package. multinom() has an option for finding the goodness of fit by giving the AIC value. Does nnet also give some value to determine the accuracy?
If not, can you guide me with some procedure to figure out the accuracy/goodness of fit of nnet model? Thanks in advance.
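A minimal sketch of the accuracy measure described above, with base R (the labels and predictions are made up; with nnet one would use the output of predict() instead):

```r
## accuracy = correctly predicted / total, read off a confusion table
ref  <- factor(c("A", "A", "B", "B", "B"))                      # true classes
pred <- factor(c("A", "B", "B", "B", "A"), levels = levels(ref)) # predictions
cm <- table(ref, pred)               # confusion matrix
accuracy <- sum(diag(cm)) / sum(cm)  # 3 of 5 correct here
accuracy
```

Weighting per class (balanced accuracy) would average the per-row proportions of diag(prop.table(cm, 1)) instead.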
Re: [R] Question about Density Plot
Dear Ignacio, if you want it hexagonal (as I gather from the hexbin_demo), have a look at the hexbin package. Otherwise, lattice's levelplot is your friend. Or, if you prefer ggplot: geom_tile or geom_hex. If you play a bit with findFn from package sos, e.g. findFn ("plot 2d density") findFn ("plot 2d histogram") you'll find more related functions. Claudia I've attached an example about something I want to do in R. This example was done in a Fortran application called ASGL. Here's an example in matplotlib http://matplotlib.sourceforge.net/examples/pylab_examples/hexbin_demo.html Basically, it's like a scatter plot, but has several additional things. One thing is the grids inside the graph, and the other is a density bar used as a reference to evaluate the frequency of the points. The command that I've always used in R for scatter plots is plot(l1, l2). I need to know if there is something similar in a library of R, or if I could implement it on my own. Greetings Ignacio
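A small sketch of the levelplot route suggested above (l1 and l2 are simulated stand-ins for the real data; 2d bin counts stand in for the density, and the color key plays the role of the density bar):

```r
library(lattice)   # ships with R as a recommended package
set.seed(42)
l1 <- rnorm(5000)
l2 <- l1 + rnorm(5000)
## bin both axes, count points per cell, display the counts as colors
counts <- table(cut(l1, 40), cut(l2, 40))
levelplot(counts, xlab = "l1", ylab = "l2", scales = list(draw = FALSE))
```

The hexbin route is analogous: hexbin(l1, l2) followed by plot() on the result.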
Re: [R] confusion matrix
Dear Greg, If it is only the NA that worries you: function table can deal with that. ? table and: example (table) If you want to make a confusion matrix that works also with fractional answers (e.g. 50% A, 50% B, a.k.a. soft classification) then you can contact me and become a test user of a package that I'm just writing (you can also wait until it is published to CRAN, but that will take a while). Best regards, Claudia Gregory Ryslik wrote: Hi Everyone, In follow up to my previous question, I wrote some code that correctly makes a confusion matrix as I need it. However, it only works when the numbers are between 1 and n. If the possible outcomes are between 0 and n, then I can't reference row 0 of the matrix and the code breaks. Does anyone have any easy fixes for this? I've attached the entire code to this email. As always, thank you for your help! Greg Code:
answers <- matrix(c(4,2,1,3,2,1), nrow = 6)
mat1 <- matrix(c(3,3,4,NA,4,2), nrow = 6)
mat2 <- matrix(c(3,2,1,4,2,3), nrow = 6)
mat3 <- matrix(c(4,2,2,2,1,1), nrow = 6)
mat4 <- matrix(c(4,2,1,3,1,4), nrow = 6)
mat5 <- matrix(c(2,3,1,4,2,3), nrow = 6)
matrixlist <- list(mat1, mat2, mat3, mat4, mat5)
predicted.values <- matrix(unlist(matrixlist), nrow = dim(mat1)[1])
confusion.matrix <- matrix(0, nrow = length(as.vector(unique(answers))),
                           ncol = length(as.vector(unique(answers))))
for (i in 1:dim(predicted.values)[1]) {
  for (j in 1:dim(predicted.values)[2]) {
    predicted.value <- predicted.values[i, j]
    if (!is.na(predicted.value)) {
      true.value <- answers[i, ]
      confusion.matrix[true.value, predicted.value] <- confusion.matrix[true.value, predicted.value] + 1
    }
  }
}
class.error <- diag(1 - prop.table(confusion.matrix, 1))
confusion.matrix <- cbind(confusion.matrix, class.error)
confusion.data.frame <- as.data.frame(confusion.matrix)
names(confusion.data.frame)[1:length(as.vector(unique(answers)))] <- 1:length(as.vector(unique(answers)))
names(confusion.data.frame)[length(as.vector(unique(answers))) + 1] <- "class.error"
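For completeness, a sketch of the table/factor fix applied to the 0..n situation Greg describes (toy values; declaring the full level set means empty classes keep their rows and columns, and the 0 outcome no longer breaks the indexing):

```r
## outcomes may include 0 - use factors with an explicit level set
answers <- c(0, 2, 1, 3, 2, 1)
pred    <- c(3, 3, 0, NA, 3, 2)
lev <- 0:3
cm <- table(true      = factor(answers, levels = lev),
            predicted = factor(pred,    levels = lev))
cm   # 4 x 4, including rows/columns for levels that never occur; NA dropped
```

From here prop.table(cm, 1) gives the per-class error rates as in Greg's code.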
Re: [R] confusion matrix
Gregory Ryslik wrote: Hi, Thank you for the help! Would this imply then that if my answers and predicted are both matrices, I need to first make them into factors? I was hoping to avoid that step... Why are they matrices? What is the additional dimension? And: what should become of the additional dimension? With 2d reference and prediction, do you want to produce 3d or 4d confusion matrices? Thank you again! You are welcome. Claudia Kind regards, Greg On Oct 8, 2010, at 10:04 AM, Claudia Beleites wrote: Gregory Ryslik wrote: Hi, I played with the table option but it seems to only give me counts for numbers that exist. For example, if I don't have any 4's that are predicted, that number is skipped! Well, you need to tell the function that there _could_ be a 4: ref <- factor (1 : 3) ref <- factor (1 : 4) pred <- factor (c (1 : 3, 1), levels = levels (ref)) ref [1] 1 2 3 4 Levels: 1 2 3 4 pred [1] 1 2 3 1 Levels: 1 2 3 4 table (ref, pred) pred ref 1 2 3 4 1 1 0 0 0 2 0 1 0 0 3 0 0 1 0 4 1 0 0 0 Claudia Thanks, Greg Sent via BlackBerry by AT&T -Original Message- From: Claudia Beleites cbelei...@units.it Date: Fri, 08 Oct 2010 15:38:31 To: Gregory Ryslik rsa...@comcast.net Cc: R Help r-help@r-project.org Subject: Re: [R] confusion matrix Dear Greg, If it is only the NA that worries you: function table can deal with that. ? table and: example (table) If you want to make a confusion matrix that works also with fractional answers (e.g. 50% A, 50% B, a.k.a. soft classification) then you can contact me and become a test user of a package that I'm just writing (you can also wait until it is published to CRAN, but that will take a while). Best regards, Claudia Gregory Ryslik wrote: Hi Everyone, In follow up to my previous question, I wrote some code that correctly makes a confusion matrix as I need it. However, it only works when the numbers are between 1 and n. If the possible outcomes are between 0 and n, then I can't reference row 0 of the matrix and the code breaks.
Does anyone have any easy fixes for this? I've attached the entire code to this email. As always, thank you for your help! Greg Code:
answers <- matrix(c(4,2,1,3,2,1), nrow = 6)
mat1 <- matrix(c(3,3,4,NA,4,2), nrow = 6)
mat2 <- matrix(c(3,2,1,4,2,3), nrow = 6)
mat3 <- matrix(c(4,2,2,2,1,1), nrow = 6)
mat4 <- matrix(c(4,2,1,3,1,4), nrow = 6)
mat5 <- matrix(c(2,3,1,4,2,3), nrow = 6)
matrixlist <- list(mat1, mat2, mat3, mat4, mat5)
predicted.values <- matrix(unlist(matrixlist), nrow = dim(mat1)[1])
confusion.matrix <- matrix(0, nrow = length(as.vector(unique(answers))),
                           ncol = length(as.vector(unique(answers))))
for (i in 1:dim(predicted.values)[1]) {
  for (j in 1:dim(predicted.values)[2]) {
    predicted.value <- predicted.values[i, j]
    if (!is.na(predicted.value)) {
      true.value <- answers[i, ]
      confusion.matrix[true.value, predicted.value] <- confusion.matrix[true.value, predicted.value] + 1
    }
  }
}
class.error <- diag(1 - prop.table(confusion.matrix, 1))
confusion.matrix <- cbind(confusion.matrix, class.error)
confusion.data.frame <- as.data.frame(confusion.matrix)
names(confusion.data.frame)[1:length(as.vector(unique(answers)))] <- 1:length(as.vector(unique(answers)))
names(confusion.data.frame)[length(as.vector(unique(answers))) + 1] <- "class.error"
Re: [R] ROCR predictions
Dear Assa, I am having a problem building a ROC curve with my data using the ROCR package. I have 10 lists of proteins such as attached (proteinlist.xls). [your file didn't make it to the list.] Each of the lists was calculated with a different p-value. The goal is to find the optimal p-value for the highest number of true positives as well as lowest number of false positives. As far as I understood the explanations from the vignette of ROCR, my data of TP and FP are the labels of the prediction function. But I don't know how to assign the right predictions to these labels. I assume the p-values are different cutoffs that you use for hardening (= making yes/no predictions) from some soft (= continuous class membership) output of your classifier. Usually, ROCR calculates the curves as a function of the cutoff/threshold itself from the continuous predictions. If you have these soft predictions, let ROCR do the calculation for you. If you don't have them, ROCR can calculate your characteristics (sens, spec, precision, recall, whatever) for each of the p-values. While you could combine the results by hand into a ROCR performance object and let ROCR do the plotting, it is then probably easier if you plot directly yourself. Don't be shy to look into the prediction and performance objects, I find them pretty obvious. Maybe start with the objects produced by the examples. Also, note ROCR works with binary validation data only. If your data has more than one class, you need to make two-class problems first (e.g. protein xy ./. not protein xy). BTW, Is there a way of finding the optimum in the curve? I mean to find the exact value in the ROC curve (see sheet 2 in the excel file for the ROC curve). Someone asked for the optimum on ROC a couple of months ago; RSiteSearch on the mailing list with ROC and optimal or optimum should get you answers. I would like to thank for any help in advance You're welcome.
Claudia
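What ROCR computes internally can be sketched in base R: sweep a cutoff over the continuous scores and collect the ROC coordinates at each cutoff. The labels and scores below are made up, just to show the mechanics:

```r
## manual ROC points: TPR and FPR as functions of the cutoff
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)              # binary reference
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)  # soft predictions
cutoffs <- sort(unique(scores), decreasing = TRUE)
tpr <- sapply(cutoffs, function(th) mean(scores[labels == 1] >= th))
fpr <- sapply(cutoffs, function(th) mean(scores[labels == 0] >= th))
plot(fpr, tpr, type = "b", xlab = "false positive rate",
     ylab = "true positive rate", main = "ROC sketch")
```

With only 10 fixed p-value cutoffs one gets 10 such (fpr, tpr) points instead of the full curve, which is exactly the "combine the results by hand" situation described above.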
Re: [R] ROCR data input
Anneley, Sorry, I'm new to R, and relatively new to statistics too so I'm still a bit unclear. That's OK - everyone started some time and was new. However, it is really important to post a reproducible example here. If you are so new that you don't know how to do that exactly, you should probably write into your email that you tried but don't know how. Your chances to get an answer will probably increase quite a bit by that. Also, I'd suggest you go thoroughly through some introduction to R. There's a lot available on CRAN, the web and in many libraries. E.g. a collection divided into more or less than 100 pages: http://cran.r-project.org/other-docs.html r-project.org also has links to books, and to non-English material. The values in the post were only a sample of around 8400 rows. The label has 1 or 0 (I thought this was the two classes needed). yes. Each label row has an equivalent probability. This is the data that I output from the logistic regression analysis, but it is seemingly not the right format for ROC curve analysis. It is the right format. There is a difference in how R displays the data; when I type ROCR.simple it is in the format: $predictions [1] 0.612547843 0.364270971 0.432136142... $labels [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 ... etc. whereas mine is in columns, e.g. ID, labels, probs 8930 0 0.00070 8931 0 0.00036 8932 1 0.0 8933 1 0.2 8934 0 0.1 etc. Look up the difference between list and data.frame. Also: you can find out a lot about variables with class () and str (), and maybe summary (). That is why I think it is a format issue, but being new to R, I'm not sure what I need to do to rectify it. I have attached the text file if this helps. No, we don't need it to reproduce your error - I think it's all more or less about typos: prediction("prob$probabilities", "prob$label") Error in prediction("prob$probabilities", "prob$label") : Number of classes is not equal to 2. ROCR currently supports only evaluation of binary classification tasks. Now, if you need to trace down such an error, it is really a good idea to check what the arguments are that you hand over: As many errors come from typos, it is a good idea to copy and paste literally what you put into the function: "prob$probabilities" [1] "prob$probabilities" "prob$label" [1] "prob$label" See the difference between what your argument evaluates to and what you thought to hand over? Does this get you on the right track? I don't want to be nasty, but if you discover the mistakes yourself, you'll be much faster finding such things next time. So: try with these hints, and if it doesn't work, you can ask again. HTH, Claudia
Re: [R] ROCR predictions
Dear Assa, you need to call prediction with continuous predictions and a _binary_ true class label. You are the only one who can tell whether the p-values are actually predictions and what the class labels are. For the list readers, p is just the name of whatever variable, and you didn't even vaguely say what you try to classify, nor did you offer any explanation of what the columns are. The only information we get from your table is that p-value has small and continuous values. From what I see the p-values could also be fitting errors of the predictions (e.g. expressed as a probability that the similarity to the predicted class is random). Claudia Assa Yeroslaviz wrote: Dear Claudia, thank you for your fast answer. I add again the table of the data as an example.
Protein ID Pfam Domain p-value Expected Is Expected True Positive False Negative False Positive True Negative
NP_11.2 APH 1.15E-05 APH TRUE 1 0 0 0
NP_11.2 MutS_V 0.0173 APH FALSE 0 0 1 0
NP_62.1 CBS 9.40E-08 CBS TRUE 1 0 0 0
NP_66.1 APH 3.83E-06 APH TRUE 1 0 0 0
NP_66.1 CobU 0.009 APH FALSE 0 0 1 0
NP_66.1 FeoA 0.3975 APH FALSE 0 0 1 0
NP_66.1 Phage_integr_N 0.0219 APH FALSE 0 0 1 0
NP_000161.2 Beta_elim_lyase 6.25E-12 Beta_elim_lyase TRUE 1 0 0 0
NP_000161.2 Glyco_hydro_6 0.002 Beta_elim_lyase FALSE 0 0 1 0
NP_000161.2 SurE 0.0059 Beta_elim_lyase FALSE 0 0 1 0
NP_000161.2 SapB_2 0.0547 Beta_elim_lyase FALSE 0 0 1 0
NP_000161.2 Runt 0.1034 Beta_elim_lyase FALSE 0 0 1 0
NP_000204.3 EGF 0.004666118 EGF TRUE 1 0 0 0
NP_000229.1 PAS 3.13E-06 PAS TRUE 1 0 0 0
NP_000229.1 zf-CCCH 0.2067 PAS FALSE 0 1 1 0
NP_000229.1 E_raikovi_mat 0.0206 PAS FALSE 0 0 0 0
NP_000388.2 NAD_binding_1 8.21E-24 NAD_binding_1 TRUE 1 0 0 0
NP_000388.2 ABM 1.40E-08 NAD_binding_1 FALSE 0 0 1 0
NP_000483.3 MMR_HSR1 1.98E-05 MMR_HSR1 TRUE 1 0 0 0
NP_000483.3 DEAD 2.30E-05 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 APS_kinase 1.80E-09 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 CbiA 0.0003 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 CoaE 1.28E-07 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 FMN_red 4.61E-08 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 Fn_bind 0.3855 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 Invas_SpaK 0.2431 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 PEP-utilizers 0.127 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 NIR_SIR_ferr 0.1661 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 AAA 0.0031 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 DUF448 0.0021 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 CBF_beta 0.1201 MMR_HSR1 FALSE 0 0 1 0
NP_000483.3 zf-C3HC4 0.0959 MMR_HSR1 FALSE 0 0 1 0
NP_000560.5 ig 5.69E-39 ig TRUE 1 0 0 0
NP_000704.1 Epimerase 4.40E-21 Epimerase TRUE 1 0 0 0
NP_000704.1 Lipase_GDSL 6.63E-11 Epimerase FALSE 0 0 1 0
...
This is a shortened list from one of the 10 lists I have for different p-values. As you can see I have separate p-value experiments and probably need to calculate for each of them a separate ROC. But I don't know how to calculate these characteristics for the p-values. How do I assign the predictions to each of the single p-value experiments? I would appreciate any help Thanks Assa On Tue, Aug 17, 2010 at 12:55, Claudia Beleites cbelei...@units.it mailto:cbelei...@units.it wrote: Dear Assa, I am having a problem building a ROC curve with my data using the ROCR package. I have 10 lists of proteins such as attached (proteinlist.xls). [your file didn't make it to the list.] Each of the lists was calculated with a different p-value. The goal is to find the optimal p-value for the highest number of true positives as well as lowest number of false positives
Re: [R] cacheSweave / pgfSweave driver for package vignette
Dear all, Maybe we should move the discussion to r-devel? So please excuse the cross-posting; it is to tell people at r-help where to find the rest of the discussion (in case you agree with me). I've been wondering about that, too. Gabor, I use fake vignettes along your lines, too. In order to provide meaningful samples, I have both bulky data and bulky calculations (at least too long to have any fun in running R CMD check frequently). As I do not want to burden my package with lots (> 60 MB) of raw data in various file formats, two vignettes do their real work extra (and the source is available for separate download). So for the development work it would be good to have caching for speed-up. For the testing purposes of R CMD check, however, the whole thing needs to be calculated: afaik the caching mechanism checks for changes in the respective chunks. Which is great for data-analysis work. However, in a package development scenario the changes are rather expected in the package. I suspect that the caching cannot check this. Thus a cached vignette does greatly reduce the calculation time, but also knocks out part of the testing. This would be without concern if the package is well behaved and does its testing in the tests and has the vignettes as manuals. I have to admit, though, that my package is not (yet) at this point. So I personally find myself with a shell script that automatically builds all vignettes first, transfers some files into the package (the data sets coming with the package are constructed in vignettes), and then checks and builds the package. In the end, this dependency of the package on the results of its vignettes needs much more calculation. I'm talking of ca. 10 - 15 min for the whole process (i.e. 5 - 7 min for one check cycle). This is awkward for development, but I think it's OK for something to be done occasionally in a nightly check on the server. My conclusion is that a cached Sweave driver should only be specified in certain situations.
I.e. it would be very helpful for developing to do this at home, but I'm afraid it is not the best idea to reduce the work in checking the package in general (e.g. during nightly checks). I also say this because I have been running into trouble with the nightly build on r-forge (due to some LaTeX packages that I thought to be fairly standard, which they weren't). Another error I like to produce is to forget adding a new source file to the version control. Both cases are only found in checks during the nightly build on the server. There may be other mistakes that would be masked by the caching. Of course, it is also not nice to keep the servers calculating examples for hours. I presume, however, that this case is quite rare (compared to situations where the regular building and checking is too long for a fluent development cycle), and I'd say that in this case Gabor's procedure is OK. For my work it would be much more helpful if R CMD check had also positive flags (e.g. --tests as an abbreviation for --no-codoc --no-examples --no-install --no-vignettes --no-latex). I know hardly anything about make files and never wrote one myself. I think they could be helpful here to switch between the development checks and a complete build check. So I'd be very curious to see some make files. HTH, Claudia
[R] sweep / mapply question
Dear list, I have a matrix, spc, that should row-wise be interpolated: E.g. spc <- matrix (1:1e6, ncol = 250, nrow = 4000, byrow = TRUE) spc [1:10, 1:10] shifts <- seq_len (nrow (spc)) wl <- seq_len (ncol (spc)) interpolate <- function (spc.row, shift, wl) spline (wl + shift, spc.row, xout = wl, method = "natural")$y interpolate (spc [1,], shift = shifts [1], wl = wl) [1:10] # works Naively, I wanted to use sweep to vectorize this: sweep (spc, 1, shifts, interpolate, wl = wl) This doesn't work, as sweep basically repeats the STATS in the correct way, and hands two matrices (arrays) to the function. This is fine and fast for + - * / etc., but doesn't help my interpolation. Of course, I can calculate what I need: system.time ( t (mapply (interpolate, as.data.frame (t (spc)), shift = shifts, MoreArgs = list (wl = wl))) ) system.time ( sapply (1 : nrow (spc), function (i) interpolate (spc [i, ], shifts [i], wl = wl)) ) tmp <- spc system.time ({ for (i in 1 : nrow (spc)) tmp [i,] <- interpolate (spc [i, ], shifts [i], wl = wl) }) On my computer the for loop is fastest (slightly faster than sapply, a bit less than half of the time of mapply). However, as I expect this to be a fairly common situation, I want to share this experience, and the question is: is there a better / faster / nicer / more elegant way to do this? Comments? Thanks, Claudia
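For the record, a vapply variant of the sapply approach above: the same idea, but vapply checks that every call returns a numeric vector of the expected length, so a malformed row fails loudly instead of silently simplifying. Shown here on a smaller toy matrix so it runs quickly:

```r
## row-wise spline interpolation, result shape verified by vapply
spc    <- matrix(1:5000, ncol = 50, nrow = 100, byrow = TRUE)
shifts <- seq_len(nrow(spc))
wl     <- seq_len(ncol(spc))
interpolate <- function(spc.row, shift, wl)
  spline(wl + shift, spc.row, xout = wl, method = "natural")$y

res <- t(vapply(seq_len(nrow(spc)),
                function(i) interpolate(spc[i, ], shifts[i], wl = wl),
                numeric(ncol(spc))))   # template enforces the row length
stopifnot(dim(res) == dim(spc))
```

It is typically about as fast as the sapply version; the gain is safety rather than speed.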
Re: [R] Sweave: infelicities with lattice graphics
Dear Michael, I know this situation from writing vignettes, and I usually cheat a bit: I redefine the functions along these lines: plotmap <- function (...) print (hyperSpec:::plotmap (...)) (plotmap is a lattice function for hyperSpec objects) plotmap can then be used without the print in the vignettes - this works fine for almost all cases. Only if you have a structure where one of the redefined functions calls another one of them, you get e.g. a pdf with 2 pages. Have a look at the vignettes and particularly the file vignettes.defs of package hyperSpec (https://r-forge.r-project.org/scm/viewvc.php/Vignettes/?root=hyperspec). BTW: I find it polite to mention that some definitions etc. are executed silently and where people can find them. Cheers, Claudia
Re: [R] Backslash \ in string
Jannis, You are right, it does not seem to matter. When the R string contains two \\, xtable prints it as only one \. I should have looked into the LaTeX output before posting! '\\' is just _one_ character in R: nchar ("\\") [1] 1 Just like '\n' etc. It is just the `print`ed (as opposed to cat) output that misled you: the print function displays a bunch of special characters in their backslash-escaped fashion: print ("someting\tblah\\blubb\n") [1] "someting\tblah\\blubb\n" cat ("someting\tblah\\blubb\n") someting	blah\blubb print ("\12") [1] "\n" Claudia
Re: [R] Sweave: infelicities with lattice graphics
Dear David, you can use Gin: \setkeys{Gin}{width=0.5\linewidth} just before the chunk that actually produces the figure. and, cool, I hadn't realized that <<fig = TRUE, echo = FALSE>>= print ( <<chunk-with-lattice-function>> ) @ works. With {} inside the print there can be even more than one statement in the chunk-with-lattice-function. However, that's not a good idea. There may be surprises due to the question how often the chunk-with-lattice-function is actually executed. Claudia I have wondered about this too. The approach I use isn't pretty but does have a couple of advantages - there is only one set of code to run and I have control over the figure size. The first part of the code below is what is shown in the document (but not run), and the second part actually runs the code and makes the plot. <<no2hist, eval=FALSE>>= hist(mydata$no2) @ <<no2hist1, echo = FALSE, results=hide>>= pdf("no2hist.pdf") <<no2hist>> dev.off() @ \begin{figure} \centering \includegraphics[width=0.5\textwidth]{no2hist} \caption{The caption.} \label{fig:hist} \end{figure} I'd be interested to know if there are neater ways of doing this. Regards, David
Re: [R] merge
apropos ("merge") Cheers, Claudia n.via...@libero.it wrote: Dear list, I have two different data frames. The first one is like this: CLUSTER year variable value m1 2006 EC01 4 m1 2007 EC01 5 m2 2006 EC01 42 m2 2007 EC01 9 and other variables. This data frame has 800 rows and 14 columns. The second data frame has more or less the same structure: CLUSTER year m1 2005 m1 2006 m1 2007 m2 2005 m2 2006 m2 2007 This data frame has 548833 rows and 18 columns. What I'm trying to do is to merge the year column of the second data frame with the whole first data frame in order to get the following new data frame: CLUSTER year variable value m1 2005 EC01 / m1 2006 EC01 4 m1 2007 EC01 5 m2 2005 EC01 / m2 2006 EC01 42 m2 2007 EC01 9 Could someone help me? Thanks a lot
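A sketch of that merge with toy versions of the two data frames described above (column names as given in the post; the '/' placeholders in the desired output become NA in R):

```r
## left outer join: keep all CLUSTER/year combinations of df2
df1 <- data.frame(CLUSTER  = c("m1", "m1", "m2", "m2"),
                  year     = c(2006, 2007, 2006, 2007),
                  variable = "EC01",
                  value    = c(4, 5, 42, 9))
df2 <- data.frame(CLUSTER = rep(c("m1", "m2"), each = 3),
                  year    = rep(2005:2007, times = 2))
merge(df2, df1, by = c("CLUSTER", "year"), all.x = TRUE)
```

all.x = TRUE keeps the 2005 rows that have no match in df1, filling variable and value with NA.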
Re: [R] logistic regression with 50 variables
Dear all, (this first part of the email I sent to John earlier today, but forgot to put it to the list as well) Dear John, [John wrote:] "Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks" First: I'm not a statistician, but a spectroscopist. But I do build logistic regression models with far fewer than 1800 samples and far more variates (e.g. 75 patients / 256 spectral measurement channels), though I have many measurements per sample: typically several hundred spectra per sample. Question: are the 1800 real, independent samples? Model stability is something you can measure. Do an honest validation of your model with really _independent_ test data and measure the stability according to what your stability needs are (e.g. stable parameters or stable predictions?). (From here on, reply to Joris) [Joris wrote:] "Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play rather quickly." No doubt. [Joris:] "The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave." But won't the distance between groups also grow? No doubt, high-dimensional spaces are _very_ unintuitive.
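(Aside: the grid-spacing arithmetic quoted above can be checked directly; the 20-points-per-axis figure is from Joris' example.)

```r
## to keep a nearest-neighbour spacing of 0.05 (20 points per unit axis)
## in p dimensions, a regular grid needs 20^p points
n.per.axis    <- 20
points.needed <- n.per.axis ^ c(1, 2, 10)
points.needed   # 20, 400, and about 10^13
```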
However, the required sample size may grow substantially more slowly if the model has appropriate restrictions. I remember the recommendation of at least 5 samples per class and variate for linear classification models. I.e. not to get a good model, but to have a reasonable chance of getting a stable model. [Joris:] "Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model," Am I wrong in thinking that there may be a substantial difference between the stability of predictions and the stability of model parameters? BTW: if the models are unstable, there's also aggregation. At least for my spectra I can give toy examples with a physical-chemical explanation that yield the same prediction with different parameters (of course because of correlation). [Joris:] "as your data space is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible," No, not necessarily. IMHO it depends very much on the meaning of the variables. E.g. for the spectra, a set of model parameters may be interpreted like spectra or difference spectra. Of course this has to do with the fact that a parallel coordinate plot is the more natural view of spectra, compared to a point in so many dimensions. [Joris:] "and then I didn't even start on interactions. No point in trying, I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first." Which brings to my mind a question I've had for a long time: I assume that all variables that I know beforehand to be without information are already discarded. The dimensionality is then further reduced in a data-driven way (e.g. by PCA or PLS). The model is built in the reduced space. How many fewer samples are actually needed, considering the fact that the dimension reduction is a model estimated on the data? ...which of course also means that the honest validation embraces the data-driven dimensionality reduction as well... Are there recommendations about that?
The other curious question I have is: I assume that it is impossible for him to obtain the 10^xy samples required for comfortable model building. So what is he to do? Cheers, Claudia
Re: [R] ROC curve
Dear Changbin, [Changbin wrote:] "I want to know how to select the optimal decision threshold from the ROC curve?" Depends on what "optimal" means. There are a bunch of different criteria in use:
- the point closest to the ideal model
- the point furthest from the guessing model
- these criteria may include costs, i.e. an FP/FN ratio != 1
- ...
More practically: if you use ROCR, the help of the performance class explains the slots of the object. There you find the data of the curve, incl. the thresholds. [Changbin:] "At what threshold will give the highest accuracy?" To know that, optimize the accuracy as a function of the threshold. Remember: finding the optimal threshold from a ROC curve is a data-driven optimization. You need to validate the resulting model with independent test data afterwards.
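A minimal base-R sketch of "optimize the accuracy as a function of the threshold" (the scores and labels are simulated stand-ins for real classifier output; with ROCR you would read cutoffs and accuracies off the performance object instead):

```r
set.seed(1)
score <- c(rnorm(50, mean = 1), rnorm(50, mean = -1))  # higher score = more "pos"
truth <- rep(c("pos", "neg"), each = 50)

## accuracy obtained when each observed score is used as the threshold
acc      <- sapply(score, function(thr) mean((score >= thr) == (truth == "pos")))
best.thr <- score[which.max(acc)]
c(threshold = best.thr, accuracy = max(acc))
```

As the post stresses, this optimum is estimated from the data, so the resulting model must still be validated on an independent test set.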
Re: [R] Help on getting help from manuals
ManInMoon wrote: "Hi, A number of people have suggested I read the manuals... Could someone help me by telling me where the primary start point is please?" In R, type

help.start ()

This should open a browser window with links to
- the packages
- the manuals
- a search engine
Please note: this is written in section 1.7 "Getting help with functions and features" of "An Introduction to R". In the same section, you learn about help.search. Note also: ? help leads you to the man page describing the help system. In its "see also" section you find a list of other useful commands for finding help. If you look them up, look again at what alternatives they suggest, and actually try them out (again with topic help), you will come across all the information about finding help on R topics that is written in this email.
- There also exists apropos ().
- In addition, e.g. by reading this mailing list, you learn about the sos package.
- You can also use the internet resources: on r-project.org there are the manuals, and I personally use a lot http://finzi.psych.upenn.edu/cgi-bin/namazu.cgi (which is where RSiteSearch () gets you). You can nicely decide where to search: documentation of R and CRAN packages, and/or the mailing list archives.
Homework: try out and read the results of RSiteSearch ("help"). [ManInMoon:] "For example, I am interested in writing functions with variable number of arguments - where should I start to look? An Introduction to R only shows a brief example - with no pointer to where to find further data. I can't do ?xxx from the R console in most cases - as I don't know what the function name is that I am looking for!!!" Then do ??xxx or ???xxx (needs sos) or RSiteSearch ("xxx") or apropos ("xxx")... which you could have found out by reading ? help. [ManInMoon:] "People have helped me find substitute to get some metadata out - BUT how could I have found that without guidance from nice people in Nabble? Any help on this very much appreciated." Sometimes it _is_ difficult to find the correct search terms.
However, I think that people on this list will appreciate it if you
- show that you did search before asking, and also tell them which terms you used for the search
- particularly for questions about the meaning of commands: try them out! Take the command apart into pieces and look at what each piece does
- ask what the correct search terms are for your problem (as opposed to asking them to do your homework)
Learning R is learning a language, including vocabulary (i.e. terms for the different concepts). Asking for help with searching is like asking "How do you say concept xyz in R?" instead of "Could anyone do the translation I got as homework?" HTH, Claudia
Re: [R] Getting multiple matrix-values using a single command
Use a matrix of n x 2 to index. For details: sec. 5.3 "Index matrices" in "An Introduction to R". HTH, Claudia

Nils Rüfenacht wrote: Dear all! I'm trying to get multiple values from a matrix by using a single command. Given a matrix A

A <- matrix(seq(1, 9), nrow = 3, ncol = 3)

How can I get e.g. the values A[1,2] = 4 and A[3,3] = 9 with a single command and without using any loop? My first idea was to generate a row and a column vector for the indices, i.e. c(1,3) indicating row number 1 (for A[1,2]) and row number 3 (for A[3,3]), and similarly for the column indices. Then I've tried to call A[c(1,3), c(2,3)], but instead of 4, 9 the result is

     [,1] [,2]
[1,]    4    7
[2,]    6    9

Any suggestions? Regards, Nils
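Nils' example, solved with the index-matrix form the answer points to: each *row* of the two-column index matrix names one element, so A[1,2] and A[3,3] come back together as a vector:

```r
A <- matrix(seq(1, 9), nrow = 3, ncol = 3)
idx <- cbind(c(1, 3),   # row numbers
             c(2, 3))   # column numbers
A[idx]   # extracts A[1,2] and A[3,3], i.e. 4 and 9
```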
Re: [R] Data frame question
Andy, did you run into any kind of trouble? I'm asking because I'm maintaining a package for spectroscopic data that heavily uses I (spectra.matrix)... However, once you have the matrix safe inside the data.frame, you can delete the AsIs:

> a <- matrix (1:9, 3)
> str (a)
 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df <- data.frame (a = I (a))
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a <- unclass (df$a)
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> dim (df)
[1] 3 1

However, I don't know whether something can now trigger a conversion to data.frame that the AsIs would have stopped. Cheers, Claudia

apjawor...@mmm.com wrote: Hi, I have the following question about creating data frames. I want to create a data frame with 2 components: a vector and a matrix. Let me use a simple example:

y <- rnorm(10)
x <- matrix(rnorm(150), nrow = 10)

Now if I do dd <- data.frame(x=x, y=y) I get a data frame with 16 columns, but if, according to the documentation, I do dd <- data.frame(x=I(x), y=y) then str(dd) gives:

'data.frame':   10 obs. of  2 variables:
 $ x: AsIs [1:10, 1:15] 0.700073 -0.44371 -0.46625 0.977337 0.509786 ...
 $ y: num  0.4676 -1.4343 -0.3671 0.0637 -0.231 ...

This looks and works OK. Now, there exists a CRAN package called pls. It has a yarn data set in it.

data(yarn)
str(yarn)
'data.frame':   28 obs. of  3 variables:
 $ NIR    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ density: num  100 80.2 79.5 60.8 60 ...
 $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

This looks almost the same, except the matrix component in my example has "AsIs" instead of "num". Is this just some older behavior of the data.frame function producing this difference? If not, how can I get my data frame (dd) to look like yarn?
I read the help pages for data.frame and as.data.frame and found this paragraph: "If a list is supplied, each element is converted to a column in the data frame. Similarly, each column of a matrix is converted separately. This can be overridden if the object has a class which has a method for as.data.frame: two examples are matrices of class model.matrix (which are included as a single column) and list objects of class POSIXlt which are coerced to class POSIXct." If I do methods(as.data.frame):

 [1] as.data.frame.aovproj*        as.data.frame.array
 [3] as.data.frame.AsIs            as.data.frame.character
 [5] as.data.frame.complex         as.data.frame.data.frame
 [7] as.data.frame.Date            as.data.frame.default
 [9] as.data.frame.difftime        as.data.frame.factor
[11] as.data.frame.ftable*         as.data.frame.integer
[13] as.data.frame.list            as.data.frame.logical
[15] as.data.frame.logLik*         as.data.frame.matrix
[17] as.data.frame.model.matrix    as.data.frame.numeric
[19] as.data.frame.numeric_version as.data.frame.ordered
[21] as.data.frame.POSIXct         as.data.frame.POSIXlt
[23] as.data.frame.raw             as.data.frame.table
[25] as.data.frame.ts              as.data.frame.vector

so it looks like there is a matrix method for as.data.frame. The question then is: how can I override the default behavior for the matrix object (converting columns separately)? Any hint will be appreciated, Andy __ Andy Jaworski 518-1-01 Process Laboratory 3M Corporate Research Laboratory - E-mail: apjawor...@mmm.com Tel: (651) 733-6092 Fax: (651) 736-3122
Re: [R] Data frame question
apjawor...@mmm.com wrote: Thanks for the quick reply. No, I did not run into any problems so far. I have been using the pls package and the modelling functions seem to work just fine. In fact, even if I let the data.frame convert the x matrix to separate columns, the y ~ x modelling syntax still seems to work fine.

I don't see that behaviour:

rm (x) # make sure there is no leftover x in the workspace
mat <- matrix (1 : 9, 3)
df <- data.frame (y = 1 : 3, x = mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1))                # error
coef (plsr (y ~ x.1 + x.2 + x.3, data = df, ncomp = 1))  # works
df$x <- I (-mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1))                # works

Claudia

PS: May I be curious: what kind of data do you analyze with PLS?
Re: [R] colname of ... arguments
What about:

niceplot <- function(...) {
  arg.names <- as.list (match.call () [-1])
  for (a in seq_along (arg.names))
    cat (as.character (as.expression (arg.names [[a]])), "\n\n")
}
niceplot (greeneye, log (greeneye), 1:3)

Note that this works also if there is no greeneye. Disclaimer: I don't know whether I'm suggesting something bad, but I'd like to learn about better ways, so I really appreciate comments. Claudia

ManInMoon wrote: That is quite helpful David

niceplot <- function(...) {
  parms = list(...)
  for (x in parms) {
    xname <- paste(deparse(substitute(x), 500), collapse = "\n")
    cat(xname)
  }
}
GreenEyes = c(1,2,3,4)
niceplot(GreenEyes)
c(1, 2, 3, 4)

BUT what I want is:

GreenEyes = c(1,2,3,4)
niceplot(GreenEyes)
GreenEyes

I will use the vector for plotting too, but I need its name to produce a legend automatically.

On 10 March 2010 23:32, David Scott wrote: [ManInMoon had written:] "I have written a function where I pass a variable number of arguments. They are vectors and I can manipulate them, but I need to get hold of the name for a legend.

niceplot <- function(...) {
  parms = list(...)
  for (x in parms) {
    DoSomethingWith(x)
  }
}

BUT how can I get something like namestring(...) or nameofvector(x)?" I use the following syntax to get the name of a data object to use in a title, label or whatever:

xname <- paste(deparse(substitute(x), 500), collapse = "\n")

This is taken from hist.default so at least has some provenance as an appropriate method.
David Scott -- David Scott, Department of Statistics, The University of Auckland, PB 92019, Auckland 1142, New Zealand. Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055. Fax: +64 9 373 7018. Director of Consulting, Department of Statistics.
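Claudia's match.call() suggestion, cleaned up into a runnable sketch that returns the argument names instead of cat()ing them, so they can feed a legend (the function name `arg_names` is made up here):

```r
arg_names <- function(...) {
  cl <- match.call()          # the unevaluated call, e.g. arg_names(GreenEyes, 1:3)
  ## drop the function name, deparse each argument expression to a string
  sapply(as.list(cl[-1]), function(a) as.character(as.expression(a)))
}

GreenEyes <- c(1, 2, 3, 4)
arg_names(GreenEyes, log(GreenEyes), 1:3)
# returns c("GreenEyes", "log(GreenEyes)", "1:3")
```

Because the arguments are never evaluated inside arg_names, this works even for objects that do not exist, as noted in the post.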
Re: [R] modifying the dots argument - how?
Mark,

dots <- list (...)

gives you a list with the dots arguments. If innerFoo expects not a list but normal arguments, do.call is your friend:

do.call (innerFoo, dots)

HTH, have a nice day, Claudia

Mark Heckmann wrote: Is there a way to modify the dots argument? Let's consider I have a function that passes on its ... arguments to another function. For some reason I know that some elements in ... will cause problems. Can I modify the ... structure somehow, e.g. delete elements?

foo <- function(...){
  innerFoo <- function(...){
  }
  # AT THIS POINT I WANT TO MODIFY THE CONTENT OF ... BEFORE IT IS PASSED ON
  innerFoo(...)
}

Thanks, Mark ––– Mark Heckmann, Dipl. Wirt.-Ing. cand. Psych., Vorstraße 93 B01, 28359 Bremen. Blog: www.markheckmann.de R-Blog: http://ryouready.wordpress.com
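A minimal sketch of the list(...)/do.call pattern (innerFoo's body and the offending argument name "bad" are hypothetical):

```r
foo <- function(...) {
  innerFoo <- function(x, y) x + y   # toy inner function with fixed arguments
  dots <- list(...)                  # capture the dots as a named list
  dots$bad <- NULL                   # drop the element innerFoo cannot digest
  do.call(innerFoo, dots)            # call innerFoo with the cleaned argument list
}

foo(x = 1, y = 2, bad = "boom")
# returns 3
```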
Re: [R] help with lattice boxplots...
You can use parameters:

trellis.par.set (box.rectangle = modifyList (trellis.par.get ("box.rectangle"),
                                             list (col = "black")))
bwplot (y ~ x, data = ex, pch = "|")

I think the others go along the same lines. Look into panel.bwplot to see which parameters are used to produce what. HTH, Claudia

Kim Jung Hwa wrote: Hi All, I need a small help with the following code: I'm trying to convert dashed lines to regular ones, and change the default blue border color to, say, black... but I'm doing it wrong and it's not working. Can anyone help please? Thanks. Code:

require(lattice)
ex <- data.frame(x = 1:10, y = rep(c("A", "B"), 5))
bwplot(y ~ x, data = ex,
       panel = function(x, y, ...) {
         panel.bwplot(x, y, pch = "|", border = "black", lty = 1, ...)
       })
Re: [R] R help question: How can we enable useRs to contribute corrections to help files faster ?
What about the short-term solution of having a function package.bug.report, along the lines of bug.report? E.g. see the attachment. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Valerio 2 I-34127 Trieste ITALY email: cbelei...@units.it phone: +39 (0 40) 5 58-34 68
Re: [R] two questions for R beginners
Dear Patrick (and all), I've now been working with R for a couple of years, before that mostly in Matlab. "Lazy and impatient" is both true for me :-) [Patrick wrote:] "* What were your biggest misconceptions or stumbling blocks to getting up and running with R? * What documents helped you the most in this initial phase? I especially want to hear from people who are lazy and impatient. Feel free to write to me off-list. Definitely write off-list if you are just confirming what has been said on-list." Stumbling blocks:
* It took me long to remember getwd () and setwd () (instead of pwd and cd / chdir or the like).
* I still discover very useful functions that I would have needed for a long time. Latest discoveries: mapply and ave. I knew aggregate, and was always a little angry that it needs a grouping *list*. I even decided that the aggregate method for my hyperSpec class should work with factors as well as with lists. Some day I read on this mailing list that ave does what I need... I like the crosslinks in the help ("see also") very much. Maybe I rely too much on them. So, not lazy today: I attach a patch for aggregate.Rd that adds the seealso to ave. Reading this mailing list once in a while gives me nice new ideas. However, 50 emails a day is somewhat scary for me, so I read only occasionally.
* Vectorization: I like the *apply functions, but I'd really appreciate a comprehensive page/vignette here. I remember that it took me a while to realize that the rule for MARGIN in sweep is "use the same number as in the apply that created the STATS".
* I never found the pdf manuals helpful (help pages are easier to access, and there is nothing in the pdf that the help doesn't have). At the beginning I expected the pdf manuals to be something like what the vignettes are.
* I did not arrive at a comfortable debugging cycle for a long time. But now there's the debug package and setBreakpoint, and I'm happy.
* As I now start teaching, I notice that many students react to error messages with "uhh! an error!" (panic).
Few realize that the error message actually gives information on what went wrong. A list of common causes of different error messages would be helpful here, I think. In case someone agrees: I started one at the Wiki: http://rwiki.sciviews.org/doku.php?id=tips:errormessages Cheers, Claudia
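The sweep/apply MARGIN rule quoted in the list above, as a short example (column-centering a matrix; the toy matrix is arbitrary):

```r
x <- matrix(1:6, nrow = 2)

stats    <- apply(x, 2, mean)   # MARGIN = 2: one statistic per column
centered <- sweep(x, 2, stats)  # the same MARGIN = 2 subtracts them columnwise

colMeans(centered)              # all exactly zero
```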
Re: [R] help me about data format.
[Quoted:] "I want to make a matrix and a vector in the same data frame." You need to protect your matrix by I (). Btw: I'm actually writing a package for handling spectra that I plan to release in some weeks. It contains a vignette showing how PLS calibration can be done. If you want to give it a try, let me know. Claudia Beleites -- Claudia Beleites DMRN, Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste
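A short sketch of the I () protection (the numbers and the column name spc are made up; the idea of a spectra matrix alongside a response vector follows the spectroscopy examples in these posts):

```r
spc <- matrix(rnorm(3 * 5), nrow = 3)     # e.g. 3 spectra x 5 channels
df  <- data.frame(y = 1:3, spc = I(spc))  # I() keeps spc as ONE matrix column

ncol(df)      # 2 columns (y and spc), not 6
dim(df$spc)   # the matrix survives inside the data frame: 3 x 5
```

Without I(), data.frame() would split spc into five separate numeric columns.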
Re: [R] ggplot2 and lattice
On Tuesday, 16 December 2008, 17:13:33, Wayne F wrote: [stephen sefick had written:] "yes a parallel coordinates plot - I understand that it is for multivariate data, but I am having a hard time figuring out what it is telling me. Thanks for your help." [Wayne F:] "In the lattice book, the author mentions that static parallel plots aren't very useful, in general." While for some data they are just natural: e.g. when spectra are treated as multidimensional data. Then the parallel coordinate plot just gives you the spectrum. Of course, in this situation it is maybe the treatment as high-dimensional data that is somewhat weird for spectra. However, this offers a way that might help in understanding what's going on. I have a data set of p dimensions, e.g. spectra measured with p channels. Now, we can either think of such a spectrum as a point in p-dimensional space: e.g. a spectrum consisting of red, green, and blue intensities is a certain point in RGB space. On the other hand, here the p dimensions have something to do with each other (e.g. an intrinsic order, let's say by wavelength). So it does make sense to plot the intensity over the p dimensions. That's the parallel coordinate plot. What you can tell from such a plot depends very much on your data and how you treated it. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 (0 40) 5 58-34 47 email: cbelei...@units.it
Re: [R] The end of Matlab
Dear list, [quoted:] "Learning to use the power of R's indexing and functions like head() and tail() (which are just syntactic sugar) will probably lead you not to miss this." However, how do I exclude the last columns of a data.frame or matrix (or, in general, head and tail for given dimensions of an array)? I.e. something nicer than

t (head (t (x), -n))

for excluding the last n columns of matrix x. THX, Claudia
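Two equivalent ways to drop the last n columns without the double transpose (n and the toy matrix are arbitrary examples; this is the kind of indexing the follow-up posts discuss):

```r
x <- matrix(1:20, nrow = 4)   # a 4 x 5 example matrix
n <- 2

a <- x[, seq_len(ncol(x) - n), drop = FALSE]           # keep the first ncol - n columns
b <- x[, -((ncol(x) - n + 1):ncol(x)), drop = FALSE]   # or exclude the last n

identical(a, b)   # TRUE
```

drop = FALSE keeps the result a matrix even when only one column remains.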
Re: [R] The end of Matlab
On Friday, 12 December 2008, 13:10:20, Patrick Burns wrote: "How about: x[, -seq(to=ncol(x), length=n)]" Doing it is not my problem. I just agree with Mike in that I would like it if I could do shorter than

x[, 1 : (ncol(x) - n)]

which I btw prefer to your solution. Also, I don't have a problem writing generalized versions of head and tail to work along other/more dimensions, or a combined function taking head-n and tail-n arguments. Still, they would not be as convenient to use as Matlab's

3 : end - 4

which btw also does not need parentheses. I guess the general problem is that there is only one thing with integers that can easily be (ab)used as a flag: the negative sign. But there are (at least) 2 possibly useful special ways of indexing:
- exclusion (as in R)
- using -n for end - n (as in Perl)
Now we enjoy having a shortcut for exclusion (at least I do), but still feel that marking "from the end" would be useful. As no other signs (in the sense of flags) are available for integers, we won't be able to stop typing somewhat more in R. Wacek: [quoted:] "x[3:] instead of x[3:length(x)], x[3:end]" I don't think that would help: what to use for end - 3, within the convention that negative values mean exclusion? --- now I start dreaming --- However, it is possible to define new binary operators (operators are great for lazy typing...). Let's say %:% should be a new operator to generate proper indexing sequences to be used inside [ , e.g.

an.array [ 1:3, -2 %:% -5, ...]

If we now find an.array (which is x inside [ and also inside [[ ) - which is possible but maybe a bit fiddly - and if we can also find out which of the indices is actually evaluated (which I don't know how to do), then we could use something* as a flag for "from the end" and calculate the proper sequence. something* could e.g. be either an attribute set by the operators (convenient if we can define an unary operator that allows setting it, e.g.
§ 3 [§ is the easy-to-type sign on my keyboard that is not yet used...]) or i (the imaginary one) if there is no other convenient unary operator e.g. 3i = easy part of the solution: make.index - function (x, along.dim = 1, from, to){ if (is.null (dim (x))) dim - length (x) else dim - dim (x)[along.dim] if (is.complex (from)){ from - dim - from # 0i means end ## warning if re (from) != 0 ? } if (is.complex (to)){ to - dim - to # 0i means end ## warning if re (to) != 0 ? } from : to } %:% - function (e1, e2) ## using a new operator does not mess up : make.index (x = find.x (), along.dim = find.dim (), e1, e2) now, the heavy part are the still missing find.x () and find.dim () functions... I'm not sure whether this would be worth the work, but maybe someone is around who just knows how to do this. Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 (0 40) 5 58-34 47 email: cbelei...@units.it __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
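A hedged illustration (the matrix and n are invented, not from the thread) comparing the two column-dropping idioms discussed above; note that current R spells the `seq()` argument `length.out` rather than `length`:

```r
## drop the last n columns of a matrix: two equivalent spellings
x <- matrix(1:12, nrow = 3)                        # a 3 x 4 example matrix
n <- 2
a <- x[, 1:(ncol(x) - n), drop = FALSE]            # keep the first ncol - n columns
b <- x[, -seq(to = ncol(x), length.out = n), drop = FALSE]  # exclude the last n
stopifnot(identical(a, b))                         # both keep columns 1 and 2
```

`drop = FALSE` is added so the result stays a matrix even when only one column survives.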
[R] Fwd: Re: The end of Matlab (sorry, I messed up a sentence)
-- Forwarded message -- Subject: Re: [R] The end of Matlab Date: Friday 12 December 2008 From: Claudia Beleites cbelei...@units.it To: r-help@r-project.org On Friday 12 December 2008 13:10:20, Patrick Burns wrote: How about: x[, -seq(to=ncol(x), length=n)] Doing it is not my problem. I just agree with Mike in that I would like to be able to write it more briefly than: x[, 1 : (ncol(x) - n)] which I btw prefer to your solution. Also, I don't have a problem writing generalized versions of head and tail that work along other/more dimensions, or a combined function taking head-n and tail-n arguments. Still, they would not be as convenient to use as Matlab's: 3 : end - 4 which btw also does not need parentheses. I guess the general problem is that there is only one thing about integers that can easily be (ab)used as a flag: the negative sign. But there are (at least) 2 possibly useful special ways of indexing: exclusion (as in R), and using -n for end - n (as in Perl). Now we enjoy having a shortcut for exclusion (at least I do), but still feel that counting from the end would be useful. As no other signs (in the sense of flags) are available for integers, we won't be able to stop typing somewhat more in R. Wacek: x[3:] instead of x[3:length(x)], or x[3:end] - I don't think that would help: what would you use for end - 3 within the convention that negative values mean exclusion? --- now I start dreaming --- However, it is possible to define new binary operators (operators are great for lazy typing...). Let's say %:% should be a new operator that generates proper indexing sequences to be used inside [ : e.g. an.array [1:3, -2 %:% -5, ...] If we can now find an.array, i.e. the x inside [ (and also inside [[) - which is possible but maybe a bit fiddly - and if we can also find out which of the indices is actually being evaluated (which I don't know how to do), then we could use something* as a flag for "from the end" and calculate the proper sequence. something* could e.g. be either an attribute on the operands (convenient if we can define a unary operator that allows setting it, e.g. § 3 [§ being the easy-to-type sign on my keyboard that is not yet used...]) or i (the imaginary one) if there is no other convenient unary operator, e.g. 3i. The easy part of the solution:

make.index <- function (x, along.dim = 1, from, to){
  if (is.null (dim (x)))
    d <- length (x)
  else
    d <- dim (x)[along.dim]
  if (is.complex (from)){
    from <- d - Im (from)  # 3i means end - 3; 0i means end
    ## warn if Re (from) != 0 ?
  }
  if (is.complex (to)){
    to <- d - Im (to)      # 0i means end
    ## warn if Re (to) != 0 ?
  }
  from : to
}

`%:%` <- function (e1, e2)  ## using a new operator does not mess up :
  make.index (x = find.x (), along.dim = find.dim (), e1, e2)

Now, the heavy part are the still missing find.x () and find.dim () functions... I'm not sure whether this would be worth the work, but maybe someone is around who just knows how to do this. Claudia
Re: [R] The end of Matlab
I just realized that my idea of doing this without going into the extraction functions themselves won't work :-( It was a nice dream, though. The reason is that there is no general way to find out what the needed length is: at the moment I'm writing a class where 2 kinds of columns are involved. I don't give it a dim attribute, though. But I could, and then: how would one know how it should be interpreted? On the other hand, another possible solution would be to have ':' mean, inside range selection expressions, not the usual sequence generation but rather the specification of start and end indices... This is daydreaming, of course, because such a modification would break much old code. Nothing would break if some other sign than : were used, maybe something like end... but the benefit may not outweigh the effort. This might be true in any case: if I only think of how many lines of nrow, ncol, length & Co I could have written instead of posting wrong proposals. Claudia
Re: [R] character count
nchar (c ("convert this to 47 because it has 47 characters", "this one has 26 characters", "13 characters")) HTH Claudia
Re: [R] The end of Matlab
Regarding evens() and last(5): wouldn't x[evens()][last(5)] already do the job? Or maybe that is different, though. Claudia
Re: [R] for loop query
Hi, Why isn't my loop incrementing i - the outer loop - to 2 and then resetting j to 3? It is. It runs out of bounds with j > 26. Am I missing something obvious? for (i in 1:25) { for (j in i+1:26) You are missing parentheses: i + 1 : 26 is i + (1 : 26), because the vector 1:26 is calculated first. So what happens is that for i = 1, j runs over 2 : 27, for i = 2 over 3 : 28, ... What you want is (i + 1) : 26:

for (i in 1 : 25)
  for (j in (i + 1) : 26)
    cat (i, j, "\n")

HTH Claudia
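A tiny toy example (invented here, not from the original post) showing the precedence rule the reply relies on: the sequence operator `:` binds more tightly than `+`.

```r
i <- 1
x <- i + 1:3     # parsed as i + (1:3), giving 2 3 4
y <- (i + 1):3   # parsed as 2:3,      giving 2 3
print(x)
print(y)
```

This is exactly why `i + 1:26` overruns the intended range while `(i + 1):26` does not.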
Re: [R] 1-Pearson's R Distance
Hi Rodrigo, afaik, (1 - r_Pearson)/2 is used rather than 1 - r_Pearson. This gives a distance measure ranging between 0 and 1 rather than between 0 and 2. But after all, this does not change anything substantial. See e.g. Theodoridis & Koutroumbas: Pattern Recognition. I didn't know of the proxy package, but the calculation is straightforward (though a bit wasteful, I suspect: first the whole matrix is produced, and as.dist cuts it down again to a triangular matrix): as.dist ((1 - cor (t (x))) / 2) Take care whether you want to use x or t(x). HTH Claudia
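A quick sanity check (the example matrix is invented): the rescaled correlation distance (1 - r)/2 maps r = 1 to 0, r = 0 to 0.5, and r = -1 to 1, so all pairwise distances must lie in [0, 1].

```r
set.seed(42)
x <- matrix(rnorm(5 * 10), nrow = 5)   # 5 observations in rows, 10 variables
## cor() correlates columns, so transpose to get observation-to-observation r
d <- as.dist((1 - cor(t(x))) / 2)
stopifnot(all(d >= 0), all(d <= 1))    # distances bounded as claimed
```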
Re: [R] simplify this instruction
In addition to %in%, and depending on what the general principle behind the setup is: you may want to have a look at switch (e.g. if there happen to be Cs and Ds...). Or, of course, you can check for B being between 0 and 9 rather than testing each of the respective integers: ifelse ((B >= 0) & (B <= 9), A, B) HTH Claudia Is there a way to simplify this instruction: ifelse(B==0,A, ifelse(B==1,A, ifelse(B==2,A, ifelse(B==3,A, ifelse(B==4,A, ifelse(B==5,A, ifelse(B==6,A, ifelse(B==7,A, ifelse(B==8,A, ifelse(B==9,A,B)))))))))) I am looking for something like this: ifelse(B==(0:9),A,B) Best regards
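A small worked example (the names B and A come from the thread; the values here are invented) showing that the %in% variant and the range check agree:

```r
B <- c(-3, 0, 5, 9, 12)
A <- 99
r1 <- ifelse(B %in% 0:9, A, B)        # membership test
r2 <- ifelse(B >= 0 & B <= 9, A, B)   # range test
stopifnot(identical(r1, r2))
print(r1)   # -> -3 99 99 99 12
```

Note that %in% tests exact membership in the integers 0:9, while the range check also catches non-integer values such as 4.5; for integer-valued B the two coincide.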
[R] Fourier Transform with irregularly spaced x
Dear all, I work with (vibrational) spectra: some kind of intensity (I) over frequency (nu), wavelength or the like. I want to do Fourier transforms for interpolation, smoothing, etc. My problem is that the spectra are often irregularly spaced in nu: the difference between 2 neighbouring nu values varies across the spectrum, and data points may be missing. Searching for discrete Fourier transform I found lots of information and functions - but I didn't see anything that just works with irregularly spaced signals: all functions I found take only the signal, not its x-axis. Where should I look? Or am I lacking some math that tells me how to do without the frequency axis? Thanks a lot for your help, Claudia
Re: [R] Fourier Transform with irregularly spaced x
Try http://finzi.psych.upenn.edu/R/library/nlts/html/spec.lomb.html or http://finzi.psych.upenn.edu/R/library/cts/html/spec.ls.html (do RSiteSearch("Lomb periodogram")) -- the Lomb periodogram does a discrete (although not fast) Fourier transform of unevenly sampled (1D/time-series) data, accounting for the sampling distribution of points (which would otherwise bias the results if you try a naive Fourier sum). Thanks Ben, that looks like a good starting point. Stephen, my aim is neither spline nor linear approximation but something along the lines of Matlab's interpft. I do have the vibrational spectrum. Such spectra are frequently computed by FT from their (measured) interferograms, i.e. if you use an FT spectrometer. However, the spectra can also be measured directly with a dispersive instrument. The difference between neighbouring frequencies of such spectra varies over the spectrum. E.g. I measure from 600 cm^-1 to 1800 cm^-1: at 600 cm^-1 I have a data point spacing of 1.04 cm^-1, while at 1800 cm^-1 it is only 0.85 cm^-1. So doing an FT (like spec.pgram ()) on the signal alone means that I do not use periodic functions (sin x), but rather something like sin (x^2) - the sine changes its frequency. This does not help. The idea is to calculate the interferogram (space or time domain) taking into account this variation of delta nu, then do a backtransform to evenly spaced frequencies. The next step will then be to do other interesting things like downsampling, denoising etc. using the interferogram. Thanks, Claudia
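To make the Lomb periodogram idea concrete, here is a minimal base-R sketch of the classic Lomb-Scargle formula (my own assumption-laden illustration, not code from the thread; for real work the spec.lomb/spec.ls functions linked above are the better choice):

```r
## Lomb-Scargle periodogram for unevenly sampled data (textbook formula)
lomb <- function(t, y, omega) {
  y <- y - mean(y)
  sapply(omega, function(w) {
    ## phase offset tau that makes the sin/cos terms orthogonal
    tau <- atan2(sum(sin(2 * w * t)), sum(cos(2 * w * t))) / (2 * w)
    ct  <- cos(w * (t - tau))
    st  <- sin(w * (t - tau))
    0.5 * (sum(y * ct)^2 / sum(ct^2) + sum(y * st)^2 / sum(st^2))
  })
}

## unevenly sampled sine with frequency 0.1 (in the same units as 1/t)
set.seed(1)
t <- sort(runif(200, 0, 100))
y <- sin(2 * pi * 0.1 * t)
freqs <- seq(0.01, 0.5, by = 0.005)
P <- lomb(t, y, 2 * pi * freqs)
freqs[which.max(P)]   # the periodogram peaks near the true frequency 0.1
```

The point is that t enters the computation explicitly, so the uneven spacing is handled by construction rather than by pretending the samples are equidistant.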
Re: [R] Row and Column positions
On Friday 31 October 2008 12:17:30, Shubha Vishwanath Karanth wrote: m=data.frame(a=c(1,NA,5,5),b=c(4,5,6,7),c=c(NA,NA,NA,5)) ?which -- see in particular its arr.ind argument. HTH Claudia
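A short sketch of what I take the pointer to mean (is.na() on a data.frame returns a logical matrix, so which() with arr.ind = TRUE yields row and column positions):

```r
m <- data.frame(a = c(1, NA, 5, 5), b = c(4, 5, 6, 7), c = c(NA, NA, NA, 5))
pos <- which(is.na(m), arr.ind = TRUE)
print(pos)   # one row per NA, with its row and column index
```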
Re: [R] Trying to pass arrays as arguments to a function
I'd like to avoid looping through an array in order to change values in the array, as it takes too long. I read in an earlier post that it can be done by do.call but never got it to work. The idea is to change the value of y according to the values in x: wherever x holds the value 3, the corresponding value in y should be set to 1. Use logical indexing: y [x == 3] <- 1 -- Claudia Beleites
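A tiny demonstration of the one-liner (the matrices are invented for illustration): logical indexing replaces the explicit loop entirely.

```r
x <- matrix(c(1, 3, 2, 3, 3, 5), nrow = 2)   # positions of 3s will be matched
y <- matrix(0, nrow = 2, ncol = 3)
y[x == 3] <- 1    # every position where x equals 3 becomes 1 in y
print(y)
```

This works for vectors, matrices, and higher-dimensional arrays alike, since `x == 3` produces a logical object of the same shape as x.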