Re: [R] introducing R to high school students
Bert,

What you are describing is a problem with the people who use Excel. It is not Excel's fault that people send data in an unstructured way. I agree that Excel may not be the right tool for complicated data analysis (e.g. statistical modeling), but that is not what Excel was built for. The power of Excel lies in using it to explore data and to represent and present your analysis. For exploring data it may not be very useful beyond univariate and bivariate summaries, but that is your starting point in EDA, where you need to generate hypotheses about your data.

I have been in the field of analytics for almost 7 years now. Though we have embraced technologies like SAS, R, SPSS, Spotfire, etc., the power and importance of Excel in our lives has never been lost on us. It's a question of whether you are capable enough to use it.

Regards,
Indrajit

From: Bert Gunter gunter.ber...@gene.com
Cc: Rolf Turner rolf.tur...@xtra.co.nz
Sent: Sunday, April 22, 2012 11:07 AM
Subject: Re: [R] introducing R to high school students

I would like to slightly clarify and echo Rolf's comment: Excel is a terrible tool for data analysis. Maybe it's a good tool for keeping track of your car's repair history... but not for data analysis.

I could go on at great length about why, but let me focus on just one aspect that drives me and the other statisticians in my group crazy when we deal with scientists who send us data in Excel: the data are frequently a mess! By this I mean that they are often stored in crazy ways, with plots and summaries sprinkled around, capital and small letters mixed, missing values coded arbitrarily (e.g. 9), and so forth. As someone I know once commented, it's a puzzle to get the data extracted in a form susceptible to analysis.

Why is this? Because Excel enforces no structure. It's **cell-based** (duh), so users can throw in the data any way they see fit, which frequently is pretty unfit. This is not just a minor issue, imho.
Not having data in a reasonable structure limits what one can do for data analysis and graphics. This promulgates the inadequate and frequently awful paradigms that one sees throughout science (e.g. bar charts with 1 se bars sticking up out of them). The widespread use of Excel for 'serious' scientific and engineering data analysis is a near tragedy.

All IMHO, of course.

Cheers,
Bert

On Sat, Apr 21, 2012 at 9:45 PM, Indrajit Sengupta wrote:

> Why do you think Excel is a terrible tool? In what ways have you tried to use Excel and it has failed you?
>
> Regards,
> Indrajit
>
> From: Rolf Turner rolf.tur...@xtra.co.nz
> Cc: R-help R-help@r-project.org
> Sent: Sunday, April 22, 2012 9:25 AM
> Subject: Re: [R] introducing R to high school students
>
> On 22/04/12 15:29, Indrajit Sengupta wrote:
>
> > SNIP
> > 1. At school we seldom deal with a lot of data - the focus is more on concepts. Excel is an excellent tool
>
> That is at best debatable, and IMHO just plain incorrect. I firmly believe that Excel is a ***TERRIBLE*** tool.
>
> > and no matter how much we love or hate it - we will be using Excel a lot in our lives.
>
> This is not (unfortunately IMHO) debatable. It is all too sadly true. For most people at least. (Not for my very good self. I can get away with eschewing Excel. Most people are not lucky enough to have that option.)
>
> > SNIP
>
> I think much of the remainder of the post was highly disputable as well, but I will desist at this point.
>
> cheers,
>
> Rolf Turner
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] How to remove $ (Dollar sign) from string
On Tuesday 10 April 2012 13:34:13, Nevil Amos wrote:

> How do I remove a $ character from a string? sub() and gsub() with "$" or "\$" as the pattern do not work.
>
> sub("$", "", "ABC$DEF")
> [1] "ABC$DEF"
> sub("\$", "", "ABC$DEF")
> Error: '\$' is an unrecognized escape in character string starting "\$"
> sub(\$, "", "ABC$DEF")
> Error: unexpected input in "sub(\"
>
> Thanks

You just need a double backslash:

sub("\\$", "", "ABC$DEF")
[1] "ABCDEF"
Re: [R] unexpected plot behavior
Thank you for the replies, Uwe and Marc. These explanations make perfect sense. However, shouldn't plot.factor support type = "n" for consistency with the default plot function?

Best,
Martin

On 21 Apr 2012, at 08:18, Marc Schwartz wrote:

> On Apr 21, 2012, at 9:49 AM, Martin Renner wrote:
>
> > When plotting a numerical vector against a factor, type = "n" seems to have no effect, e.g.
> >
> > plot(1:10 ~ factor(1:10), type = "n")
> >
> > looks just like
> >
> > plot(1:10 ~ factor(1:10))
> >
> > Plotting a numerical vector on its own works as expected:
> >
> > plot(1:10, type = "n")
> >
> > I see the same behavior under Debian GNU/Linux, Mac OS X, and Win7 (all current versions, see below). Is this a bug?
> >
> > Regards,
> > Martin
>
> This has to do with method dispatch. See ?plot.formula, which is the plot method called when you pass a formula, as opposed to passing a vector as in your third example.
>
> In this case, ?plot.factor is called when the 'x' part of the formula (RHS) is a factor. When plot.factor is called, it internally calls ?boxplot and of course, there is no type = "n" for boxplots, hence it is ignored.
>
> Regards,
>
> Marc Schwartz
Re: [R] contour algorithm
On 12-04-21 9:21 PM, Stoch astic wrote:

> First-time user, so sorry if I don't understand the protocol.
>
> Anyway, I have created a data frame consisting of Pearson's r values at various x and y coordinates and then plotted this using filled.contour. My data are similar to fMRI data, except that they form a surface map reconstructed from histological sections. I like the results but would like to know how the contours were detected. A Google search turns up various sources claiming that the algorithm used is undocumented. For example:
>
> http://wipaed.wiso.uni-goettingen.de/~holdenb1/R/library/base/html/contour.html
>
> "Draws contour lines for the desired levels. There is currently no documentation about the algorithm. The source code is in `$RHOME/src/main/plot.c'."

That's a very old copy of the help page. You're generally better off using the ones installed with R, or the ones on CRAN (cran.r-project.org/web/packages), rather than what Google finds somewhere.

Duncan Murdoch

> Does anyone know where or how I can find the method by which contours are calculated?
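A minimal sketch, assuming a made-up grid (the x, y and z below are illustrative, not the poster's data). grDevices::contourLines() returns the contour lines R computes for a given set of levels, so you can inspect the computed coordinates directly instead of only looking at the plot. It does not document the algorithm itself. Note also that filled.contour() fills the regions between levels rather than drawing lines.

```r
# Illustrative grid: z = x^2 + y^2, so the level-0.25 contour is a circle of radius 0.5
x <- seq(-1, 1, length.out = 41)
y <- seq(-1, 1, length.out = 41)
z <- outer(x, y, function(a, b) a^2 + b^2)

# contourLines() returns a list with one element per contour piece,
# each holding $level, $x and $y
cl <- contourLines(x, y, z, levels = 0.25)
length(cl)

# Distance of every computed contour vertex from the origin
r <- unlist(lapply(cl, function(piece) sqrt(piece$x^2 + piece$y^2)))
range(r)
```

The vertices lie on the edges of the grid cells, with positions interpolated from the z values (as far as I can tell, linearly). That is why r stays close to 0.5 without being exactly 0.5.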
[R] how to avoid newlines tabs in file opening?
I have uploaded a file, but when I open it in R using

u <- file(file.choose(), "r")
k <- readLines(u)
k
k[1:120]

it contains all the \t (tabs) and newlines. How can I avoid them? Can I take only the first 3 columns, in table form? (Lines starting with # are not important to me.)

Uploaded file: http://r.789695.n4.nabble.com/file/n4577757/rabata.txt
[R] how to cut files from any folder to another folder?
I want to cut a file from one folder (e.g. abc) and put it into another location, in a folder named e.g. xyz. How should I proceed?
[R] Standard error
Hello,

I have tried obtaining the value of the standard error from the code below, but I get different values when I compare it with the standard error obtained from the Hessian matrix. Can somebody help me out? Thank you.

n=100;rr=1000
p1=1.2;b=1.5
sq11=sq21=0
for (i in 1:rr){
  t<-rweibull(n,shape=p1,scale=b)
  meantrue<-gamma(1+(1/p1))*b
  meantrue
  d<-meantrue/0.40
  cen<- runif(n,min=0,max=d)
  s<-ifelse(t<=cen,1,0)
  q<-c(t,cen)
  z<-function(data, p){
    beta<-p[1]
    eta<-p[2]
    log1<-(n*sum(s)*log(p[1])-n*sum(s)*(p[1])*log(p[2])+sum(s)*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1])))
    return(-log1)
  }
  start <- c(1,1)
  zz<-optim(start,fn=z,data=q,hessian=T)
  zz
  m1<-zz$par[2]
  p<-zz$par[1]
  sq11<-sq11+(1/rr*(sum((q-m1)^2)))
  sq21<-sq21+(1/rr*(sum((q-Lm1)^2)))
}
se11<-sqrt(sq11)/(rr-1)
se11
se21<-sqrt(sq21)/(rr-1)
se21
f<-solve(zz$hessian)
se<-sqrt(diag(f))
se

Chris Guure
Researcher
Institute for Mathematical Research
UPM
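Here is a minimal sketch of one common way to run this comparison. It rests on several assumptions. The observed time is min(t, cen), with event indicator s = 1 when t <= cen, as in the post's setup. The log-likelihood is the usual right-censored Weibull one, sum(s * (log(k) - k*log(lambda) + (k - 1)*log(y))) - sum((y/lambda)^k). The simulation standard error is taken as the standard deviation of the estimates across replications. The names negll, est and se_hess are illustrative, and rr is reduced to 200 to keep the run short.

```r
set.seed(1)
n <- 100; rr <- 200          # sample size and (reduced) number of replications
shape <- 1.2; scale <- 1.5   # true Weibull parameters, as in the post

# Negative log-likelihood for right-censored Weibull data:
# y = observed time, s = 1 if the event was observed, 0 if censored
negll <- function(p, y, s) {
  k <- p[1]; lam <- p[2]
  if (k <= 0 || lam <= 0) return(Inf)   # keep the search in the valid region
  -(sum(s * (log(k) - k * log(lam) + (k - 1) * log(y))) - sum((y / lam)^k))
}

d <- gamma(1 + 1 / shape) * scale / 0.40   # upper limit of the censoring times
est <- se_hess <- matrix(NA_real_, rr, 2)

for (i in seq_len(rr)) {
  t <- rweibull(n, shape = shape, scale = scale)
  cen <- runif(n, min = 0, max = d)
  y <- pmin(t, cen)             # observed (possibly censored) times
  s <- as.numeric(t <= cen)     # event indicator
  fit <- optim(c(1, 1), negll, y = y, s = s, hessian = TRUE)
  est[i, ] <- fit$par
  # Inverse of the observed information gives the asymptotic covariance
  se_hess[i, ] <- sqrt(diag(solve(fit$hessian)))
}

mc_se <- apply(est, 2, sd)        # empirical SE: spread of the estimates
avg_hess_se <- colMeans(se_hess)  # average Hessian-based SE
rbind(mc_se, avg_hess_se)
```

When the two rows of the printed matrix roughly agree, the Hessian-based standard errors are behaving as expected for this sample size.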
Re: [R] How to remove $ (Dollar sign) from string
Why you need a double backslash is alluded to in Circle 8.1.23 of 'The R Inferno'.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Pat

On 22/04/2012 10:18, Giuseppe Marinelli wrote:

> On Tuesday 10 April 2012 13:34:13, Nevil Amos wrote:
>
> > How do I remove a $ character from a string? sub() and gsub() with "$" or "\$" as the pattern do not work.
> >
> > sub("$", "", "ABC$DEF")
> > [1] "ABC$DEF"
> > sub("\$", "", "ABC$DEF")
> > Error: '\$' is an unrecognized escape in character string starting "\$"
> > sub(\$, "", "ABC$DEF")
> > Error: unexpected input in "sub(\"
> >
> > Thanks
>
> You just need a double backslash:
>
> sub("\\$", "", "ABC$DEF")
> [1] "ABCDEF"

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
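A short illustration of why the double backslash is needed. The R string literal "\\$" holds the two characters \$, which the regular-expression engine reads as a literal dollar sign. A bare "$" matches the (empty) end of the string, which is why the first attempt returned the input unchanged. A bracket expression, or fixed = TRUE, avoids the escaping altogether.

```r
x <- "ABC$DEF"

# "\\$" in R source is the regex \$ : a literal dollar sign
a <- sub("\\$", "", x)

# Inside a bracket expression, $ has no special meaning
b <- sub("[$]", "", x)

# fixed = TRUE matches the pattern as a plain string, with no regex at all
cc <- gsub("$", "", "A$B$C", fixed = TRUE)

nchar("\\$")   # the literal holds 2 characters: a backslash and a dollar sign
```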
[R] How to take ID of number 7.
I have figured out something new, and I would like to see whether I can do it more easily with R than with Excel.

I have these huge data files. For example, DataFile.csv:

ID Name    log2
1  Fantasy 5.651
2  New     7.60518
3  Finding 8.9532
4  Looeka  -0.248652
5  Vani    0.3548

i.e. header 1: ID, header 2: Name, header 3: log2.

Now I need to get the $ID of the rows that have a log2 value higher than 7. I know how to grab the $log2 values of 7 and above:

Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]

But how can I get those ID numbers as well?
Re: [R] How to take ID of number 7.
On 22-04-2012, at 13:03, Yellow wrote:

> I have figured out something new, and I would like to see whether I can do it more easily with R than with Excel.
>
> I have these huge data files. For example, DataFile.csv:
>
> ID Name    log2
> 1  Fantasy 5.651
> 2  New     7.60518
> 3  Finding 8.9532
> 4  Looeka  -0.248652
> 5  Vani    0.3548
>
> i.e. header 1: ID, header 2: Name, header 3: log2.
>
> Now I need to get the $ID of the rows that have a log2 value higher than 7. I know how to grab the $log2 values of 7 and above:
>
> Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]

How about

DataFile[DataFile$log2 >= 7, c("ID", "log2")]

to get a data frame with two columns, ID and log2.

Berend
Re: [R] compare mean
I am also comparing 2 things with each other.

x = c(1, 5, 7, 9)
y = c(2, 7, 9, 10, 11)
intersect(x, y)

The output will be: 7 9.

Hope it helped. :)
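For reference, intersect() returns the distinct values present in both vectors. %in% instead answers, element by element, whether each value of the first vector occurs in the second. The vectors are the ones from the post.

```r
x <- c(1, 5, 7, 9)
y <- c(2, 7, 9, 10, 11)

common <- intersect(x, y)   # distinct values found in both vectors
in_y   <- x %in% y          # for each element of x: is it also in y?
x[in_y]                     # same values, keeping x's order and any repeats
```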
[R] RE how to cut files from any folder to another folder?
?file.copy

Or ?system

From: sagarnikam123 sagarnikam123_at_gmail.com
Date: Sun, 22 Apr 2012 01:25:21 -0700 (PDT)

> I want to cut a file from one folder (e.g. abc) and put it into another location, in a folder named e.g. xyz. How should I proceed?

--
Sent from my Cray XK6
Quidvis recte factum, quamvis humile, praeclarum. (Whatever is done rightly, however humble, is noble.)
Re: [R] How to take ID of number 7.
Hello,

Berend Hasselman wrote:

> On 22-04-2012, at 13:03, Yellow wrote:
>
> > I have figured out something new, and I would like to see whether I can do it more easily with R than with Excel.
> >
> > I have these huge data files. For example, DataFile.csv:
> >
> > ID Name    log2
> > 1  Fantasy 5.651
> > 2  New     7.60518
> > 3  Finding 8.9532
> > 4  Looeka  -0.248652
> > 5  Vani    0.3548
> >
> > i.e. header 1: ID, header 2: Name, header 3: log2.
> >
> > Now I need to get the $ID of the rows that have a log2 value higher than 7. I know how to grab the $log2 values of 7 and above:
> >
> > Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]
>
> How about
>
> DataFile[DataFile$log2 >= 7, c("ID", "log2")]
>
> to get a data frame with two columns, ID and log2.
>
> Berend

Or maybe create an index vector into the rows of the data frame. This would be more flexible, since any columns could be extracted later. The index can be a logical or an integer vector.

inx.log <- DataFile$log2 >= 7
inx.int <- which(DataFile$log2 >= 7)
DataFile[inx.one.of.them, needed.cols]

As a side effect, it might also save some memory. Both kinds of index are stored internally as integers.

Hope this helps,

Rui Barradas
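A small self-contained version of the index-vector approach above, using the five example rows from the question. The column is named log2 (lower case), and needed.cols is only an illustrative choice of columns.

```r
# The five example rows from the question
DataFile <- data.frame(
  ID   = 1:5,
  Name = c("Fantasy", "New", "Finding", "Looeka", "Vani"),
  log2 = c(5.651, 7.60518, 8.9532, -0.248652, 0.3548),
  stringsAsFactors = FALSE
)

inx.log <- DataFile$log2 >= 7           # logical index: one TRUE/FALSE per row
inx.int <- which(DataFile$log2 >= 7)    # integer index: the row numbers
needed.cols <- c("ID", "log2")          # illustrative column choice

DataFile[inx.int, needed.cols]   # data frame with the two columns
DataFile$ID[inx.log]             # just the IDs
```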
Re: [R] introducing R to high school students
I have to agree that Excel is a poor tool for "serious scientific and engineering data analysis" (love the phrase). I too have spent way too much time beating Excel files into submission, with workarounds and manipulations, just to be able to do anything useful with them. I'm told that one can, to some degree, impose structure on Excel data entry, but I don't know how, and no users ever seem to set up their spreadsheets that way. Somehow, a reasonable tool for business (I suppose, not being a businessman) has infiltrated the scientific world as well.

That's really the motivation for my proposal to my science teacher colleague. I want to introduce budding scientists to the idea that there is a better tool for data analysis, even for exploratory analysis and univariates and bivariates, which R does very handily. Why start an analysis in Excel only to have to switch to something else for the latter half? And this will lead inevitably into conversations about better ways to record, store, and share data. It also ties into concepts of collaboration and reproducible research.

--Chris Ryan
SUNY Upstate Clinical Campus
Binghamton, NY
Re: [R] unexpected plot behavior
On Apr 22, 2012, at 1:25 AM, Martin Renner wrote:

> Thank you for the replies, Uwe and Marc. These explanations make perfect sense. However, shouldn't plot.factor support type = "n" for consistency with the default plot function?
>
> Best,
> Martin

I don't believe so.

The use of type = "n" is to facilitate the creation of a plotting environment into which you can, in a piecemeal fashion, create a new plot from a blank canvas. The nature of that plot could be virtually anything: symbols, lines/curves, shapes and perhaps even pure text.

Since plot.factor() internally calls one of several specific plot functions (e.g. boxplot(), barplot(), spineplot() or plot()) depending upon the nature of the argument(s) passed, you need to understand exactly what you intend to do with a blank plotting device, as each of those functions has its own set of characteristics, defaults and intents.

Thus, having plot(), or more specifically plot.default(), support the type = "n" paradigm is sufficient for creating a plot device with the desired axis ranges, parameters and so forth, which then enables you to add whatever additional content you require.

There is no a priori expectation that a function's child methods inherit all of the parent's functionality, because the generic default method's functionality may not be apropos to the child classes. Similarly, the child classes may implement specific functionality not apropos to the generic parent class, because more specific information is known about the structure of the child.

Regards,

Marc

> On 21 Apr 2012, at 08:18, Marc Schwartz wrote:
>
> > On Apr 21, 2012, at 9:49 AM, Martin Renner wrote:
> >
> > > When plotting a numerical vector against a factor, type = "n" seems to have no effect, e.g.
> > >
> > > plot(1:10 ~ factor(1:10), type = "n")
> > >
> > > looks just like
> > >
> > > plot(1:10 ~ factor(1:10))
> > >
> > > Plotting a numerical vector on its own works as expected:
> > >
> > > plot(1:10, type = "n")
> > >
> > > I see the same behavior under Debian GNU/Linux, Mac OS X, and Win7 (all current versions, see below). Is this a bug?
> > >
> > > Regards,
> > > Martin
> >
> > This has to do with method dispatch. See ?plot.formula, which is the plot method called when you pass a formula, as opposed to passing a vector as in your third example.
> >
> > In this case, ?plot.factor is called when the 'x' part of the formula (RHS) is a factor. When plot.factor is called, it internally calls ?boxplot and of course, there is no type = "n" for boxplots, hence it is ignored.
> >
> > Regards,
> >
> > Marc Schwartz
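Here is a minimal sketch of the piecemeal approach Marc describes, for the case where you want a blank canvas with a factor on the x axis. Call plot() with the factor's integer codes (a numeric x dispatches to plot.default(), which honours type = "n"), then label the axis yourself. pdf(NULL) is used only so that the snippet runs without a screen device.

```r
pdf(NULL)   # null device: nothing is drawn to screen or file

y <- 1:10
f <- factor(1:10)

# Numeric x dispatches to plot.default(), which honours type = "n":
# axes and ranges are set up, but nothing is plotted yet
plot(as.integer(f), y, type = "n", xaxt = "n", xlab = "f", ylab = "y")
axis(1, at = seq_along(levels(f)), labels = levels(f))

# Content can then be added piecemeal
points(as.integer(f), y, pch = 19)

usr <- par("usr")   # plotting region limits: x1, x2, y1, y2
invisible(dev.off())
```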
Re: [R] how to avoid newlines tabs in file opening?
How about: don't avoid them, use them?

dta <- read.table("http://r.789695.n4.nabble.com/file/n4577757/rabata.txt",
                  as.is=TRUE, skip=4, sep="\t")

---
Jeff Newmiller                    DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

sagarnikam123 sagarnikam...@gmail.com wrote:

> I have uploaded a file, but when I open it in R using
>
> u <- file(file.choose(), "r")
> k <- readLines(u)
> k
> k[1:120]
>
> it contains all the \t (tabs) and newlines. How can I avoid them? Can I take only the first 3 columns, in table form? (Lines starting with # are not important to me.)
>
> Uploaded file: http://r.789695.n4.nabble.com/file/n4577757/rabata.txt
[R] Transform dataframe
Hi everyone!

I have the following question: I have three items that had to be ordered (e.g. three persons were rating var1 on the first rank):

var1 var2 var3
1    2    3
2    1    3
1    3    2
1    2    3

Now I'd like to have the data frame the other way round, so that the ranks are in the columns:

rank1 rank2 rank3
var1  var2  var3
var2  var1  var3
var1  var3  var2
var1  var2  var3

Can anyone help me achieve this?

# code:
var1 <- c(1,2,1,1)
var2 <- c(2,1,3,2)
var3 <- c(3,3,2,3)
df <- as.data.frame(cbind(var1,var2,var3))
??

Thank you very much!
David
Re: [R] how to avoid newlines tabs in file opening?
Hello,

sagarnikam123 wrote:

> I have uploaded a file, but when I open it in R using
>
> u <- file(file.choose(), "r")
> k <- readLines(u)
> k
> k[1:120]
>
> it contains all the \t (tabs) and newlines. How can I avoid them? Can I take only the first 3 columns, in table form? (Lines starting with # are not important to me.)
>
> Uploaded file: http://r.789695.n4.nabble.com/file/n4577757/rabata.txt

Try the following.

# Read from the link you gave
fl <- file("http://r.789695.n4.nabble.com/file/n4577757/rabata.txt", "rb")
bin <- readBin(fl, what="character")
close(fl)

# Get rid of tabs and '\r', if any
bin <- gsub("[[:blank:]]", " ", bin)
bin <- gsub("\\r", "", bin)

# Split into lines of text and keep those not starting with '#'
txt <- unlist(strsplit(bin, "\\n"))
txt <- txt[substr(txt, 1, 1) != "#"]

# Now make a data.frame of it, cols 1:3 only
lst <- lapply(strsplit(txt, " "), function(x) x[1:3])
df1 <- data.frame(do.call(rbind, lst), stringsAsFactors=FALSE)

# See what we have
str(df1)
head(df1)

# And revamp col 1
df1$X1 <- as.integer(df1$X1)
str(df1)
head(df1)

# Final clean-up
# rm(bin, txt, lst)

Hope this helps,

Rui Barradas
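Here is a minimal sketch along the lines of Jeff's read.table() suggestion. It uses a small made-up stand-in for rabata.txt, since the real file's layout is not reproduced here. sep = "\t" splits the fields on tabs, comment.char = "#" (also read.table's default) drops the lines starting with #, and the first three columns are selected afterwards.

```r
# Made-up stand-in for the uploaded file: '#' comment lines, tab-separated fields
raw <- paste(
  "# comment line 1",
  "1\tabc\t0.5\textra",
  "2\tdef\t0.7\textra",
  "# comment line 2",
  "3\tghi\t0.9\textra",
  sep = "\n"
)

# sep = "\t" splits on tabs; comment.char = "#" drops the comment lines
dta <- read.table(text = raw, sep = "\t", comment.char = "#",
                  header = FALSE, as.is = TRUE)

# Keep only the first three columns
first3 <- dta[, 1:3]
str(first3)
```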
Re: [R] how to cut files from any folder to another folder?
The cut/copy/paste paradigm is not common in programmed file manipulation under various operating systems... For cross-platform compatibility, be prepared to work on files with a copy (= duplicate)/remove approach.

?files

---
Jeff Newmiller

sagarnikam123 sagarnikam...@gmail.com wrote:

> I want to cut a file from one folder (e.g. abc) and put it into another location, in a folder named e.g. xyz. How should I proceed?
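Here is a minimal sketch of the copy-then-remove approach Jeff describes. The folders "abc" and "xyz" are created under tempdir() purely for the demo, and move_file() is a hypothetical helper, not a base R function. When the source and destination are on the same file system, file.rename() may also do the job in one step.

```r
# Hypothetical folders "abc" and "xyz" created under tempdir() for the demo
src_dir <- file.path(tempdir(), "abc")
dst_dir <- file.path(tempdir(), "xyz")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)
src_file <- file.path(src_dir, "data.txt")
writeLines("hello", src_file)

# "Cut and paste" as copy-then-remove: the original is deleted
# only if the copy reported success
move_file <- function(from, to_dir) {
  to <- file.path(to_dir, basename(from))
  copied <- file.copy(from, to, overwrite = FALSE)
  if (copied) file.remove(from)
  copied
}

moved <- move_file(src_file, dst_dir)
file.exists(src_file)                          # FALSE once moved
file.exists(file.path(dst_dir, "data.txt"))    # TRUE once moved
```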
[R] difficulty in Formatting time series data
Dear R-Gurus,

I have a data frame (from a CSV file) whose first column is called Date. The Date is in the format mm/dd/yyyy. I was trying to get the weekday for these dates; I tried the wday() and day.of.week() functions, and both gave me precisely the wrong answers. I think the issue lies in the proper formatting of the dates. The class of this column is factor, so I tried converting it into POSIXlt, xts and zoo objects, and yet I could not get the weekday correctly. Does anyone have any suggestions, please?

Many thanks,
Raghu
Re: [R] difficulty in Formatting time series data
Yes: dput() for a reproducible example, with some minimal reproducible code (and which packages day.of.week() and wday() come from...).

x <- xts(10, Sys.Date())
wday(x)

seems fine for me.

"precisely the wrong answers" -- interesting turn of phrase.

Michael

On Sun, Apr 22, 2012 at 12:53 PM, Raghuraman Ramachandran optionsra...@gmail.com wrote:

> Dear R-Gurus,
>
> I have a data frame (from a CSV file) whose first column is called Date. The Date is in the format mm/dd/yyyy. I was trying to get the weekday for these dates; I tried the wday() and day.of.week() functions, and both gave me precisely the wrong answers. I think the issue lies in the proper formatting of the dates. The class of this column is factor, so I tried converting it into POSIXlt, xts and zoo objects, and yet I could not get the weekday correctly. Does anyone have any suggestions, please?
>
> Many thanks,
> Raghu
Re: [R] Transform dataframe
On Sun, 22 Apr 2012, David Studer wrote:

> Hi everyone!
>
> I have the following question: I have three items that had to be ordered (e.g. three persons were rating var1 on the first rank):
>
> var1 var2 var3
> 1    2    3
> 2    1    3
> 1    3    2
> 1    2    3
>
> Now I'd like to have the data frame the other way round, so that the ranks are in the columns:
>
> rank1 rank2 rank3
> var1  var2  var3
> var2  var1  var3
> var1  var3  var2
> var1  var2  var3
>
> Can anyone help me achieve this?
>
> # code:
> var1 <- c(1,2,1,1)
> var2 <- c(2,1,3,2)
> var3 <- c(3,3,2,3)
> df <- as.data.frame(cbind(var1,var2,var3))
> ??
>
> Thank you very much!
> David

Please fix your email settings to send plain text to this list...

tc <- textConnection(
"var1 var2 var3
1 2 3
2 1 3
1 3 2
1 2 3
")
dta <- read.table( tc, as.is=TRUE, header=TRUE )
close( tc )
dta$respondent <- letters[1:4]
library(reshape2)
dtalong <- melt( dta, id="respondent" )
# levels specified to guard against sorting problems if more
# than 9 rankings
dtalong$rank <- factor( paste( "rank", dtalong$value, sep="" )
                      , levels=paste( "rank"
                                    , sort( unique( dtalong$value ) )
                                    , sep="" )
                      , ordered=TRUE )
dta2 <- dcast( dtalong, respondent ~ rank, value.var="variable" )

---
Jeff Newmiller
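As an alternative to the reshape2 route, here is a base-R sketch. It assumes each row is a permutation of the ranks 1..3, with no ties or missing values. For each row, order() lists the column positions from rank 1 upwards, and those positions index the column names.

```r
df <- data.frame(var1 = c(1, 2, 1, 1),
                 var2 = c(2, 1, 3, 2),
                 var3 = c(3, 3, 2, 3))

# For each respondent (row), order() gives the columns sorted by rank
ranked <- t(apply(df, 1, function(r) names(df)[order(r)]))
colnames(ranked) <- paste0("rank", seq_len(ncol(df)))
ranked <- as.data.frame(ranked, stringsAsFactors = FALSE)
ranked
```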
Re: [R] difficulty in Formatting time series data
Raghu,

On 22 April 2012 09:53, Raghuraman Ramachandran optionsra...@gmail.com wrote:

> I have a data frame (from a CSV file) whose first column is called Date. The Date is in the format mm/dd/yyyy. I was trying to get the weekday for these dates; I tried the wday() and day.of.week() functions, and both gave me precisely the wrong answers. I think the issue lies in the proper formatting of the dates. The class of this column is factor, so I tried converting it into POSIXlt, xts and zoo objects, and yet I could not get the weekday correctly. Does anyone have any suggestions, please?

Try this:

# assume dataIn is where the CSV file's data is...
dataIn$Date <- as.POSIXct(dataIn$Date, format='%m/%d/%Y')
dataIn <- cbind(dataIn, day.of.week = format(dataIn$Date, format='%A'))

--
Sent from my mobile device
Re: [R] How to take ID of number 7.
O_o This is kinda interesting.

I have 267 log2 values >= 7, and 295 ID numbers. I don't see any problems in my code, either:

ID_Log2_Above_7 = DataFile[DataFile$log2 >= 7, c("ID", "Log2"]
# Take ID out.
ID_Above_7 = ID_Log2_Above_7$ID
# Only numbers, no NA or Inf.
ID_Above_7_NO_NA = ID_Above_7[is.na(ID_Above_7)]
ID_Above_7_FINAL = ID_Above_7_NO_NA[is.finite(ID_Above_7_NO_NA)]

I also did the same thing for the log2 values, and there are 267 of those, as there should be. But why do I have 295 ID numbers? I seriously don't get it.
Re: [R] difficulty in Formatting time series data
On Sun, 22 Apr 2012, Hasan Diwan wrote:

> Raghu,
> On 22 April 2012 09:53, Raghuraman Ramachandran optionsra...@gmail.com wrote:
>> I have a data frame (from a CSV file) whose first column is called Date. The Date is in the format mm/dd/. I was trying to get the weekday for these dates, and I tried using the wday() and day.of.week() functions, and both of them gave me precisely the wrong answers. I think the issue lies in the proper formatting of dates. The class of this column is factor, and hence I tried converting it into POSIXlt, xts and zoo objects, and yet I could not get the weekday correctly. Does anyone have any suggestions, please?
> Try this:
> # assume dataIn is where the CSV file's data is...
> dataIn$Date <- as.POSIXct(dataIn$Date, format='%m/%d/%y')

By far the most common error I see is failing to import the Date column as character, instead allowing the import function to convert it to a factor, after which computations (such as the above suggestion) use the hidden factor index instead of the visible character representation, which further mystifies beginners. The conversion above will only work correctly if the column was imported as character. E.g.

dataIn <- read.csv( file=yourdatafile, as.is=TRUE )

OP: Use the str() function to see what types you are working with, and in future R-help queries send dput() of the data and the code you have tried, so that we can reproduce your attempts effectively rather than reading your mind.

> dataIn <- cbind(dataIn, day.of.week = format(dataIn$Date, format='%A')

Why not just

dataIn$day.of.week <- weekdays( dataIn$Date )

?

---
Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
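A minimal sketch of the factor pitfall Jeff describes, under stated assumptions: the three-row CSV is made up and is held in a string read through read.csv()'s text argument rather than a real file, and stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later no longer convert strings to factors by default. It shows that a factor stores integer codes into its sorted levels, and that as.is = TRUE keeps the visible text.

```r
# Made-up three-row CSV held in a string (not the OP's file).
csv <- "Date,Close
20/04/2012,2.31
19/04/2012,2.35
10/04/2012,2.28"

# stringsAsFactors = TRUE reproduces the old (pre-R 4.0.0) default import.
as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
# as.is = TRUE keeps the column as character, as recommended above.
as_chr <- read.csv(text = csv, as.is = TRUE)

class(as_factor$Date)
class(as_chr$Date)

# A factor stores integer codes into its sorted levels, not the visible text.
levels(as_factor$Date)
as.integer(as_factor$Date)

# The classic trap: numbers stored as a factor.
f <- factor(c("10", "5", "7"))
as.numeric(f)                 # the hidden codes
as.numeric(as.character(f))   # the visible values

# Parsing the character column with an explicit, matching format.
as.Date(as_chr$Date, format = "%d/%m/%Y")
```

Running str() on both data frames is the quickest way to see which of the two situations you are in.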
Re: [R] How to take ID of number 7.
Please provide self-contained, reproducible examples.

On Sun, 22 Apr 2012, Yellow wrote:

> O_o This is kinda interesting. I have 267 log2 values >= 7, but 295 ID numbers. I don't see any problems in my code either:
> ID_Log2_Above_7 = DataFile[DataFile$log2 >= 7, c("ID", "Log2"]

Missing a parenthesis, and see below.

> # Take ID out.
> ID_Above_7 = ID_Log2_Above_7$ID
> # Only numbers, no NA or Inf.
> ID_Above_7_NO_NA = ID_Above_7[is.na(ID_Above_7)]
> ID_Above_7_FINAL = ID_Above_7_NO_NA[is.finite(ID_Above_7_NO_NA)]
> I also did the same thing for these log2 values, and those are 267, as they should be. But why do I have 295 ID numbers? I seriously don't get it.

You stopped working with data frames midway through, and now there is no well-defined correspondence between ID numbers and log2 numbers (whatever they are). The troublesome values you eliminate in one column must also eliminate the corresponding values in the other column. Modify the line above to select rows in the data frame that are not NA and are finite, and you will end up with a single data frame of data that meets your quality criteria.

---
Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
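A minimal sketch of filtering whole rows instead of separate columns. The column names ID and log2 follow the thread, but the data frame and its values are invented for illustration. Because is.finite() is FALSE for NA, NaN and Inf, one test per column covers the "no NA or Inf" intent, and a single logical index drops a row from both columns at once.

```r
# Invented stand-in for DataFile; only the column names come from the thread.
DataFile <- data.frame(
  ID   = c(101, 102, NA, 104, 105, Inf),
  log2 = c(7.5, 6.2, 8.1, NA, 9.0, 7.1)
)

# One logical index over rows: a row is kept only if BOTH values are usable
# and log2 meets the threshold, so ID and log2 stay aligned.
keep <- is.finite(DataFile$log2) & DataFile$log2 >= 7 & is.finite(DataFile$ID)

above7 <- DataFile[keep, c("ID", "log2")]
above7
nrow(above7)
```

Counting nrow(above7) then gives one number that applies to both columns.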
Re: [R] difficulty in Formatting time series data
I tried importing with as.is and have also provided the dput() output below. For example, the first date is 20/4/2012, and wday() gives 2 instead of 6. Thanks for all your help.

str(test1)
'data.frame':	1825 obs. of  7 variables:
 $ Date     : chr  "20/04/2012" "19/04/2012" "18/04/2012" "17/04/2012" ...
 $ Open     : num  2.33 2.35 2.35 2.34 2.32 2.34 2.3 2.28 2.29 2.28 ...
 $ High     : num  2.34 2.36 2.38 2.34 2.35 2.36 2.32 2.29 2.33 2.3 ...
 $ Low      : num  2.31 2.33 2.34 2.3 2.31 2.32 2.29 2.25 2.28 2.28 ...
 $ Close    : num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
 $ Volume   : int  5366000 5382000 9606000 9596000 5941000 10332000 700 9636000 6019000 3279000 ...
 $ Adj.Close: num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
wday(test$Date[1])
[1] 2
wday(test1$Date[1])
[1] 2
test1%Date[1]
Error: unexpected input in "test1%Date[1]"
test1$Date[1]
[1] "20/04/2012"
dput(test1)
structure(list(Date = c(20/04/2012, 19/04/2012, 18/04/2012, 17/04/2012, 16/04/2012, 13/04/2012, 12/04/2012, 11/04/2012, 10/04/2012, 09/04/2012, 05/04/2012, 04/04/2012, 03/04/2012, 02/04/2012, 30/03/2012, 29/03/2012, 28/03/2012, 27/03/2012, 26/03/2012, 23/03/2012, 21/03/2012, 20/03/2012, 19/03/2012, 16/03/2012, 15/03/2012, 14/03/2012, 13/03/2012, 12/03/2012, 09/03/2012, 08/03/2012, 07/03/2012, 06/03/2012, 05/03/2012, 02/03/2012, 01/03/2012, 29/02/2012, 28/02/2012, 27/02/2012, 24/02/2012, 23/02/2012, 22/02/2012, 21/02/2012, 20/02/2012, 17/02/2012, 16/02/2012, 15/02/2012, 14/02/2012, 13/02/2012, 10/02/2012, 09/02/2012, 08/02/2012, 07/02/2012, 06/02/2012, 03/02/2012, 02/02/2012, 01/02/2012, 31/01/2012, 30/01/2012, 27/01/2012, 26/01/2012, 25/01/2012, 20/01/2012, 19/01/2012, 18/01/2012, 17/01/2012, 16/01/2012, 13/01/2012, 12/01/2012, 11/01/2012, 10/01/2012, 09/01/2012, 06/01/2012, 05/01/2012, 04/01/2012, 03/01/2012, 30/12/2011, 29/12/2011, 28/12/2011, 27/12/2011, 23/12/2011, 22/12/2011, 21/12/2011, 20/12/2011, 19/12/2011, 16/12/2011, 15/12/2011, 14/12/2011, 13/12/2011, 12/12/2011, 09/12/2011, 08/12/2011, 07/12/2011, 06/12/2011,
05/12/2011, 02/12/2011, 01/12/2011, 30/11/2011, 29/11/2011, 28/11/2011, 25/11/2011, 24/11/2011, 23/11/2011, 22/11/2011, 21/11/2011, 18/11/2011, 17/11/2011, 16/11/2011, 15/11/2011, 14/11/2011, 11/11/2011, 10/11/2011, 09/11/2011, 08/11/2011, 04/11/2011, 03/11/2011, 02/11/2011, 01/11/2011, 31/10/2011, 28/10/2011, 27/10/2011, 25/10/2011, 24/10/2011, 21/10/2011, 20/10/2011, 19/10/2011, 18/10/2011, 17/10/2011, 14/10/2011, 13/10/2011, 12/10/2011, 11/10/2011, 10/10/2011, 07/10/2011, 06/10/2011, 05/10/2011, 04/10/2011, 03/10/2011, 30/09/2011, 29/09/2011, 28/09/2011, 27/09/2011, 26/09/2011, 23/09/2011, 22/09/2011, 21/09/2011, 20/09/2011, 19/09/2011, 16/09/2011, 15/09/2011, 14/09/2011, 13/09/2011, 12/09/2011, 09/09/2011, 08/09/2011, 07/09/2011, 06/09/2011, 05/09/2011, 02/09/2011, 01/09/2011, 31/08/2011, 29/08/2011, 26/08/2011, 25/08/2011, 24/08/2011, 23/08/2011, 22/08/2011, 19/08/2011, 18/08/2011, 17/08/2011, 16/08/2011, 15/08/2011, 12/08/2011, 11/08/2011, 10/08/2011, 08/08/2011, 05/08/2011, 04/08/2011, 03/08/2011, 02/08/2011, 01/08/2011, 29/07/2011, 28/07/2011, 27/07/2011, 26/07/2011, 25/07/2011, 22/07/2011, 21/07/2011, 20/07/2011, 19/07/2011, 18/07/2011, 15/07/2011, 14/07/2011, 13/07/2011, 12/07/2011, 11/07/2011, 08/07/2011, 07/07/2011, 06/07/2011, 05/07/2011, 04/07/2011, 01/07/2011, 30/06/2011, 29/06/2011, 28/06/2011, 27/06/2011, 24/06/2011, 23/06/2011, 22/06/2011, 21/06/2011, 20/06/2011, 17/06/2011, 16/06/2011, 15/06/2011, 14/06/2011, 13/06/2011, 10/06/2011, 09/06/2011, 08/06/2011, 07/06/2011, 06/06/2011, 03/06/2011, 02/06/2011, 01/06/2011, 31/05/2011, 30/05/2011, 27/05/2011, 26/05/2011, 25/05/2011, 24/05/2011, 23/05/2011, 20/05/2011, 19/05/2011, 18/05/2011, 16/05/2011, 13/05/2011, 12/05/2011, 11/05/2011, 10/05/2011, 09/05/2011, 06/05/2011, 05/05/2011, 04/05/2011, 03/05/2011, 29/04/2011, 28/04/2011, 27/04/2011, 26/04/2011, 25/04/2011, 21/04/2011, 20/04/2011, 19/04/2011, 18/04/2011, 15/04/2011, 14/04/2011, 13/04/2011, 12/04/2011, 11/04/2011, 08/04/2011, 07/04/2011, 
06/04/2011, 05/04/2011, 04/04/2011, 01/04/2011, 31/03/2011, 30/03/2011, 29/03/2011, 28/03/2011, 25/03/2011, 24/03/2011, 23/03/2011, 22/03/2011, 21/03/2011, 18/03/2011, 17/03/2011, 16/03/2011, 15/03/2011, 14/03/2011, 11/03/2011, 10/03/2011, 09/03/2011, 08/03/2011, 07/03/2011, 04/03/2011, 03/03/2011, 02/03/2011, 01/03/2011, 28/02/2011, 25/02/2011, 24/02/2011, 23/02/2011, 22/02/2011, 21/02/2011, 18/02/2011, 17/02/2011, 16/02/2011, 15/02/2011, 14/02/2011, 11/02/2011, 10/02/2011, 09/02/2011, 08/02/2011, 07/02/2011, 02/02/2011, 01/02/2011, 31/01/2011, 28/01/2011, 27/01/2011, 26/01/2011, 25/01/2011, 24/01/2011, 21/01/2011, 20/01/2011, 19/01/2011, 18/01/2011, 17/01/2011, 14/01/2011, 13/01/2011, 12/01/2011, 11/01/2011, 10/01/2011, 07/01/2011, 06/01/2011, 05/01/2011, 04/01/2011, 03/01/2011, 31/12/2010, 30/12/2010, 29/12/2010, 28/12/2010, 27/12/2010, 24/12/2010, 23/12/2010, 22/12/2010, 21/12/2010, 20/12/2010, 17/12/2010, 16/12/2010, 15/12/2010, 14/12/2010,
Re: [R] how to cut files from any folder to another folder?
sagarnikam123 sagarnikam...@gmail.com writes:

> I want to cut a file from, e.g., folder abc and put it into another location, e.g. a folder named xyz. How should I proceed?

See ?files

--
Charles C. Berry, Dept of Family/Preventive Medicine, cberry at ucsd edu
UC San Diego, http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
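A minimal sketch of a "cut and paste" (move) using functions documented on the ?files help page: file.rename(), file.copy() and file.remove(). The folder names abc and xyz come from the question; creating them under tempdir(), the file name data.txt, and the helper move_file() are assumptions for the demo, not part of base R.

```r
# Hypothetical folders "abc" and "xyz", created under tempdir() for the demo.
src_dir <- file.path(tempdir(), "abc")
dst_dir <- file.path(tempdir(), "xyz")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)

src <- file.path(src_dir, "data.txt")
writeLines("some contents", src)

# Hypothetical helper: "cut" = move. file.rename() moves the file in one step;
# if that fails (for example across drives), fall back to copy, then delete.
move_file <- function(from, to_dir) {
  to <- file.path(to_dir, basename(from))
  ok <- file.rename(from, to)
  if (!ok) {
    ok <- file.copy(from, to, overwrite = FALSE) && file.remove(from)
  }
  ok
}

moved <- move_file(src, dst_dir)
moved
```

The fallback deletes the source only after a successful copy, so a failed copy leaves the original in place.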
Re: [R] difficulty in Formatting time series data
I also tried:

test$Date=as.POSIXct(test$Date,format="%m%d%y")
test=cbind(test,day.of.week=format(test$Date,format="%A"))
head(test)
  Date Open High  Low Close   Volume Adj.Close day.of.week
1   NA 2.33 2.34 2.31  2.31  5366000      2.31          NA
2   NA 2.35 2.36 2.33  2.35  5382000      2.35          NA
3   NA 2.35 2.38 2.34  2.36  9606000      2.36          NA
4   NA 2.34 2.34 2.30  2.33  9596000      2.33          NA
5   NA 2.32 2.35 2.31  2.31  5941000      2.31          NA
6   NA 2.34 2.36 2.32  2.32 10332000      2.32

It didn't help.
Thanks,
Raghu
Re: [R] difficulty in Formatting time series data
On 22-04-2012, at 20:12, Raghuraman Ramachandran wrote:

> I tried downloading using as.is and have also provided the dput below. The date for example is 20/4/2012 and wday gives 2 instead of 6? Thanks for all your help.

dt <- "20/04/2012"
as.Date(dt)
[1] "0020-04-20"
as.Date(dt,format="%d/%m/%Y")
[1] "2012-04-20"
weekdays(as.Date(dt,format="%d/%m/%Y"))
[1] "Friday"

Read the help for as.Date.

Berend
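Applying the same fix to a whole column might look like the sketch below, which also uses the weekdays() suggestion made earlier in the thread. Here test1 is a small invented stand-in with the same day/month/year layout, not the OP's data, and the names printed by weekdays() depend on the session's locale (the test checks the locale-independent numeric weekday instead).

```r
# Small invented stand-in for test1 (same Date layout, made-up Close values).
test1 <- data.frame(
  Date  = c("20/04/2012", "19/04/2012", "16/04/2012"),
  Close = c(2.31, 2.35, 2.34),
  stringsAsFactors = FALSE
)

# Parse with an explicit format that matches the data...
test1$Date <- as.Date(test1$Date, format = "%d/%m/%Y")
# ...then derive the weekday name (locale-dependent text).
test1$day.of.week <- weekdays(test1$Date)
test1
```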
Re: [R] difficulty in Formatting time series data
On Apr 22, 2012, at 2:12 PM, Raghuraman Ramachandran wrote:

> I tried downloading using as.is and have also provided the dput below. The date for example is 20/4/2012 and wday gives 2 instead of 6? Thanks for all your help.
> wday(test$Date[1])
> [1] 2

You are skipping a couple of essential steps. The wday() function is unable to infer that you are using a non-standard date format. (I don't think it would even work if you were using -MM-DD.) Read up on:

?DateTimeClasses
?as.Date
?strptime

--
David.
Re: [R] difficulty in Formatting time series data
On Apr 22, 2012, at 2:18 PM, Raghuraman Ramachandran wrote:

> I also tried:
> test$Date=as.POSIXct(test$Date,format="%m%d%y")

Well, as became apparent when you eventually offered an example, you have dates in dd/mm/ format, so it's hardly surprising that it didn't work with a format that didn't match your data.

?strptime
?as.Date

> test=cbind(test,day.of.week=format(test$Date,format="%A"))
> head(test)
> It didn't help.

David Winsemius, MD
West Hartford, CT
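A format string that does not match the data silently yields NA, as the head(test) output above shows. A minimal sketch of making that mismatch fail loudly instead; parse_dates() is a hypothetical helper, not part of base R, and the two dates are taken from the thread.

```r
# Hypothetical helper: parse dates and stop if any non-missing input fails.
parse_dates <- function(x, format) {
  d <- as.Date(x, format = format)
  bad <- !is.na(x) & is.na(d)
  if (any(bad)) {
    stop("format '", format, "' did not match, e.g. '", x[bad][1], "'")
  }
  d
}

x <- c("20/04/2012", "19/04/2012")
as.Date(x, format = "%m%d%y")     # wrong field order and no slashes: all NA
as.Date(x, format = "%d/%m/%Y")   # matches the data

# The guarded version turns the silent NA into an error message.
res <- try(parse_dates(x, "%m/%d/%Y"), silent = TRUE)
res
```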
[R] need advice on using excel to check data for import into R
I have created an S4 object type for conducting fire department data analysis. The object includes validity checks that ensure certain fields are present and that duplicate records don't exist for certain combinations of columns (e.g. no duplicate incident number / incident date / unit ID, which ensures that the data do not show the same fire engine responding twice to the same call). I am finding that I spend a lot of time taking client data, converting it to my S4 object, and then sending it back to the client to correct data validity issues. I am trying to figure out a clever way to have Excel (typically the program used by my clients) check client data before they submit it to me. I have been working with somebody on developing an Excel toolbar add-in, with limited success.

My question is whether anybody can think of clever alternatives for clients to validate their data. For example, is there an R Excel plug-in (that a client could easily install) where I might be able to write some lines of R to check the data and output messages? Or maybe some sort of server where they could upload their data, and I could have some lines of R code that would check the data and send back potential error messages?

I realize this is a fairly open-ended question; I'm just looking for some general ideas and directions to go. I'm getting a little frustrated with spending most of my work time dealing with data cleaning issues, and I'm guessing this is a problem shared by many of us who use R!

Thanks,
Markus
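One possible direction for the upload-and-check idea, sketched under assumptions: the column names incident_number, incident_date and unit_id, the helper check_incidents() and the class IncidentData are all hypothetical. The idea is to keep the checks in a plain function that returns readable messages, so the same code can back an S4 validity method and could also be run on an uploaded file to produce a report for the client.

```r
library(methods)  # setClass(); attached by default in most R sessions

# Hypothetical column names standing in for the client's fields.
required_cols <- c("incident_number", "incident_date", "unit_id")

# Return a character vector of problems (empty when the data look valid).
check_incidents <- function(df) {
  absent <- setdiff(required_cols, names(df))
  if (length(absent) > 0) {
    return(paste("missing column:", absent))
  }
  msgs <- character(0)
  dup <- duplicated(df[, required_cols])
  if (any(dup)) {
    msgs <- c(msgs, paste0("duplicate incident/date/unit in row ", which(dup)))
  }
  msgs
}

# Hypothetical S4 class whose validity method reuses the same checks.
setClass("IncidentData", slots = c(data = "data.frame"),
         validity = function(object) {
           msgs <- check_incidents(object@data)
           if (length(msgs) == 0) TRUE else msgs
         })

# Invented example: row 2 repeats row 1 (same engine, same call).
incidents <- data.frame(
  incident_number = c(1001, 1001, 1002),
  incident_date   = c("2012-04-20", "2012-04-20", "2012-04-21"),
  unit_id         = c("E1", "E1", "E2"),
  stringsAsFactors = FALSE
)
problems <- check_incidents(incidents)
problems
```

Because check_incidents() returns messages rather than stopping, its output could be written to a text file and sent back to the client as-is.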
Re: [R] difficulty in Formatting time series data
Hello,

SMALL, reproducible examples...
Anyway, it's not that difficult. Try this:

d.of.w <- as.integer(format(as.Date(test1$Date, format="%d/%m/%Y"), "%w"))
str(d.of.w)
head(d.of.w)

Note that the format '%w' gives days in 0-6, where Sunday == 0. See ?strftime. Your Friday is therefore 5. (Or use d.of.w <- d.of.w + 1.)

Hope this helps,

Rui Barradas
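A small sketch comparing day-of-week numbering conventions in base R, for the Friday from the thread (20/04/2012) and the Sunday that follows it. Adding 1 to '%w' gives the Sunday = 1 numbering that lubridate's wday() uses by default, which is why 6 was expected for a Friday; '%u' is the ISO 8601 variant with Monday = 1.

```r
# A Friday from the thread and the following Sunday.
dates <- as.Date(c("20/04/2012", "22/04/2012"), format = "%d/%m/%Y")

# %w: 0-6 with Sunday = 0.
d.of.w <- as.integer(format(dates, "%w"))

# Shifted to 1-7 with Sunday = 1 (lubridate's default numbering).
d.of.w1 <- d.of.w + 1L

# %u: ISO 8601, 1-7 with Monday = 1.
iso <- as.integer(format(dates, "%u"))

rbind(d.of.w, d.of.w1, iso)
```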
[R] Assignment problems
The text below is part of some work I have to do, which is due in 2 days, and I am tied up with a lot of other stuff, so I was hoping someone would take 5 minutes and help me.

Here is a part of my data.frame:

   year country1 country2 contig comlang       pop1       gdp1        pop2        gdp2 rta    dist      avgflow
 1 1992      AUS      AUT      0       0 17.4950008 321708.281   7.7825189  194684.078   0 15608.4 1.075999e+02
 2 1992      AUS      BEL      0       0 17.4950008 321708.281  10.0450001  231762.094   0 16319.2 4.767162e+02
 3 1992      AUS      CAN      0       1 17.4950008 321708.281  28.5195980  570291.188   0 15391.1 7.456945e+02
 4 1992      AUS      CHE      0       0 17.4950008 321708.281       6.875  249471.422   0 16170.1 4.625214e+02
 5 1992      AUS      DEU      0       0 17.4950008 321708.281  80.6240005 2062141.500   0 15935.1 2.047573e+03
 6 1992      AUS      DNK      0       0 17.4950008 321708.281       5.171  150195.484   0 15725.5 1.453406e+02
 7 1992      AUS      ESP      0       0 17.4950008 321708.281  39.0677490  612585.250   0 17072.9 2.106880e+02
 8 1992      AUS      FIN      0       0 17.4950008 321708.281   5.0419998  109859.438   0 14849.5 2.025125e+02
 9 1992      AUS      FRA      0       0 17.4950008 321708.281  57.2422981 1371706.000   0 16513.0 1.070802e+03
10 1992      AUS      GBR      0       1 17.4950008 321708.281  57.9023476 1071537.375   0 16602.3 2.279130e+03
11 1992      AUS      GRC      0       0 17.4950008 321708.281      10.369  102022.352   0 14845.6 4.164985e+01
12 1992      AUS      IRL      0       1 17.4950008 321708.281   3.5490999   54272.410   0 16895.0 1.076323e+02
13 1992      AUS      ISL      0       0 17.4950008 321708.281   0.2611000    6976.168   0 16443.6 2.190602e+01
14 1992      AUS      ITA      0       0 17.4950008 321708.281  56.7976494 1265800.125   0 15855.4 9.683720e+02
15 1992      AUS      JPN      0       0 17.4950008 321708.281 124.2289963 3766884.000   0  7827.1 1.026065e+04
16 1992      AUS      NLD      0       0 17.4950008 321708.281  15.1780005  348224.562   0 16227.5 6.510009e+02
17 1992      AUS      NOR      0       0 17.4950008 321708.281   4.2863998  127170.328   0 15646.2 9.357240e+01
18 1992      AUS      NZL      0       1 17.4950008 321708.281   3.5316999   40706.199   1  2736.4 2.267670e+03
19 1992      AUS      PRT      0       0 17.4950008 321708.281   9.9630003  102890.258   0 17625.3 2.611476e+02
20 1992      AUS      SWE      0       0 17.4950008 321708.281   8.6680002  264822.875   0 15385.4 4.653388e+02

There are 3400 observations.

3.1.1. Construct a dummy variable, EMU, that in any given year takes the value 1 if both countries are members of the EMU and 0 otherwise. How big a proportion of the observations are among EMU member countries?

This problem is solved with:

euro<-c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")
countries<-data.frame(country1,country2,stringsAsFactors=FALSE)
data1<-cbind(data,EMU=Reduce(`&`, lapply(countries, function(x) x %in% euro)))
data1[EMU==TRUE,13]
a<-table(EMU)

3.1.2. Are the member and non-member country-pairs alike?

What I need here is the mean of avgflow, but only for the rows where both countries are in the euro vector, i.e. where EMU == TRUE. I have tried with:

avgflowONLY<-cbind(avgflow,EMU)
NEWavgflow<-rep(0,nrow(avgflowONLY))
for (i in 1:nrow(avgflowONLY)){if (EMU==1){NEWavgflow[i]<-mean(avgflow[i])}}

BUT it gives me:

Warning messages:
1: In if (EMU == 1) { ... :
  the condition has length > 1 and only the first element will be used

etc. What am I doing wrong?
Re: [R] Solve an ordinary or generalized eigenvalue problem in R?
Thanks all (particularly to you, Berend) -- I'll push forward with these solutions and integrate them into my code. I did come across geigen while rooting around in the CCA code, but it's not formally documented (it just says "for internal use" or something along those lines), and as you found out above, it does not produce the same solution as dggev. It would be nice to have a more complete set of formal packages for doing linear algebra in R (rather than having to hand-write .Fortran calls), but I'll leave that to someone with more expertise in linear algebra than me. Something that perhaps matches the SciPy set of functions (both in terms of input and output): http://docs.scipy.org/doc/scipy/reference/linalg.html

Some of these are already implemented, but clearly not all of them.

--j

On Sat, Apr 21, 2012 at 1:31 PM, Berend Hasselman b...@xs4all.nl wrote:
> On 21-04-2012, at 20:20, peter dalgaard wrote:
>>> The eigenvalues are identical up to the printed 9 digits, but the eigenvectors appear to be quite different. Maybe this is what Luke meant.
>>> Berend
>> They look quite similar to me:
>> ev <- eigen(solve(B,A))$vectors
>> ge <- geigen(A, B, TRUE, TRUE)
>> ev / ge$vl
>>           [,1]     [,2]       [,3]
>> [1,] 0.9324603 0.813422 -0.7423694
>> [2,] 0.9324603 0.813422 -0.7423694
>> [3,] 0.9324603 0.813422 -0.7423694
>> ev / ge$vr
>>           [,1]     [,2]       [,3]
>> [1,] 0.9324603 0.813422 -0.7423694
>> [2,] 0.9324603 0.813422 -0.7423694
>> [3,] 0.9324603 0.813422 -0.7423694
>> (and of course, eigenvectors of any sort are only defined up to a constant multiplier)
> Correct. I should have checked your way and not optically.
> Berend

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html
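A small sketch of the point that eigenvectors are only defined up to a constant multiplier, which is why two solvers can return column-wise proportional results. It uses made-up matrices A (symmetric) and B (symmetric positive definite) and solves A v = lambda B v in base R by Cholesky reduction to a symmetric standard problem. This is a swapped-in technique for illustration, not the fda geigen() function or a direct LAPACK dggev call, and it assumes A is symmetric.

```r
# Made-up symmetric A and symmetric positive-definite B.
A <- matrix(c(4, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)
B <- diag(c(1, 2, 3))

# With B = R'R (Cholesky), C = R^{-T} A R^{-1} is symmetric and has the same
# eigenvalues; eigenvectors w of C map back to v = R^{-1} w.
R    <- chol(B)
Rinv <- backsolve(R, diag(nrow(B)))
C    <- t(Rinv) %*% A %*% Rinv
es   <- eigen(C, symmetric = TRUE)
lambda <- es$values
V      <- Rinv %*% es$vectors

# Residual of A V = B V diag(lambda) for all eigenpairs (should be ~ 0).
resid <- A %*% V - B %*% V %*% diag(lambda)
max(abs(resid))

# Any nonzero rescaling of an eigenvector is still an eigenvector.
v2 <- -2.5 * V[, 1]
max(abs(A %*% v2 - lambda[1] * B %*% v2))
```

Comparing two solvers' outputs by the ratio of corresponding columns, as in the quoted exchange, is therefore more informative than comparing the numbers directly.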
Re: [R] Solve an ordinary or generalized eigenvalue problem in R?
On 22-04-2012, at 21:08, Jonathan Greenberg wrote:

> I did come across geigen while rooting around in the CCA code but its not formally documented (it just says for internal use or something along those lines) and as you found out above, it does not produce the same solution as the dggev.

Package CCA has package fda as a dependency, and package fda defines a function geigen. The first 14 lines of this function are:

geigen <- function(Amat, Bmat, Cmat)
{
#  solve the generalized eigenanalysis problem
#
#    max {tr L'AM / sqrt[tr L'BL tr M'CM] w.r.t. L and M
#
#  Arguments:
#  AMAT ... p by q matrix
#  BMAT ... order p symmetric positive definite matrix
#  CMAT ... order q symmetric positive definite matrix
#  Returns:
#  VALUES ... vector of length s = min(p,q) of eigenvalues
#  LMAT   ... p by s matrix L
#  MMAT   ... q by s matrix M

It's not clear to me how it is used, exactly what it is doing, and how that compares with LAPACK.

Berend
Re: [R] Assignment problems
Look at ?ifelse, a combination of logical subscripting and mean(), or, even better, ?ave. I can't say much more: there's a no-homework policy on this list, and I recognize that first solution as mine already (I should have noted that the first time). Michael
On Apr 22, 2012, at 2:54 PM, phillip03 phillipbrig...@hotmail.com wrote:
The text below is part of some work I have to do, which is due in 2 days, and I am strung up with a lot of other stuff, so I was hoping someone would take 5 minutes and help me. Here is part of my data.frame:
   year country1 country2 contig comlang       pop1       gdp1        pop2        gdp2 rta    dist      avgflow
 1 1992      AUS      AUT      0       0 17.4950008 321708.281   7.7825189  194684.078   0 15608.4 1.075999e+02
 2 1992      AUS      BEL      0       0 17.4950008 321708.281  10.0450001  231762.094   0 16319.2 4.767162e+02
 3 1992      AUS      CAN      0       1 17.4950008 321708.281  28.5195980  570291.188   0 15391.1 7.456945e+02
 4 1992      AUS      CHE      0       0 17.4950008 321708.281       6.875  249471.422   0 16170.1 4.625214e+02
 5 1992      AUS      DEU      0       0 17.4950008 321708.281  80.6240005 2062141.500   0 15935.1 2.047573e+03
 6 1992      AUS      DNK      0       0 17.4950008 321708.281       5.171  150195.484   0 15725.5 1.453406e+02
 7 1992      AUS      ESP      0       0 17.4950008 321708.281  39.0677490  612585.250   0 17072.9 2.106880e+02
 8 1992      AUS      FIN      0       0 17.4950008 321708.281   5.0419998  109859.438   0 14849.5 2.025125e+02
 9 1992      AUS      FRA      0       0 17.4950008 321708.281  57.2422981 1371706.000   0 16513.0 1.070802e+03
10 1992      AUS      GBR      0       1 17.4950008 321708.281  57.9023476 1071537.375   0 16602.3 2.279130e+03
11 1992      AUS      GRC      0       0 17.4950008 321708.281      10.369  102022.352   0 14845.6 4.164985e+01
12 1992      AUS      IRL      0       1 17.4950008 321708.281   3.5490999   54272.410   0 16895.0 1.076323e+02
13 1992      AUS      ISL      0       0 17.4950008 321708.281   0.2611000    6976.168   0 16443.6 2.190602e+01
14 1992      AUS      ITA      0       0 17.4950008 321708.281  56.7976494 1265800.125   0 15855.4 9.683720e+02
15 1992      AUS      JPN      0       0 17.4950008 321708.281 124.2289963 3766884.000   0  7827.1 1.026065e+04
16 1992      AUS      NLD      0       0 17.4950008 321708.281  15.1780005  348224.562   0 16227.5 6.510009e+02
17 1992      AUS      NOR      0       0 17.4950008 321708.281   4.2863998  127170.328   0 15646.2 9.357240e+01
18 1992      AUS      NZL      0       1 17.4950008 321708.281   3.5316999   40706.199   1  2736.4 2.267670e+03
19 1992      AUS      PRT      0       0 17.4950008 321708.281   9.9630003  102890.258   0 17625.3 2.611476e+02
20 1992      AUS      SWE      0       0 17.4950008 321708.281   8.6680002  264822.875   0 15385.4 4.653388e+02
There are 3400 observations.
3.1.1. Construct a dummy variable, EMU, that in any given year takes the value 1 if both countries are members of the EMU and 0 otherwise. How big a proportion of the observations are among EMU member countries?
This problem is solved with:
euro <- c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")
countries <- data.frame(country1,country2,stringsAsFactors=FALSE)
data1 <- cbind(data,EMU=Reduce("&", lapply(countries, function(x) x %in% euro)))
data1[EMU==TRUE,13]
a <- table(EMU)
3.1.2. Are the member and non-member country pairs alike? What I need here: I want the mean of avgflow, but only for the rows where both countries are in the euro vector, i.e. where EMU is TRUE. I have tried:
avgflowONLY <- cbind(avgflow,EMU)
NEWavgflow <- rep(0,nrow(avgflowONLY))
for (i in 1:nrow(avgflowONLY)){if (EMU==1){NEWavgflow[i] <- mean(avgflow[i])}}
but it gives me:
Warning messages:
1: In if (EMU == 1) { ... :
  the condition has length > 1 and only the first element will be used
etc.
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578672.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
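This is a minimal sketch of the 3.1.1 construction, run on a small hypothetical data frame. The `trade` object and its four rows are made up for illustration. `lapply()` tests each country column against `euro`, and `Reduce("&", ...)` ANDs the two results. Because TRUE counts as 1, `mean()` of the logical dummy gives the proportion of EMU observations directly. Like the poster's code, it treats membership as fixed and ignores the "in any given year" part of the exercise.

```r
# Hypothetical toy data: four country pairs, not the poster's 3400 rows
trade <- data.frame(
  country1 = c("AUT", "BEL", "AUS", "FRA"),
  country2 = c("DEU", "FRA", "NZL", "GBR"),
  avgflow  = c(100, 200, 50, 300),
  stringsAsFactors = FALSE
)
euro <- c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")

# One logical vector per country column; "&" keeps TRUE only if both are members
trade$EMU <- Reduce("&", lapply(trade[c("country1", "country2")],
                                function(x) x %in% euro))

# TRUE counts as 1, so the mean of the dummy is the share of EMU pairs
prop_emu <- mean(trade$EMU)
table(trade$EMU)
prop_emu   # 0.5
```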
Re: [R] Assignment problems
Hello,
phillip03 wrote: [...] I have tried:
avgflowONLY <- cbind(avgflow,EMU)
NEWavgflow <- rep(0,nrow(avgflowONLY))
for (i in 1:nrow(avgflowONLY)){if (EMU==1){NEWavgflow[i] <- mean(avgflow[i])}}
but it gives me:
Warning messages:
1: In if (EMU == 1) { ... :
  the condition has length > 1 and only the first element will be used
You're forgetting the index in the condition: EMU[i] == 1. Note that since EMU is a logical vector, you don't need the explicit comparison. If you just want the mean of avgflow where EMU == TRUE, this is much simpler, but it returns one value, not a vector:
mean(avgflow[ EMU ])
Hope this helps, Rui Barradas
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578739.html Sent from the R help mailing list archive at Nabble.com.
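Rui's two fixes can be checked side by side on a tiny example. The vectors below are hypothetical, not the poster's data. The loop tests `EMU[i]` rather than the whole vector. It also starts from NA instead of 0, so `na.rm = TRUE` drops the non-EMU rows instead of pulling the mean toward zero. The one-liner gives the same number.

```r
# Hypothetical toy vectors: two EMU pairs, two non-EMU pairs
avgflow <- c(100, 200, 50, 300)
EMU     <- c(TRUE, TRUE, FALSE, FALSE)

# Loop version: test the i-th element, not the whole vector
NEWavgflow <- rep(NA_real_, length(avgflow))
for (i in seq_along(avgflow)) {
  if (EMU[i]) NEWavgflow[i] <- avgflow[i]
}
loop_mean <- mean(NEWavgflow, na.rm = TRUE)

# Vectorized version: logical subscripting keeps only the TRUE rows
vec_mean <- mean(avgflow[EMU])

c(loop = loop_mean, vectorized = vec_mean)  # both 150
```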
Re: [R] need advice on using excel to check data for import into R
This looks like a perfect case for an RExcel solution. RExcel is an add-in that allows you, among other things, to place an arbitrary R function inside Excel's automatic recalculation mode. For details see rcom.univie.ac.at. There are many reference items listed on the wiki page in the left panel. For further follow-up, please sign up for the rcom mailing list, again with the details on the web site. Rich
On Sun, Apr 22, 2012 at 2:34 PM, Markus Weisner r...@themarkus.com wrote: I have created an S4 object type for conducting fire department data analysis. The object includes a validity check that ensures certain fields are present and that duplicate records don't exist for certain combinations of columns (e.g. no duplicate incident number / incident date / unit ID ensures that the data does not show the same fire engine responding twice to the same call). I am finding that I spend a lot of time taking client data, converting it to my S4 object, and then sending it back to the client to correct data validity issues. I am trying to figure out a clever way to have Excel (typically the program used by my clients) check client data before they submit it to me. I have been working with somebody on developing an Excel toolbar add-in, with limited success. My question is whether anybody can think of clever alternatives for clients to validate their data. For example, is there an R Excel plugin (easily installed by a client) where I might be able to write some lines of R to check the data and output messages? Or maybe some sort of server where they could upload their data and I could have some lines of R code that would check the data and send back potential error messages? I realize this is a fairly open-ended question; I'm just looking for some general ideas and directions to go. I'm getting a little frustrated with spending most of my work time dealing with data-cleaning issues, and I'm guessing this is a problem shared by many of us who use R! Thanks, Markus
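Markus's server-side idea could be sketched as follows. The column names `incident_number`, `incident_date` and `unit_id` are hypothetical stand-ins for his real schema, and `FireIncidents` is an illustrative class, not his actual S4 type. One check function backs both an S4 validity method and a stand-alone report. The report lists every problem it finds, which is what you would send back to a client.

```r
library(methods)

# Hypothetical key columns; replace with the real schema
key_cols <- c("incident_number", "incident_date", "unit_id")

# Returns a character vector of problems (empty when the data look valid)
check_incident_data <- function(df) {
  missing_cols <- setdiff(key_cols, names(df))
  if (length(missing_cols) > 0) {
    # The duplicate check below needs all key columns, so stop here
    return(paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  }
  problems <- character(0)
  dup <- duplicated(df[key_cols])
  if (any(dup)) {
    problems <- c(problems,
                  paste("duplicate incident/date/unit in row(s):",
                        paste(which(dup), collapse = ", ")))
  }
  problems
}

# Illustrative S4 class whose validity method reuses the same checks
setClass("FireIncidents", slots = c(data = "data.frame"),
         validity = function(object) {
           problems <- check_incident_data(object@data)
           if (length(problems) == 0) TRUE else problems
         })

# Usage on made-up rows: the third row repeats the first
ok  <- data.frame(incident_number = c(101, 102), incident_date = "2012-04-22",
                  unit_id = "E1", stringsAsFactors = FALSE)
bad <- rbind(ok, ok[1, ])
check_incident_data(bad)   # reports the duplicate in row 3
```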
Re: [R] How to take ID of number 7.
On Sun, Apr 22, 2012 at 7:03 AM, Yellow s1010...@student.hsleiden.nl wrote: I figured out something new that I would like to see if I can do more easily with R than with Excel. I have these huge files with data. For example, DataFile.csv:
ID Name log2
1 Fantasy 5.651
2 New 7.60518
3 Finding 8.9532
4 Looeka -0.248652
5 Vani 0.3548
with header 1: ID, header 2: Name, header 3: log2. Now I need to get the $ID values of the rows that have a log2 value of 7 or higher. I know how to grab the $log2 values of 7+:
Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]
But how can I get those ID numbers too?
It seems there were already a few suggestions in this thread, but I'm surprised no one has suggested the use of `subset` yet; see ?subset:
R> interesting <- subset(DataFile, log2 >= 7)$ID
Now play with the `interesting` data.frame to get the data you need. -steve
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
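Both routes can be run on the five rows from the post, typed in by hand here rather than read from DataFile.csv. With the real file, `DataFile <- read.csv("DataFile.csv")` would take the place of the hand-built data frame, assuming a comma-separated file with that header. Logical subscripting pulls the matching IDs directly. subset() keeps whole rows, so its result stays a data frame until you take `$ID` from it.

```r
# The five example rows from the post, entered directly
DataFile <- data.frame(
  ID   = 1:5,
  Name = c("Fantasy", "New", "Finding", "Looeka", "Vani"),
  log2 = c(5.651, 7.60518, 8.9532, -0.248652, 0.3548),
  stringsAsFactors = FALSE
)

# Logical subscripting: IDs of rows with log2 >= 7
ids <- DataFile$ID[DataFile$log2 >= 7]
ids                           # 2 3

# subset() returns the matching rows as a data frame
interesting <- subset(DataFile, log2 >= 7)
interesting$ID                # 2 3
```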
[R] using a loop with an integration
Hi, all. I've written a function that returns the survival function for a Gompertz mortality model, with the two model parameters specified. Using a simple integration, I can calculate the life expectancy at any age. Is there a way I can use a loop with the integration that will quickly return life expectancy over a range of ages, say 0 to 80, so that I don't have to type in each age I'm interested in manually? Please see the code below. Thanks so much. --Trey
hk.bothsex_Gompsurv <- function (t) {
  x=c(0.02342671, 0.05837508)
  a3<-x[1]
  b3<-x[2]
  shift<-15
  S.t<-exp(a3/b3*(1-exp(b3*(t-shift))))
  return(S.t)
}
integrate(hk.bothsex_Gompsurv,0,Inf)$value/hk.bothsex_Gompsurv(0) # life expectancy at birth (change the lower limit of the integral and the corresponding t in the denominator to calculate life expectancy at any age)
- Trey Batey, Anthropology Instructor, Division of Social Sciences, Mt. Hood Community College, Gresham, OR 97030. Alt. Email: trey.batey[at]mhcc[dot]edu
-- View this message in context: http://r.789695.n4.nabble.com/using-a-loop-with-an-integration-tp4578752p4578752.html Sent from the R help mailing list archive at Nabble.com.
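For reference, the quantity Trey's code computes can be written out. Here a = 0.02342671 (a3), b = 0.05837508 (b3) and the shift is 15 years. The survival function, the hazard it implies, and the life expectancy at age x are:

```latex
S(t) = \exp\!\left[\frac{a}{b}\left(1 - e^{\,b\,(t-15)}\right)\right],
\qquad
h(t) = -\frac{d}{dt}\log S(t) = a\,e^{\,b\,(t-15)},
\qquad
e(x) = \frac{1}{S(x)}\int_x^{\infty} S(t)\,dt .
```

Setting x = 0 gives the at-birth line in the post. Covering ages 0 to 80 just means evaluating e(x) once per age.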
Re: [R] Assignment problems
I have tried ifelse:
trade <- data.frame(avgflow,EMU,stringsAsFactors=FALSE)
avgflowEURO <- rep(0,nrow(trade))
trade1 <- (for (i in 1:nrow(trade)){ifelse(EMU[i]==1,avgflowEURO[i] <- avgflow[i],NA)})
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578754.html Sent from the R help mailing list archive at Nabble.com.
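Two things trip up this attempt, shown here on hypothetical toy vectors. First, a for loop is evaluated for its side effects and returns NULL, so assigning the loop to `trade1` always stores NULL. Second, ifelse() is already vectorized, so it can take the whole `EMU` and `avgflow` vectors at once with no loop at all.

```r
# A for loop returns NULL, whatever its body computes
trade1 <- (for (i in 1:3) i * 2)
is.null(trade1)   # TRUE

# Hypothetical toy vectors standing in for the poster's columns
avgflow <- c(100, 200, 50, 300)
EMU     <- c(TRUE, TRUE, FALSE, FALSE)

# Vectorized: avgflow where EMU is TRUE, NA elsewhere
avgflowEURO <- ifelse(EMU, avgflow, NA)
avgflowEURO       # 100 200 NA NA
```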
Re: [R] Assignment problems
Does mean(avgflow[EMU]) sum the avgflows for all country pairs where EMU[i]==TRUE and take the mean? Practical question: is mean(avgflow[EMU]) the same as mean(avgflow[EMU==TRUE])?
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578761.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Assignment problems
phillip03 wrote: Does mean(avgflow[EMU]) sum the avgflows for all country pairs where EMU[i]==TRUE and take the mean? Practical question: is mean(avgflow[EMU]) the same as mean(avgflow[EMU==TRUE])?
Answer: yes. Rui Barradas
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578772.html Sent from the R help mailing list archive at Nabble.com.
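Rui's "yes" holds even when values are missing, as a small hypothetical example shows. Both forms return NA for an NA in `EMU`, so the mean becomes NA unless the index goes through which(), which drops NAs. tapply() then gives the EMU and non-EMU means together, which is the comparison 3.1.2 asks for.

```r
# Hypothetical toy data, including one pair with unknown EMU status
avgflow <- c(100, 200, 50, 300, 80)
EMU     <- c(TRUE, TRUE, FALSE, FALSE, NA)

# EMU and EMU == TRUE select exactly the same elements
same_selection <- identical(avgflow[EMU], avgflow[EMU == TRUE])

avgflow[EMU]                  # 100 200 NA  (an NA index gives an NA element)
mean(avgflow[EMU])            # NA
mean(avgflow[which(EMU)])     # 150  (which() skips the NA)

# Means for both groups at once; the NA group is left out
tapply(avgflow, EMU, mean)    # FALSE: 175, TRUE: 150
```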
Re: [R] need advice on using excel to check data for import into R
If I go to the wiki's "how to install" page, it looks like a rather complicated installation that involves installing R followed by several command-line steps. That seems like too much of an installation process for a client to go through for a one-time data check. It looks like a great tool, though. Is there a simpler way of deploying RExcel that I am not seeing? Thanks, Markus
On Sun, Apr 22, 2012 at 3:43 PM, Richard M. Heiberger r...@temple.edu wrote: This looks like a perfect case for an RExcel solution. RExcel is an add-in that allows you, among other things, to place an arbitrary R function inside Excel's automatic recalculation mode. For details see rcom.univie.ac.at [...]
Re: [R] using a loop with an integration
On Apr 22, 2012, at 3:41 PM, piltdownpunk wrote: [...] Is there a way I can use a loop with the integration that will quickly return life expectancy over a range of ages, say 0 to 80, so that I don't have to manually type in the age in which I'm interested? [...]
?Vectorize (Essentially a wrapper to mapply. No data example, so no tested code.) -- David.
David Winsemius, MD West Hartford, CT
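David's Vectorize() suggestion looks like this when applied to Trey's function; sapply() is shown alongside as an equivalent. One design choice departs from the original one-liner: the division by S(age) moves inside the integrand, so integrate() works on conditional survival, which starts at 1. integrate()'s default absolute tolerance is about 1.2e-4, which can be coarse next to the raw integral at old ages where S(age) is tiny. `life_exp` and `life_exp_v` are hypothetical names.

```r
# Trey's survival function, with the closing parentheses restored
hk.bothsex_Gompsurv <- function(t) {
  x <- c(0.02342671, 0.05837508)
  a3 <- x[1]
  b3 <- x[2]
  shift <- 15
  exp(a3 / b3 * (1 - exp(b3 * (t - shift))))
}

# Life expectancy at one age: integrate S(t) / S(age) from age to Inf
life_exp <- function(age) {
  integrate(function(t) hk.bothsex_Gompsurv(t) / hk.bothsex_Gompsurv(age),
            lower = age, upper = Inf)$value
}

ages <- 0:80

# Vectorize() wraps life_exp in mapply(), so it accepts a vector of ages
life_exp_v <- Vectorize(life_exp)
ex <- life_exp_v(ages)

# Plain sapply() does the same job
ex2 <- sapply(ages, life_exp)

head(data.frame(age = ages, ex = ex))
```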
[R] Issue with message()
Dear List, I built a package under both Mac and Win 7 (both on R 2.12.0). One of the functions in the package is set up to print a status message using the code below:
if (verbose) if ((i %% 10) == 0 && i < ntree) message(" ", i, " out of ", ntree, " trees so far...")
This works perfectly on the Mac. However, on Win 7 the message is not printed while the function is executing; the messages all appear at once when it finishes running. Any hint what the issue might be? Thanks, Axel.
Re: [R] Issue with message()
On 23/04/12 09:36, Axel Urbiz wrote: [...] This works perfectly on the Mac. However, on Win 7 the message is not printed while the function is executing; the messages all appear at once when it finishes running. Any hint what the issue might be?
This has something to do with the Windoze system not (by default) flushing the buffer. This behaviour can be changed, but I forget the details. I don't use Windoze. A bit of searching/googling should lead you fairly quickly to the appropriate procedure for resetting the behaviour. HTH cheers, Rolf Turner
Re: [R] Assignment problems
Thank you, Rui. Can you help me with my ifelse problem? I would like to add a column to my data.frame containing avgflow only in those rows where both countries of the pair are in the euro.
-- View this message in context: http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578806.html Sent from the R help mailing list archive at Nabble.com.
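Both things asked for here can be sketched on a hypothetical four-row `data1`. replace() sets avgflow to NA wherever EMU is FALSE, giving the euro-only column without a loop. ave(), the function Michael pointed to, attaches each row's group mean, so EMU and non-EMU pairs can be compared row by row.

```r
# Hypothetical stand-in for the poster's data1 (already has the EMU dummy)
data1 <- data.frame(
  country1 = c("AUT", "BEL", "AUS", "FRA"),
  country2 = c("DEU", "FRA", "NZL", "GBR"),
  avgflow  = c(100, 200, 50, 300),
  EMU      = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Euro-only flows: keep avgflow where EMU is TRUE, NA elsewhere
data1$avgflowEURO <- replace(data1$avgflow, !data1$EMU, NA)

# Each row gets the mean avgflow of its own EMU group (FUN = mean by default)
data1$groupmean <- ave(data1$avgflow, data1$EMU)

data1
```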
[R] CRAN (and crantastic) updates this week
CRAN (and crantastic) updates this week
New packages
* appell (0.0-3)
  Maintainer: Daniel Sabanes Bove
  Author(s): Daniel Sabanes Bove daniel.sabanesb...@ifspm.uzh.ch with contributions by F. D. Colavecchia, R. C. Forrey, G. Gasaneo, N. L. J. Michel, L. F. Shampine, M. V. Stoitsov and H. A. Watts.
  License: GPL (>= 3)
  http://crantastic.org/packages/appell
  This package wraps Fortran code by F. D. Colavecchia and G. Gasaneo for computing Appell's F1 hypergeometric function. Their program uses Fortran code by L. F. Shampine and H. A. Watts. Moreover, the hypergeometric function with complex arguments is computed with Fortran code by N. L. J. Michel and M. V. Stoitsov or with Fortran code by R. C. Forrey. See the function documentation for the references and please cite them accordingly.
* bayesPop (0.2-2)
  Maintainer: Hana Sevcikova
  Author(s): Hana Sevcikova, Adrian Raftery
  License: GPL (>= 2)
  http://crantastic.org/packages/bayesPop
  The package allows users to generate population projections for all countries of the world using several probabilistic components, such as total fertility rate (TFR) and life expectancy.
* cec2005benchmark (1.0.0)
  Maintainer: Yasser González-Fernández
  Author(s): Yasser González-Fernández ygonzalezfernan...@gmail.com and Marta Soto mr...@icimaf.cu
  License: GPL (>= 3)
  http://crantastic.org/packages/cec2005benchmark
  This package is a wrapper for the C implementation of the 25 benchmark functions for the CEC 2005 Special Session on Real-Parameter Optimization. The original C code by Santosh Tiwari and related documentation are available at http://www.ntu.edu.sg/home/EPNSugan/.
* compound.Cox (1.0)
  Maintainer: Takeshi Emura, Graduate Institute of Statistics, National Central University, Taiwan
  Author(s): Takeshi Emura and Yi-Hau Chen
  License: GPL-2
  http://crantastic.org/packages/compound-Cox
  Calculate regression coefficients and their standard errors under the Cox proportional hazards model with a large number of covariates.
* dgmb (1.0)
  Maintainer: Alba Martinez-Ruiz
  Author(s): Alba Martinez-Ruiz amart...@ucsc.cl and Claudia Martinez-Araneda cmarti...@ucsc.cl
  License: GPL (>= 2)
  http://crantastic.org/packages/dgmb
  Random data generation for PLS structural models.
* diffEq (1.0)
  Maintainer: Karline Soetaert
  Author(s): Karline Soetaert karline.soeta...@nioz.nl
  License: GPL
  http://crantastic.org/packages/diffEq
  Functions and examples from the book "Solving Differential Equations in R" by Karline Soetaert, Jeff R Cash and Francesca Mazzia. Springer, 2012.
* dkDNA (0.1.0)
  Maintainer: Gota Morota
  Author(s): Gota Morota and Masanori Koyama
  License: GPL-2
  http://crantastic.org/packages/dkDNA
  Compute diffusion kernels on DNA polymorphisms, including SNP and bi-allelic genotypes.
* frmqa (0.1-0)
  Maintainer: Thanh T. Tran
  Author(s): Thanh T. Tran
  License: GPL (>= 2)
  http://crantastic.org/packages/frmqa
  R and C++ functions for financial risk management and quantitative analysis, using the generalized hyperbolic and its related distributions.
* geospt (0.4-9)
  Maintainer: Alí Santacruz
  Author(s): Carlos Melo cm...@udistrital.edu.co, Alí Santacruz, Oscar Melo oome...@unal.edu.co and others
  License: GPL (>= 2)
  http://crantastic.org/packages/geospt
  This package contains functions for: estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.
* JohnsonDistribution (0.24)
  Maintainer: A.I. McLeod
  Author(s): A.I. McLeod and Leanna King
  License: GPL (>= 2)
  http://crantastic.org/packages/JohnsonDistribution
  Johnson curve distributions. Implementation of AS100 and AS99.
* labeledLoop (0.1)
  Maintainer: Kohske Takahashi
  Author(s): Kohske Takahashi
  License: MIT
  http://crantastic.org/packages/labeledLoop
  Supports labeled loops and escaping from nested loops.
* logitnorm (0.8.26)
  Maintainer: Thomas Wutzler
  Author(s): Thomas Wutzler
  License: GPL-2
  http://crantastic.org/packages/logitnorm
  Density, distribution, quantile and random generation functions for the logitnormal distribution. Estimation of the mode and the first two moments. Estimation of distribution parameters.
* LOST (1.0)
  Maintainer: Jessica Arbour
  Author(s): J. Arbour and C. Brown
  License: GPL (>= 2)
  http://crantastic.org/packages/LOST
  LOST includes functions for simulating missing morphometric data randomly, with taxonomic bias and with anatomical bias. This package also includes functions for estimating missing morphometric data based on regression.
* NHPoisson (1.0)
  Maintainer: Ana C. Cebrian
  Author(s): Ana C. Cebrian
  License: GPL (>= 2)
Re: [R] Issue with message()
On 12-04-22 5:36 PM, Axel Urbiz wrote: [...] This works perfectly on the Mac. However, on Win 7 the message is not printed while the function is executing; the messages all appear at once when it finishes running. Any hint what the issue might be?
Buffered output. Use Ctrl-W or the menu item Misc|Buffered output to change it. Duncan Murdoch
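Duncan's menu toggle changes the Rgui session. Inside package code, an explicit flush does the same job for each message. This sketch uses a hypothetical `grow_trees()` in place of Axel's real function. flush.console() is in the utils package. On the Windows and macOS GUIs it makes pending console output appear, and elsewhere it does nothing, so it is harmless to leave in. The pasted pieces also get their own spaces, because message() concatenates its arguments without a separator.

```r
# Hypothetical stand-in for the package's tree-growing loop
grow_trees <- function(ntree, verbose = TRUE) {
  for (i in seq_len(ntree)) {
    # ... grow tree i here ...
    if (verbose && (i %% 10) == 0 && i < ntree) {
      message(" ", i, " out of ", ntree, " trees so far...")
      utils::flush.console()  # show the message now, not at the end (Rgui)
    }
  }
  invisible(ntree)
}

grow_trees(30)   # messages at i = 10 and i = 20
```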
Re: [R] PCA sensitive to outliers?
I actually tried robustPca in pcaMethods on Bioconductor. It keeps giving me the warning "Input data is not complete"... Reading the function's code: when there are no NAs, it still gives this warning. It seems that there is a bug in this code... Is it reliable at all?
robustPca
function (Matrix, nPcs = 2, verbose = interactive(), ...)
{
    nas <- is.na(Matrix)
    if (!any(nas) && verbose) {
        cat("Input data is not complete.\n")
        cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
    }
On Fri, Apr 20, 2012 at 9:58 AM, Kevin Wright kw.s...@gmail.com wrote: You can also have a look at the pcaMethods package on Bioconductor. Kevin
On Thu, Apr 19, 2012 at 11:20 PM, Michael comtech@gmail.com wrote: Hi all, I found that PCA gave chaotic results when there are big changes in a few data points. Are there improved versions of PCA in R that can help with this problem? Please give me some pointers... Thank you!
-- Kevin Wright
Re: [R] PCA sensitive to outliers?
Any thoughts on this error in robustSvd? Thanks a lot!
Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed
Enter a frame number, or 0 to exit
1: #73: pca(dTmp, method = "robustPca", nPcs = nNumFactors, center = FALSE)
2: robustPca(prepres$data, nPcs = nPcs, ...)
3: robustSvd(Matrix)
4: apply(x, 1, L1RegCoef, bk)
5: FUN(newX[, i], ...)
6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
On Sun, Apr 22, 2012 at 6:43 PM, Michael comtech@gmail.com wrote: I actually tried robustPca in pcaMethods on Bioconductor. It keeps giving me the warning "Input data is not complete"... [...]
Re: [R] PCA sensitive to outliers?
On Sun, Apr 22, 2012 at 4:43 PM, Michael comtech@gmail.com wrote: I actually tried robustPca in pcaMethods on Bioconductor. It keeps giving me the warning "Input data is not complete"... Reading the function's code: when there are no NAs, it still gives this warning. [...]
    nas <- is.na(Matrix)
    if (!any(nas) && verbose) {
        cat("Input data is not complete.\n")
        cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
    }
That seems to issue the notes when there are *not any missing* values and verbose is TRUE. I would submit a bug report to the author.
-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
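Joshua's reading can be confirmed with a stand-alone version of the check. `warn_if_incomplete` is a hypothetical name, not a pcaMethods function. Dropping the stray `!` makes the note appear only when is.na() actually finds something, which matches what the message says.

```r
# Hypothetical helper mirroring the intended check in robustPca():
# print the note only when the matrix really has missing values
warn_if_incomplete <- function(Matrix, verbose = interactive()) {
  nas <- is.na(Matrix)
  if (any(nas) && verbose) {
    cat("Input data is not complete.\n")
    cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
  }
  invisible(any(nas))
}

complete   <- matrix(1:4, nrow = 2)
incomplete <- matrix(c(1, NA, 3, 4), nrow = 2)
warn_if_incomplete(complete, verbose = TRUE)    # prints nothing
warn_if_incomplete(incomplete, verbose = TRUE)  # prints the two lines
```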
Re: [R] PCA sensitive to outliers?
Even in R, there are so many robust PCA methods... Is there any survey or review of all these different methods?
On Sun, Apr 22, 2012 at 6:58 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
On Sun, Apr 22, 2012 at 4:43 PM, Michael comtech@gmail.com wrote: I actually tried robustPca in pcaMethods on Bioconductor. It keeps giving me the warning "Input data is not complete..." Reading into the function, it gives this warning when there are no NAs... It seems that there is a bug in this code... Is it reliable at all?
> robustPca
function (Matrix, nPcs = 2, verbose = interactive(), ...)
{
    nas <- is.na(Matrix)
    if (!any(nas) & verbose) {
        cat("Input data is not complete.\n")
        cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
    }
That seems to issue the notes when there are *not any missing* values and verbose is TRUE. I would submit a bug report to the author.
On Fri, Apr 20, 2012 at 9:58 AM, Kevin Wright kw.s...@gmail.com wrote: You can also have a look at the pcaMethods package on Bioconductor. Kevin
On Thu, Apr 19, 2012 at 11:20 PM, Michael comtech@gmail.com wrote: Hi all, I found that PCA gave chaotic results when there were big changes in a few data points. Are there improved versions of PCA in R that can help with this problem? Please give me some pointers... Thank you!
-- Kevin Wright
-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
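If Joshua's reading is right, the negation is the bug: the note should fire only when NAs *are* present. A minimal sketch of the check as it was presumably intended; `warn_if_incomplete` is a hypothetical stand-alone helper, not a pcaMethods function, and it does not patch the package itself:

```r
# Hypothetical helper (not part of pcaMethods) reproducing the robustPca
# completeness check with the condition the thread suggests was intended:
# print the notes only when the input actually contains missing values.
warn_if_incomplete <- function(Matrix, verbose = interactive()) {
  nas <- is.na(Matrix)
  incomplete <- any(nas)            # the printed code tested !any(nas)
  if (incomplete && verbose) {      # && is the idiomatic scalar "and"
    cat("Input data is not complete.\n")
    cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
  }
  invisible(incomplete)
}

m <- matrix(rnorm(6), 3, 2)
warn_if_incomplete(m, verbose = TRUE)   # complete data: prints nothing
m[2, 1] <- NA
warn_if_incomplete(m, verbose = TRUE)   # one NA: prints the two notes
```

The helper returns the completeness flag invisibly, so it can also be used in an `if ()` without printing anything extra.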
Re: [R] PCA sensitive to outliers?
As I believe I already told you, look at the CRAN Robust task view. -- Bert
-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] linear model benchmarking
I cleaned up my old benchmarking code and added checks for missing data to compare various ways of finding OLS regression coefficients. I thought I would share this with others. The long and short of it is that I would recommend
ols.crossprod = function (y, x) {
  x <- as.matrix(x)
  # keep only rows where y and every column of x are observed
  ok <- (!is.na(y)) & (!is.na(rowSums(x)))
  y <- y[ok]; x <- subset(x, ok)
  x <- cbind(1, x)           # add the intercept column
  XtX <- crossprod(x)        # t(x) %*% x
  Xty <- crossprod(x, y)     # t(x) %*% y
  solve(XtX, Xty)            # solve the normal equations
}
for fast and stable coefficients. (Yes, stable using double precision, even though not as stable as lm(). It works just fine with X variables that have 99.99% correlation with as few as 100 observations. If your situation is worse than this, you probably have an error in your data, or you are looking for the Higgs boson.) I added the code below. Feel free to ignore. /iaw
###
### code to test how fast OLS coefficients can be obtained with alternative methods,
### including tests to exclude missing observations where necessary.
###
### for a more informed article (and person), see Bates, 'Least Squares
### Calculations in R', Rnews 2004-1. the code here does not test his sparse
### matrix examples, or his geMatrix/poMatrix examples.
###
### Basic Results: for the examples that I tried, typical relative time
### factors of the algorithms were about
###
###   lm    lmfit   solve   crossprod   cholesky   (special-case 2vars)
###   1.0   0.5     0.3     0.15        0.17       0.1
###
### there was no advantage to cholesky, so you may as well use the simpler
### crossprod.
###
### I was also interested in how the algorithms scale with N and K. yes, there
### were some changes in the factors across algorithms, but the general pattern
### wasn't too different. for the cholesky decomposition,
###
###        N=1000 N=1 N=10 N=20
### K=1    1.0 780 160
### K=10   2.5 26
### K=50   16
### K=1004370
### K=200  140
###
### some of this may well be swap/memory access, etc. roughly speaking, we
### scale ten-times N takes twice as long. 10 and ten-times K takes 25 times
### as long.
###
### of course, ols.crossprod and ols.cholesky are not as stable as ols.lm,
### but they are still amazingly stable, given the default double precision.
### even with a correlation of 0.99(!) between the two final columns,
### they still produce exactly the same result as ols.lm with 1000
### observations. frankly, the ill-conditioning worry is overblown with
### most real-world data. if you really have data THIS bad, you should
### already know it; and you probably just have some measurement errors in
### your observations, and your regression is giving you garbage either way.
###
### if I made the R core decision, I would switch away from lm()'s default
### method, and make it a special option. my guess is that it is status-quo
### bias that keeps the current method. or, at least I would say loudly in
### the R docs that for common use, here is a much faster method...
###

MC <- 100     # Monte Carlo replications
N <- 1000     # observations per replication
K <- 10       # regressors (excluding the intercept)
SD <- 1e-3    # sd of the noise that makes the last regressor nearly collinear with the previous one

ols <- list(
  ols.lm = function (y, x) { coef(lm(y ~ x)) },
  ols.lmfit = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- as.matrix(cbind(1, x))
    lm.fit(x, y)$coefficients
  },
  ols.solve = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)
    xy <- t(x) %*% y
    xxi <- solve(t(x) %*% x)     # explicit inverse of X'X
    b <- as.vector(xxi %*% xy)
    b
  },
  ols.crossprod = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)
    XtX <- crossprod(x)
    Xty <- crossprod(x, y)
    solve(XtX, Xty)
  },
  ols.cholesky = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)
    ch <- chol(crossprod(x))     # X'X = R'R with R upper triangular
    # solve R'z = X'y by forward substitution, then R b = z by back substitution
    backsolve(ch, forwardsolve(ch, crossprod(x, y), upper.tri = TRUE, transpose = TRUE))
  }
)

set.seed(0)
y <- matrix(rnorm(N*MC), N, MC)
x <- array(rnorm(MC*K*N), c(N, K, MC))

cat("N=", N, "K=", K, "(MC=", MC, ")")
if (K > 1) {
  sum.cor <- 0
  for (mc in 1:MC) {
    x[,K,mc] <- x[,K-1,mc] + rnorm(N, sd=SD)
    sum.cor <- sum.cor + cor(x[,K,mc], x[,K-1,mc], use="pair")
  }
  options(digits=10)
  cat(" sd=", SD, " The bad corr= 1+", sum.cor/MC-1)
} else {
  ols$ols.xy.Kis1 <- function(y, x) {
[R] Scrape data from Scopus: login through R?
Hello, The Scopus bibliographic database allows one to manually download batches of 2000 publications. The data are rich but do not include a field containing the author ID. However, author IDs can be retrieved through the hyperlinks on the Scopus website. I have two questions: 1. My institution has a Scopus license, so I need to log in. How do I do that in R (through RCurl, XML?)? 2. How do I scrape hyperlinks? Your help is appreciated. Thanks Math
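For question 2, once a page's HTML is in hand (logging in, question 1, is not covered here), one rough way to pull out hyperlinks is a base-R regular expression. A minimal sketch: the HTML snippet, the example.org URLs and the `authorId=` query parameter are hypothetical stand-ins, not verified against Scopus, and a real HTML parser (for example the XML package's `htmlParse()` with an XPath query) is more robust than regular expressions:

```r
# Hypothetical HTML fragment standing in for a downloaded results page;
# the "authorId=" link format is an assumption, not Scopus's documented scheme.
html <- paste0(
  '<a href="https://www.example.org/author?authorId=123">Author A</a> ',
  '<a href="https://www.example.org/record?eid=456">Some paper</a>'
)

# Return every href attribute value found in an HTML string.
extract_hrefs <- function(html) {
  m <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  sub('^href="([^"]*)"$', "\\1", m)
}

links <- extract_hrefs(html)
# Keep only links that carry an author id, then pull out the numeric id.
author_links <- grep("authorId=", links, value = TRUE)
author_ids <- sub(".*[?&]authorId=([0-9]+).*", "\\1", author_links)
print(author_ids)   # [1] "123"
```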
Re: [R] ROCR for combination of markers
Hi Eik, or others who might help: I got this error when I ran out = ROC(form = y1 ~ x + z, plot = "ROC") from your code: Error in roc.formula(form = y1 ~ x + z, plot = "ROC") : Invalid formula: exactly 1 predictor is required in a formula of type response~predictor. How do I fix it? Thanks.
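The error says the function being called accepts only one predictor on the right-hand side. One common way to evaluate a combination of markers is to collapse them into a single score first, for example with logistic regression, and then do the ROC analysis on that score. A minimal sketch with simulated data; the data, coefficients and the helper `auc_rank` are hypothetical, and the AUC is computed with the Mann-Whitney rank formula so no ROC package is needed:

```r
# Hypothetical simulated data: binary outcome y1 and two markers x and z.
set.seed(42)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y1 <- rbinom(n, 1, plogis(0.8 * x + 0.8 * z))

# Step 1: combine the two markers into one score via logistic regression.
fit <- glm(y1 ~ x + z, family = binomial)
score <- fitted(fit)

# Step 2: AUC of the combined score from ranks (Mann-Whitney U / (n1 * n0)),
# i.e. the proportion of (case, control) pairs ordered correctly.
auc_rank <- function(score, y) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(score, y1)
```

The single column `score` can then be passed as the one predictor to whichever ROC function is in use. Note that an AUC computed on the same data used to fit the combination is optimistic.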
Re: [R] more boa plots questions
boa.plot('trace')
Re: [R] PCA sensitive to outliers?
yes, but that is not a good Review or Survey... thx
Re: [R] PCA sensitive to outliers?
On Mon, Apr 23, 2012 at 12:01 AM, Michael comtech@gmail.com wrote: yes, but that is not a good Review or Survey... thx
But the packages listed there do have their own documentation and vignettes. For instance, the rrcov package seems to have a nice vignette about its design as well as the methods it implements, and references to these methods for further reading: http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf You'll see at least a few mentions of PCA, which will lead you to other packages/papers/etc. Enjoy, -steve
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
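The rrcov vignette and the CRAN Robust task view cover dedicated robust PCA methods. To illustrate only the underlying idea, one simple "plug-in" approach replaces the classical covariance matrix with a robust estimate (here the minimum covariance determinant from `MASS::cov.rob`) and takes its eigenvectors. This is a sketch of the general idea, not the code of any of those packages; `robust_pca` and the simulated data are hypothetical:

```r
library(MASS)  # recommended package shipped with R; provides cov.rob()

# Plug-in robust PCA: eigen-decompose a robust (MCD) covariance estimate
# instead of the classical sample covariance that prcomp() uses.
robust_pca <- function(X, method = "mcd") {
  rc <- cov.rob(X, method = method)
  e <- eigen(rc$cov, symmetric = TRUE)
  list(center = rc$center, sdev = sqrt(e$values), loadings = e$vectors)
}

# Simulated data: the main spread is along x1; five gross outliers sit far out along x2.
set.seed(1)
n <- 100
X <- cbind(x1 = rnorm(n, sd = 3), x2 = rnorm(n, sd = 1))
X[1:5, "x1"] <- rnorm(5)
X[1:5, "x2"] <- 50 + rnorm(5)

classical <- prcomp(X)
robust <- robust_pca(X)

# First loading vector: the classical one is pulled toward x2 by the outliers,
# the robust one should stay close to x1.
round(abs(classical$rotation[, 1]), 3)
round(abs(robust$loadings[, 1]), 3)
```

Because the MCD estimate is fitted to roughly the most concentrated half of the data, a handful of extreme rows barely moves it, which is exactly the sensitivity the original question was about.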
Re: [R] slanted stacked bar graphs?
Hi Barry, Thanks so much for the Junk Charts link. Maybe it'll help me make my case for why we shouldn't present our data like this. Susanna
On Mon, Apr 9, 2012 at 1:07 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
On Mon, Apr 9, 2012 at 7:29 AM, Susanna Makela susanna.m.mak...@gmail.com wrote: Hello R users, I would like to generate slanted stacked bar graphs like those at the bottom of pages 1 and 2 in this document: http://www.wssinfo.org/fileadmin/user_upload/resources/JMP-Snapshot-SWA-HLM.pdf. I've also attached the file (PDF) to this email. Does anyone know if this is possible in R? I have tried googling and searching the R help archives, and it seems like ggplot2 might be able to make such graphs, but I'm not familiar enough with graphics in R to know for sure. (I personally don't feel that these slanted bar graphs, which I'm not sure have an actual name, convey the intended information very well, but I have to try and make them all the same. However, I am open to alternative suggestions for visualizing similar data if anyone has ideas.)
These exact charts have been critiqued on the Junk Charts blog: http://junkcharts.typepad.com/junk_charts/2010/02/cousin-misfit.html and you'll even find some ggplot code in the comments for doing them. If you still want to... I just did a Google image search for 'ggplot stacked' and there they were. Barry
-- blog: http://geospaced.blogspot.com/ web: http://www.maths.lancs.ac.uk/~rowlings web: http://www.rowlingson.com/ twitter: http://twitter.com/geospacedman pics: http://www.flickr.com/photos/spacedman
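For the "alternative suggestions" part, a plain (unslanted) stacked bar chart takes only a few lines in base graphics. A minimal sketch; the category names, group names and percentages are made-up placeholders, not the data from the linked document:

```r
# Made-up percentages: rows are categories, columns are groups (one bar each).
shares <- matrix(c(40, 30, 20, 10,
                   25, 35, 25, 15,
                   60, 20, 15,  5),
                 nrow = 4,
                 dimnames = list(c("Category A", "Category B",
                                   "Category C", "Category D"),
                                 c("Group 1", "Group 2", "Group 3")))
stopifnot(all(colSums(shares) == 100))  # each bar should total 100%

pdf(NULL)  # null device so the sketch also runs non-interactively; omit to draw on screen
# Each column becomes one horizontal bar, stacked by the rows.
mids <- barplot(shares, horiz = TRUE, las = 1,
                col = gray.colors(nrow(shares)),
                legend.text = rownames(shares),
                xlab = "Percent")
invisible(dev.off())
```

`barplot()` returns the bar midpoints, which is handy for adding labels with `text()` afterwards.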