[R] help.search not working - returns readRDS(file) : unknown input format
Hi all,

I'm using R 2.15.1 under RStudio on a WinXP computer. This morning, as I started up RStudio, I noticed that something was wrong with the help database. I entered the following and received an error:

    ??'xls'
    #Error in readRDS(file) : unknown input format

Writing help.search('xls') returned the same error. A plain help(plot) worked fine.

Some googling led me to close RStudio and remove the .RData and .Rhistory files I was using, but the same error remained. Another search suggested that I use the rebuild=T option, so I ran

    lapply(rownames(installed.packages()),function(x)help.search('',package=X<-x,rebuild=T))

Now there is no error, but help.search cannot find anything; it seems as if the database is empty.

Does anyone have experience with a similar error?

Best regards,
Gustaf

--
Gustaf Rydevik, M.Sci.
tel: +44(0)74 253 760 42
address: St John's hill 18/5 EH8 9UQ Edinburgh, UK
skype: gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Remove space from string
    gsub(" ","",a)

/Gustaf

On Fri, Jan 13, 2012 at 12:24 PM, Vikram Bahure economics.vik...@gmail.com wrote:
> Dear R users,
>
> I have a trivial query. I have a string, and I want to remove the spaces from it. For example:
>
> Input: a <- "Remove space"
> Output required: "Removespace"
>
> I tried using str_trim from library(stringr), but it only removes the end spaces.
>
> Regards
> Vikram
Re: [R] Fwd: WHO Anthro growth curve macros and R
On Tue, Oct 11, 2011 at 1:21 AM, David Winsemius dwinsem...@comcast.net wrote:
> On Oct 10, 2011, at 4:48 PM, Gustaf Rydevik wrote:
>> Hi all,
>>
>> some years ago, I sent a question to the mailing list regarding the WHO anthro macros. Since I've now received three mails asking how I solved it, I thought I'd cc R-help in for future reference. I'm attaching a zip file with the relevant code parts that I used, though I'm not sure it gets through (if anyone has recommendations on how to manage such files for the list, I'd be grateful).
>>
>> What I ended up doing was importing the data in SPSS format, and adapting the Splus function igrowup.standard slightly. igrowup.standard2.R is the adapted function, while the ssc files are original Splus functions. Let me know if anyone has problems figuring out how to use the files.
>
> The only files that reach the readership are .pdf and .txt files. I do not know how carefully these get inspected, so it is possible that a zip file named something.txt might make it through.
>
>> best regards,
>> Gustaf
>
> David Winsemius, MD
> West Hartford, CT

Hi all again,

I noticed (and had suspected) that, as David said, zip files do not get through. Here's a Google Docs link for the "Anthro example.zip" file that won't change in the foreseeable future:

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B77NeAmIHMaQMjJkZTQ0OTQtNTRkYy00ZWMzLThhNTUtMzg1ZDY5MjljOGQx&hl=en_US

(if the link is problematic due to its length, try http://tinyurl.com/625vod6 instead)

The most interesting files are igrowup.standard2.R (which is a modified version of igrowup.standard) and anthro-example.R.

I hope this comes in useful for someone in the future!

Regards,
Gustaf
[R] Fwd: WHO Anthro growth curve macros and R
Hi all,

some years ago, I sent a question to the mailing list regarding the WHO anthro macros. Since I've now received three mails asking how I solved it, I thought I'd cc R-help in for future reference. I'm attaching a zip file with the relevant code parts that I used, though I'm not sure it gets through (if anyone has recommendations on how to manage such files for the list, I'd be grateful).

What I ended up doing was importing the data in SPSS format, and adapting the Splus function igrowup.standard slightly. igrowup.standard2.R is the adapted function, while the ssc files are original Splus functions. Let me know if anyone has problems figuring out how to use the files.

best regards,
Gustaf
Re: [R] [Rd] New errors with difftime()-objects in 2.11.1 (was Re: Request: difftime method for cut())
On Wed, Jun 23, 2010 at 7:13 AM, Peter Dalgaard pda...@gmail.com wrote:
> Gustaf Rydevik wrote:
>> Oh, I forgot to mention that the workaround of using as.double (or as.numeric) works fine, and I've done that. It's just that it can take quite a while (as in several hours) to figure out that the reason for the error is that you have to force difftime objects to be numeric in 2.11.1, when the code's been working fine before and the error messages are obscure.
>
> I don't think you realize the problems that could occur by assuming that difftime objects are numerics ON ANY PARTICULAR SCALE!
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Phone: (+45)38153501
> Email: pd@cbs.dk Priv: pda...@gmail.com

Ah. Yes, you're right that it would be problematic, to say the least, to assume that a difftime object is measured in days and not in, say, seconds. And I suppose that it makes sense to prioritize avoiding calculations that give misleading results over avoiding forced changes in old code.

I was just caught somewhat unprepared, and I know that my colleagues who are not quite as R-literate will be even more unprepared for old stuff no longer working. Usually, R prepares the user for these kinds of things by throwing warnings a version or two before the change is actually implemented. But I guess that's not always practical.

I take it that your argument would also work against implementing simple difftime methods for functions, where you force difftime objects to be numeric? In that case, people can disregard my suggestion of adding a difftime method to cut().

Anyhow, I'll stop whining now. Thanks for the good work you're doing in the R Core team.

Regards,
Gustaf

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE
skype: gustaf_rydevik
[R] New errors with difftime()-objects in 2.11.1 (was Re: Request: difftime method for cut())
On Thu, Jun 10, 2010 at 3:39 PM, Gustaf Rydevik gustaf.ryde...@gmail.com wrote:
> Hi all,
>
> The recent change in 2.11 that made is.numeric() return FALSE on difftime objects broke some of my code that calculated age classes of individuals using cut(). While this was no big thing to fix for me, it might be wise to provide a cut.difftime method to stop other old code from breaking. I'm guessing something as simple as
>
>     cut.difftime<-function(x,...){
>       x<-as.numeric(x)
>       cut(x,...)
>     }
>
> would suffice.
>
> best regards,
> Gustaf

As a follow-up, the change in how difftime objects are treated breaks even more of my old code in a different project, since I'm used to treating difftime as numeric in regressions and other analyses. And the error messages become *very* obscure, e.g.

    Error: NA/NaN/Inf in foreign function call (arg 2)

when applying loess to a difftime object. Tracking down the source of those errors becomes quite a nuisance.

I suppose there's no chance of reversing the change, but I'd appreciate it if someone could tell me the reason for introducing it so abruptly.

I'm cc'ing this to R-help, since there are probably more people than me who will be bitten by this in the future when looking into old projects.

Regards,
Gustaf Rydevik.
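A slightly more defensive variant of the cut.difftime idea above could fix the unit explicitly, so that the breaks are always read on a known scale. This is only a sketch under that assumption: cut_difftime and its units argument are hypothetical names, not part of base R.

```r
# Hypothetical helper (not part of base R): convert a difftime to plain
# numbers in an explicitly chosen unit before handing it to cut().
cut_difftime <- function(x, breaks, units = "days", ...) {
  stopifnot(inherits(x, "difftime"))
  cut(as.numeric(x, units = units), breaks = breaks, ...)
}

born <- as.Date(c("2009-12-01", "2010-05-20", "2008-01-15"))
age  <- as.Date("2010-06-10") - born          # a difftime measured in days
age_class <- cut_difftime(age, breaks = c(0, 30, 365, Inf))
age_class
```

Naming the unit in the call keeps the breaks meaningful even if the difftime was created in hours or seconds.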
Re: [R] New errors with difftime()-objects in 2.11.1 (was Re: Request: difftime method for cut())
On Tue, Jun 22, 2010 at 7:50 PM, David Winsemius dwinsem...@comcast.net wrote:
> On Jun 22, 2010, at 1:33 PM, Gustaf Rydevik wrote:
>
> Cannot help you there, but have you looked at the help page for difftime? The as.double method returns the numeric value expressed in the specified units. Using units = "auto" means the units of the object.
>
> David Winsemius, MD
> West Hartford, CT

Oh, I forgot to mention that the workaround of using as.double (or as.numeric) works fine, and I've done that. It's just that it can take quite a while (as in several hours) to figure out that the reason for the error is that you have to force difftime objects to be numeric in 2.11.1, when the code's been working fine before and the error messages are obscure.

Regards,
Gustaf
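To illustrate the units argument mentioned above, here is a small sketch showing how the same difftime object converts to different numbers depending on the unit requested. The two timestamps are made up for the example.

```r
# Two made-up timestamps, twelve hours apart
d <- as.POSIXct("2010-06-22 12:00:00", tz = "UTC") -
     as.POSIXct("2010-06-22 00:00:00", tz = "UTC")

units(d)                        # unit picked automatically by R
as.numeric(d, units = "hours")  # value in hours
as.numeric(d, units = "days")   # same interval, in days
as.numeric(d, units = "secs")   # same interval, in seconds
```

Passing units explicitly avoids relying on whatever unit R happened to choose when the object was created.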
Re: [R] remove last char of a text string
On Mon, Jun 14, 2010 at 3:47 PM, glaporta glapo...@freeweb.org wrote:
> Dear R experts,
>
> is there a simple way to remove the last char of a text string? The substr() function takes only start and end as parameters, but my strings are of different lengths:
>
> 01asap05a -> 01asap05
> 02ee04b -> 02ee04
>
> Thank you all,
> Gianandrea
>
> --
> View this message in context: http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html
> Sent from the R help mailing list archive at Nabble.com.

It's not terribly elegant, but this works:

    orig.text<-c("01asap05a","02ee04b")
    substr(orig.text,1,nchar(orig.text)-1)

Regards,
Gustaf
Re: [R] moving average on irregular time series
Dear William and Gabor,

Both solutions worked, and my problem is now solved. Many thanks to both of you!

regards,
Gustaf

On Thu, Jun 3, 2010 at 10:23 AM, Gustaf Rydevik gustaf.ryde...@gmail.com wrote:
> Hi all,
>
> I wonder if there is any way to calculate a moving average on an irregular time series, or use the rollapply function in zoo? I have a set of dates where I want to check if there has been an event 14 days prior to each time point in order to mark these time points for removal, and can't figure out a good way to do it.
>
> Many thanks in advance!
> Gustaf
>
> Example data:
>
>     exData<-structure(list(Datebegin = structure(c(14476, 14569, 14576,
>     14621, 14627, 14632, 14661, 14671, 14705, 14715, 14751, 14756,
>     14495, 14518, 14523, 14526, 14528, 14529, 14545, 14548), class = "Date"),
>     Event = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
>     FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
>     FALSE, FALSE)), .Names = c("Datebegin", "Event"),
>     row.names = c(NA, 20L), class = "data.frame")
>     ###In this example, row 18 is a date less than 14 days after an event and should be marked for removal.
Re: [R] ISO 8601 Weeks/Years on Windows with strptime
2010/6/3 Michael Höhle michael.hoe...@stat.uni-muenchen.de:
> Dear R-help,
>
> I am working on an R package for public health surveillance where the ISO 8601 representation of dates is of importance. In particular, the ISO week and ISO year of a date need to be extracted. I was quite happy to find all of this implemented in the Date class with appropriate calls to strptime/format (using e.g. %G and %V). However, only later did I realize that this functionality is currently not implemented on Windows (I'm a happy Mac/Linux user).
>
> As this seriously limits the applicability, I would like to enquire whether there are any plans to make this functionality available on Windows as well? Or are there any good workarounds to make format.Date("2001-12-31", "%G") give "2002" instead of "" on Windows?
>
> Best regards,
> Michael Höhle

Hello,

This seems to be a problem that crops up from time to time. I wrote a small function that gets the ISO week of a Date object; you can find a bug-fixed version here:
http://tolstoy.newcastle.edu.au/R/e10/help/10/05/5588.html

I hope this is of help. I agree that it would be of interest to incorporate OS-independent date management in R, but not being part of the R development team, I'm not sure how to go about implementing it...

Best regards,
Gustaf
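As a self-contained illustration of the kind of workaround asked for above, the sketch below computes the ISO year and week without relying on %G or %V. It is not the function from the linked post: iso_week is a hypothetical name, and the approach uses the ISO 8601 rule that a week belongs to the year containing its Thursday.

```r
# Hypothetical helper: ISO 8601 year and week via the "Thursday rule".
# Uses only Date arithmetic and POSIXlt fields, not %G / %V.
iso_week <- function(d) {
  d <- as.Date(d)
  wday <- as.POSIXlt(d)$wday               # 0 = Sunday ... 6 = Saturday
  wday <- ifelse(wday == 0, 7, wday)       # ISO numbering: Monday = 1 ... Sunday = 7
  thursday <- as.POSIXlt(d + (4 - wday))   # Thursday of the same ISO week
  list(Year = thursday$year + 1900,        # ISO year is the Thursday's calendar year
       ISOWeek = thursday$yday %/% 7 + 1)  # yday is 0-based
}

iso_week("2001-12-31")   # a Monday that belongs to ISO year 2002
iso_week(c("2010-01-01", "2010-06-03"))
```

The function is vectorised over its input, so it can be applied to a whole column of dates at once.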
[R] moving average on irregular time series
Hi all,

I wonder if there is any way to calculate a moving average on an irregular time series, or use the rollapply function in zoo? I have a set of dates where I want to check if there has been an event 14 days prior to each time point in order to mark these time points for removal, and can't figure out a good way to do it.

Many thanks in advance!
Gustaf

Example data:

    exData<-structure(list(Datebegin = structure(c(14476, 14569, 14576,
    14621, 14627, 14632, 14661, 14671, 14705, 14715, 14751, 14756,
    14495, 14518, 14523, 14526, 14528, 14529, 14545, 14548), class = "Date"),
    Event = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
    FALSE, FALSE)), .Names = c("Datebegin", "Event"),
    row.names = c(NA, 20L), class = "data.frame")
    ###In this example, row 18 is a date less than 14 days after an event and should be marked for removal.
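The solutions that were eventually posted to the thread are not shown here, so the following is only one possible base-R sketch of the check described above. It assumes "14 days prior" means an event strictly before the date and at most 14 days earlier; the name flag is made up for the example.

```r
exData <- data.frame(
  Datebegin = as.Date(c(14476, 14569, 14576, 14621, 14627, 14632, 14661,
                        14671, 14705, 14715, 14751, 14756, 14495, 14518,
                        14523, 14526, 14528, 14529, 14545, 14548),
                      origin = "1970-01-01"),
  Event = c(TRUE, rep(FALSE, 9), TRUE, TRUE, TRUE,
            FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

event_dates <- exData$Datebegin[exData$Event]

# For each row: was there an event 1 to 14 days before this date?
flag <- vapply(seq_len(nrow(exData)), function(i) {
  gap <- as.numeric(exData$Datebegin[i] - event_dates, units = "days")
  any(gap > 0 & gap <= 14)
}, logical(1))

which(flag)
```

Under this reading, row 12 is flagged along with row 18, because row 12 is itself only five days after the event in row 11; whether such rows should also be removed depends on the analysis.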
Re: [R] R on the iPhone/iPad? Not so much....a GPL violation
> As I noted in my closing comments in my second post, if one has a desire to make R's functionality available on smartphones (iPhone, Android, etc.) or iPad-class devices, then a client/server approach may be the most efficient means to do so. That approach also avails you of more powerful computing platforms than the client-side mobile devices have, at least at present, which will also limit aspects of portable functionality.
>
> Regards,
> Marc

Indeed, the client/server approach is what is used in MATLAB Mobile, which is now on sale in the App Store. See
http://blogs.mathworks.com/desktop/2010/05/24/introducing-matlab-mobile-%E2%80%93-an-iphone-app-to-connect-remotely-to-your-matlab/

If MATLAB can do it, then surely the R community can as well.

Regards,
Gustaf
Re: [R] p-values < 2.2e-16 not reported
On Wed, May 19, 2010 at 10:53 AM, Will Eagle will.ea...@gmx.net wrote:
> Dear all,
>
> how can I get the exact p-value of a statistical test like cor.test() if the p-value is below the default machine epsilon value of .Machine$double.eps = 2.220446e-16? At the moment smaller p-values are reported as p-value < 2.2e-16.
>
> .Machine$double.eps <- 1E-100 does not solve this issue, although this value should be used by the format.pval() function.
>
> Knowing the exact p-values down to 1E-200 is very important, since I have multiple tests which require an alpha error threshold below 2.2E-16.
>
> Thanks in advance,
> Will

I would be interested to hear what kind of multiple testing you're doing. Genetics? Intuitively, requiring p-values that small would seem to throw away almost any interesting results that are not simply errors in your data. Are you sure that there's not a better way of thinking about your problem?

From a practical standpoint, I would be sceptical about the ability of most R algorithms to generate theoretically valid p-values of such a small order.

Best regards,
Gustaf
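On the purely mechanical side of the question above: the "< 2.2e-16" text comes from how the test result is printed, while the stored p.value element is an ordinary double. The sketch below uses simulated data (an assumption for illustration) and also computes the two-sided Pearson p-value on the log scale, which may help when the value itself would underflow. The numerical caveat in the reply still applies.

```r
set.seed(1)
x <- 1:30
y <- x + rnorm(30)          # simulated, strongly correlated data

ct <- cor.test(x, y)        # Pearson test; printing shows "p-value < 2.2e-16"
ct$p.value                  # the stored value is not truncated

# Two-sided p-value on the log scale, from the t statistic and its df
log_p <- unname(log(2) + pt(abs(ct$statistic), df = ct$parameter,
                            lower.tail = FALSE, log.p = TRUE))
log_p / log(10)             # same quantity as a base-10 exponent
```

Whether digits that far out in the tail mean anything is a separate question from whether R can display them.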
[R] A revised function for getting ISO week
Hi All,

Two years back, I posted a small function for getting the ISO 8601 week number of a date (such as the week number used in all Swedish calendars), in an OS-independent manner. I've since discovered an inaccuracy in that code, and so I thought I'd repost the corrected version. Hopefully this will come in handy for someone searching the mailing list archives in the future.

Best Regards,
Gustaf

    ## Inputs a date object, posix object, or 3 numbers and gives back the iso week.
    ## By Gustaf Rydevik, revised 2010
    getweek<-function(Y,M=NULL,D=NULL){
      ## inherits() also matches POSIXct/POSIXlt objects, whose first
      ## class element is not "POSIXt" in current versions of R
      if(!inherits(Y,c("Date","POSIXt"))) {
        date.posix<-strptime(paste(Y,M,D,sep="-"),"%Y-%m-%d")
      }
      if(inherits(Y,c("POSIXt","Date"))){
        date.posix<-as.POSIXlt(Y)
        Y<-as.numeric(format(date.posix,"%Y"))
        M<-as.numeric(format(date.posix,"%m"))
        D<-as.numeric(format(date.posix,"%d"))
      }
      LY<- (Y%%4==0 & !(Y%%100==0))|(Y%%400==0)
      LY.prev<- ((Y-1)%%4==0 & !((Y-1)%%100==0))|((Y-1)%%400==0)
      date.yday<-date.posix$yday+1
      jan1.wday<-strptime(paste(Y,"01-01",sep="-"),"%Y-%m-%d")$wday
      jan1.wday<-ifelse(jan1.wday==0,7,jan1.wday)
      date.wday<-date.posix$wday
      date.wday<-ifelse(date.wday==0,7,date.wday)
      ### If the date is in the beginning, or end of the year,
      ### does it fall into a week of the previous or next year?
      Yn<-ifelse(date.yday<=(8-jan1.wday)&jan1.wday>4,Y-1,
                 ifelse(((365+LY-date.yday)<(4-date.wday)),Y+1,Y))
      ##Set the week differently if the date is in the beginning,middle or end of the year
      Wn<-ifelse(
        Yn==Y-1,
        ifelse((jan1.wday==5|(jan1.wday==6 & LY.prev)),53,52),
        ifelse(Yn==Y+1,1,(date.yday+(7-date.wday)+(jan1.wday-1))/7-(jan1.wday>4))
      )
      return(list(Year=Yn,ISOWeek=Wn))
    }
[R] Set encoding when load()-ing workspaces?
Hi all,

I hope that there is someone who can help me out here. I am trying to load() a workspace on OS X (R 2.11.0) that was saved on Windows XP (R 2.9). In that workspace, there's a data.frame with names that contain Swedish characters. These characters become garbled, which is a major problem.

From the R Windows FAQ, I read:

"Note though that character data in a workspace will be in a particular encoding that is not recorded in the workspace, so workspaces containing non-ASCII character data may not be interchangeable even on the same OS. Since R marks character data when it knows it to be in UTF-8 or Latin-1 (including its Windows superset, CP1252), strings in those encodings are likely to be transferred correctly: fortunately this covers most of the common cases (Mac OS X normally uses UTF-8, and Linux users are likely to use UTF-8 or perhaps Latin-1 (which used to be used for English))."

Apparently, my case is not the most common one, and I don't know why. I've been trying to dig into the load() function, but since it uses a lot of .Internal functions, I get stuck there. I've also tried doing options(encoding="latin1"), which doesn't seem to change anything.

And now I'm stuck. Any suggestions on where to look?

I've run into this issue twice before. The first time I managed to get it solved, but can't remember how (perhaps a .Rprofile setting somewhere?). The second time, I mailed R-Sig-Mac, got some tips that unfortunately did not lead anywhere, and subsequently gave up. I hope third time's a charm!

Many thanks in advance,
Gustaf
[R] Set encoding when load()-ing workspaces?
Many thanks Prof. and Duncan!

iconv() worked like a charm together with CP1252 as the Windows encoding, and now all the text shows up correctly.

Because the data frame also contained factors with levels that had Swedish characters, I ended up writing a small function for converting the encoding of everything inside a data frame in one go. It is a bit slow, but hopefully someone else will find it useful in the future:

    iconv.data.frame<-function(df,...){
      ## convert column names and row names
      df.names<-iconv(names(df),...)
      df.rownames<-iconv(rownames(df),...)
      names(df)<-df.names
      rownames(df)<-df.rownames
      ## convert factor levels and character columns; leave the rest alone
      df.list<-lapply(df,function(x){
        if(is.factor(x)){x<-factor(iconv(as.character(x),...))}else
        if(is.character(x)){x<-iconv(x,...)}else{x}
      })
      df.new<-do.call(data.frame,df.list)
      return(df.new)
    }

Best regards,
Gustaf

On Sun, May 2, 2010 at 8:36 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
> On Sun, 2 May 2010, Duncan Murdoch wrote:
>> Gustaf Rydevik wrote:
>>> Hi all,
>>>
>>> I hope that there is someone who can help me out here. I am trying to load() a workspace on OS X (R 2.11.0) that was saved on Windows XP (R 2.9). In that workspace, there's a data.frame with names that contain Swedish characters. These characters become garbled, which is a major problem.
>>>
>>> From the R Windows FAQ, I read:
>>>
>>> "Note though that character data in a workspace will be in a particular encoding that is not recorded in the workspace, so workspaces containing non-ASCII character data may not be interchangeable even on the same OS. Since R marks character data when it knows it to be in UTF-8 or Latin-1 (including its Windows superset, CP1252), strings in those encodings are likely to be transferred correctly: fortunately this covers most of the common cases (Mac OS X normally uses UTF-8, and Linux users are likely to use UTF-8 or perhaps Latin-1 (which used to be used for English))."
>>>
>>> Apparently, my case is not the most common one, and I don't know why. I've been trying to dig into the load() function, but since it uses a lot of .Internal functions, I get stuck there. I've also tried doing options(encoding="latin1"), which doesn't seem to change anything.
>>
>> You can't change the encoding when you load, but you can convert the encoding later (using iconv()) if you know what encoding it is. A good guess for a file created on Windows in my locale is latin1, but it's not certain, and I don't know what is commonly used on Windows in a Swedish locale.
>
> CP1252 (which is actually what you will get too).
>
>> If you have an example where you know the correct version of the string and you can show us what you're getting, together with charToRaw() applied to it, someone will probably be able to make a guess at the encoding.
>>
>> Duncan Murdoch
>>
>>> And now I'm stuck. Any suggestions on where to look?
>>>
>>> I've run into this issue twice before. The first time I managed to get it solved, but can't remember how (perhaps a .Rprofile setting somewhere?). The second time, I mailed R-Sig-Mac, got some tips that unfortunately did not lead anywhere, and subsequently gave up. I hope third time's a charm!
>>>
>>> Many thanks in advance,
>>> Gustaf
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
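For readers who want to see the conversion step from this thread in isolation, here is a minimal sketch. The byte sequence is a made-up example (the Swedish word "blå" as it would be stored in CP1252); charToRaw() is used to inspect the bytes before and after, as suggested in the quoted advice.

```r
# "blå" as CP1252 bytes: b = 0x62, l = 0x6c, å = 0xe5 (example data)
x <- rawToChar(as.raw(c(0x62, 0x6c, 0xe5)))

# Re-encode from the Windows code page to UTF-8
y <- iconv(x, from = "CP1252", to = "UTF-8")

charToRaw(x)   # three bytes, å is a single byte
charToRaw(y)   # four bytes, å becomes the two-byte sequence c3 a5
Encoding(y)    # R marks the converted string as UTF-8
```

Looking at raw bytes like this is a quick way to check a guessed source encoding on one known string before converting a whole data frame.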
Re: [R] R loop.
On Thu, Apr 22, 2010 at 7:20 PM, mhalsham mhals...@bradford.ac.uk wrote:
> OK, sorry for the bad explanation on my side. What I want: I have a txt file named table3.txt. This file contains 1293 rows; some of these rows will have 1 column and some of them will have up to maybe 40 columns. For example:
>
>   A            B      C      D       E       F      G    H     I
> 1 Deafness     EYA4   MYO7A  TECTA   COL11A2 POU4F3 MYH9 ACTG1 MYO6
> 2 Leukemia     TAL1   TAL2   ZNFN1A1 FLT3
> 3 Colon_cancer RAD54B PTPN12 BCL10
>
> The listing below shows how I want the records to be:
>
>    A            B
> 1  Deafness     EYA4
> 2  Deafness     MYO7A
> 3  Deafness     TECTA
> 4  Deafness     COL11A2
> 5  Deafness     POU4F3
> 6  Deafness     MYH9
> 7  Deafness     ACTG1
> 8  Deafness     MYO6
> 9  Leukemia     TAL1
> 10 Leukemia     TAL2
> 11 Leukemia     ZNFN1A1
> 12 Leukemia     FLT3
> 13 Colon_cancer RAD54B
> 14 Colon_cancer PTPN12
> 15 Colon_cancer BCL10
>
> Any help would be very kind of everyone, and thanks to those who tried to help and couldn't understand me.
>
> Thank you

Have you managed to read your table3.txt into R, using read.table etc.? If so, could you copy/paste the result of using dput() on your object?

After a bit of work, I've gotten your example data into R, but please post either comma-separated data or dput() results in the future. Anyhow, here's an example of how to get what you want. Hope it helps.

Regards,
Gustaf

    example.data<-structure(list(
      V1 = structure(c(2L, 3L, 1L), .Label = c("Colon_cancer", "Deafness", "Leukemia"), class = "factor"),
      V2 = structure(c(1L, 3L, 2L), .Label = c("EYA4", "RAD54B", "TAL1"), class = "factor"),
      V3 = structure(c(1L, 3L, 2L), .Label = c("MYO7A", "PTPN12", "TAL2"), class = "factor"),
      V4 = structure(c(2L, 3L, 1L), .Label = c("BCL10", "TECTA", "ZNFN1A1"), class = "factor"),
      V5 = structure(c(2L, 3L, 1L), .Label = c("", "COL11A2", "FLT3"), class = "factor"),
      V6 = structure(c(2L, 1L, 1L), .Label = c("", "POU4F3"), class = "factor"),
      V7 = structure(c(2L, 1L, 1L), .Label = c("", "MYH9"), class = "factor"),
      V8 = structure(c(2L, 1L, 1L), .Label = c("", "ACTG1"), class = "factor"),
      V9 = structure(c(2L, 1L, 1L), .Label = c("", "MYO6"), class = "factor")),
      .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"),
      class = "data.frame", row.names = c(NA, -3L))

    library(reshape)
    example.long<-melt(example.data,id.vars="V1")
    example.long
    #             V1 variable   value
    #1      Deafness       V2    EYA4
    #2      Leukemia       V2    TAL1
    #3  Colon_cancer       V2  RAD54B
    #4      Deafness       V3   MYO7A
    #5      Leukemia       V3    TAL2
    #6  Colon_cancer       V3  PTPN12
    #7      Deafness       V4   TECTA
    #8      Leukemia       V4 ZNFN1A1
    #9  Colon_cancer       V4   BCL10
    #10     Deafness       V5 COL11A2
    #11     Leukemia       V5    FLT3
    #12 Colon_cancer       V5
    #13     Deafness       V6  POU4F3
    #14     Leukemia       V6
    #15 Colon_cancer       V6
    #16     Deafness       V7    MYH9
    #17     Leukemia       V7
    #18 Colon_cancer       V7
    #19     Deafness       V8   ACTG1
    #20     Leukemia       V8
    #21 Colon_cancer       V8
    #22     Deafness       V9    MYO6
    #23     Leukemia       V9
    #24 Colon_cancer       V9

    ##Or if you want it in the order of V1
    example.long[order(example.long$V1),]
    #             V1 variable   value
    #3  Colon_cancer       V2  RAD54B
    #6  Colon_cancer       V3  PTPN12
    #9  Colon_cancer       V4   BCL10
    #12 Colon_cancer       V5
    #15 Colon_cancer       V6
    #18 Colon_cancer       V7
    #21 Colon_cancer       V8
    #24 Colon_cancer       V9
    #1      Deafness       V2    EYA4
    #4      Deafness       V3   MYO7A
    #7      Deafness       V4   TECTA
    #10     Deafness       V5 COL11A2
    #13     Deafness       V6  POU4F3
    #16     Deafness       V7    MYH9
    #19     Deafness       V8   ACTG1
    #22     Deafness       V9    MYO6
    #2      Leukemia       V2    TAL1
    #5      Leukemia       V3    TAL2
    #8      Leukemia       V4 ZNFN1A1
    #11     Leukemia       V5    FLT3
    #14     Leukemia       V6
    #17     Leukemia       V7
    #20     Leukemia       V8
    #23     Leukemia       V9
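The same wide-to-long step can also be sketched in base R alone, without the reshape package, and with the empty padding cells dropped so that the result has the 15 rows asked for. This is an illustrative alternative, not the code from the reply; the names wide and long and the columns disease and gene are made up for the example.

```r
# Read the ragged example the same way the poster did (fill = TRUE pads short rows)
wide <- read.table(text = "Deafness EYA4 MYO7A TECTA COL11A2 POU4F3 MYH9 ACTG1 MYO6
Leukemia TAL1 TAL2 ZNFN1A1 FLT3
Colon_cancer RAD54B PTPN12 BCL10",
                   fill = TRUE, header = FALSE, stringsAsFactors = FALSE)

# Stack the gene columns; unlist() goes column by column, so the
# disease names are repeated once per gene column
long <- data.frame(
  disease = rep(wide$V1, times = ncol(wide) - 1),
  gene    = unlist(wide[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the padding cells and restore the original disease order
long <- long[!is.na(long$gene) & long$gene != "", ]
long <- long[order(match(long$disease, wide$V1)), ]
rownames(long) <- NULL
long
```

Because order() is stable, genes keep their left-to-right order within each disease.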
Re: [R] R loop.
On Fri, Apr 23, 2010 at 11:14 AM, mhalsham mhals...@bradford.ac.uk wrote:
> Hi
>
> Yes, I have managed to read the file (Table2.txt). The command I have used:
>
>     a <- read.table("table3.txt", fill=TRUE, header=FALSE)
>
> If I read the first row, the output looks like this:
>
>     a[1,]
>
> Result:
>
>   V1       V2   V3     V4    V5    V6      V7     V8   V9    V10  V11  V12
> 1 Deafness EYA4 DIAPH1 MYO7A TECTA COL11A2 POU4F3 MYH9 ACTG1 MYO6 GJB3 KCNQ4
>   V13   V14  V15  V16  V17  V18  V19   V20   V21  V22   V23     V24   V25
> 1 GRHL2 GJB2 GJB6 TMC1 DSPP CRYM MYH14 DFNA5 COCH MYO1A TMPRSS3 CDH23 ATP2B2
>   V26  V27   V28  V29    V30    V31   V32
> 1 STRC USH1C OTOA PCDH15 CLDN14 MYO3A

Did you try my code in that case? I think that does what you wanted.

/Gustaf
Re: [R] Remove duplicated rows
On Fri, Apr 23, 2010 at 4:05 AM, chrisli1223 chri...@austwaterenv.com.au wrote:
> Hi all,
>
> I have a dataset similar to the following:
>
> Name Date      Value
> A    1/01/2000 4
> A    2/01/2000 4
> A    3/01/2000 5
> A    4/01/2000 4
> A    5/01/2000 1
> B    6/01/2000 2
> B    7/01/2000 1
> B    8/01/2000 1
>
> I would like R to remove duplicates based on columns 1 and 3 only. In addition, I would like R to remove duplicates based on the underlying and overlying row only. For example, for A, I would like to remove row 2 only and keep rows 1, 3 and 4.
>
> I have tried unique() and replicated(), but I do not have much success. I have also tried dataset<-c(1,diff(dataset)!=0), but I don't know how to apply it to this multi-column situation.
>
> Any help would be greatly appreciated.
>
> Thanks in advance,
> Chris

Hi,

This code is a bit ugly, but it works. Hope it helps.

/Gustaf

    library(zoo)
    test<-read.table("clipboard",header=T)
    test$code<-paste(test$Name,test$Value,sep="")
    drop.ndx<-rollapply(zoo(test$code),3,function(x)(x[2]%in%c(x[1],x[3])))
    drop.ndx<-c(FALSE,drop.ndx,FALSE)
    test[!drop.ndx,]
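A base-R sketch of a closely related idea is shown below: drop a row only when it repeats the Name and Value of the row directly above it, so the first row of each run is kept. This is an assumption about what "based on the underlying and overlying row" should mean, and it differs from the zoo code above, which also compares with the following row.

```r
test <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Key built from columns 1 and 3 only
code <- paste(test$Name, test$Value)

# Keep the first row, then every row whose key differs from the row above
keep <- c(TRUE, code[-1] != code[-length(code)])
test[keep, ]
```

With the example data this removes rows 2 and 8 and keeps row 4, since its value 4 is not adjacent to another "A 4" row.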
Re: [R] Assigning week numbers
On Wed, Apr 21, 2010 at 6:50 PM, Michael Hosack mhosa...@hotmail.com wrote:
> I provided a minimized version of my dataframe at the bottom of this
> message, containing the results of David's code in the variable 'wkoffset'
> and Jeff Hallman's code in 'WEEK'. Jeff's code produced the correct results
> (thank you, Jeff), though I have been unable to understand it. David, as
> you can see, your code begins week 2 for year 2011 on a Wednesday rather
> than on a Saturday, as it should. Your adjustment seems not to correct the
> problem, but I concede I may be using it incorrectly. If you are obtaining
> the correct results, please let me know what I am doing wrong.
> Thanks,
> Mike

Hello again,

Just for fun, I implemented the gist of your original code in R. It's much longer and not as elegant as the other solutions, but perhaps someone can learn something from it.

Regards,
Gustaf

# Build a run of consecutive dates from May 1 of the year before the first
# observation to May 1 of the year after the last one
Daterange <- range(SCHEDULE3$DATE.)
Daterange[1] <- paste(as.numeric(substr(as.character(Daterange[1]),1,4))-1, "-05-01", sep="")
Daterange[2] <- paste(as.numeric(substr(as.character(Daterange[2]),1,4))+1, "-05-01", sep="")
alldates <- seq(from=Daterange[1], to=Daterange[2], by=1)

# weekdays() is locale dependent, so switch to English while counting
My.locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "English_USA.1252")

Week <- 1
allweeks <- vector(length=length(alldates))
for(i in seq_along(alldates)){
  # A new week starts every Saturday
  if(weekdays(alldates[i])=="Saturday"){
    Week <- Week+1
  }
  # May 1 always restarts the count at week 1
  if(substr(as.character(alldates[i]),6,10)=="05-01"){
    Week <- 1
  }
  allweeks[i] <- Week
}

SCHEDULE3$Week <- allweeks[match(SCHEDULE3$DATE., alldates)]
Sys.setlocale("LC_TIME", My.locale)
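The loop above can also be written without a loop or a locale switch. The following is only a sketch under the rules stated in the thread (week 1 starts on May 1, a new week starts on every Saturday after May 1); the function name `season_week` is made up for this example and is not the code from David or Jeff.

```r
# Week number within a season that starts on May 1.
# Week 1 begins on May 1; each later Saturday starts a new week.
season_week <- function(d) {
  d   <- as.Date(d)
  yr  <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))

  # May 1 of the current year, or of the previous year for January to April
  start <- as.Date(paste0(ifelse(mon >= 5, yr, yr - 1L), "-05-01"))

  # Day of week as a number (0 = Sunday, ..., 6 = Saturday); locale independent
  wday <- as.POSIXlt(start)$wday

  # Days from May 1 to the first Saturday strictly after May 1
  offset <- (6 - wday) %% 7
  offset[offset == 0] <- 7
  first_sat <- start + offset

  # Number of Saturdays in (start, d], plus 1 for the opening week
  1 + pmax(0, as.numeric(d - first_sat) %/% 7 + 1)
}

dates <- as.Date(c("2010-05-01", "2010-05-07", "2010-05-08",
                   "2011-05-01", "2011-05-06", "2011-05-07", "2011-05-14"))
data.frame(date = dates, week = season_week(dates))
```

May 1, 2010 fell on a Saturday and May 1, 2011 on a Sunday, so the two years exercise both branches of the offset calculation.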
Re: [R] RNG
On Wed, Apr 21, 2010 at 4:37 PM, tamas barjak tamas.bar...@gmail.com wrote:
> Hi all!
> I would like to generate random numbers between 0 and 1. How can I do this?
> I downloaded it single RNG but it generates ones between only 1 and 1...:(
> Thank you for the help!
> Tamas

Hi Tamas,

I am not sure what you mean by "downloaded". There are many random number generators built into R. To generate 10 random numbers between 0 and 1, try

runif(10)

Regards,
Gustaf
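A slightly longer sketch of the same call, in case it helps: `set.seed()` is only there to make the draws reproducible, and the `min` and `max` arguments show how to get a different interval. The seed value and the second interval are arbitrary choices for illustration.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

u <- runif(10)                     # 10 draws from the uniform distribution on (0, 1)
v <- runif(5, min = -1, max = 1)   # 5 draws from the uniform distribution on (-1, 1)

range(u)                           # smallest and largest value drawn
```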
Re: [R] Assigning Week Numbers
On Tue, Apr 20, 2010 at 7:59 PM, Michael Hosack mhosa...@hotmail.com wrote:
> R experts,
> How could I extract the week number from a date vector (in Date class) such
> that week numbering (week 1...2...) begins (May 01) and ends (October 31)
> on the same specific dates each year? Week numbering must conform to the
> following day numbering format (Sat=1, Sun=2, Mon=3, ..., Fri=7). This means
> that new weeks must begin on Saturdays and end on Fridays (except for the
> first date of May 01, which always begins week 1; week 2 begins on the
> following Saturday). This needs to be applicable across years to work
> effectively. I have tried using both vectorized and loop approaches with no
> success. I am including a bit of old Systat code that does the trick simply
> and concisely. If anyone knows an analogous method in R, please let me
> know. My R dataframe contains all the variables and data in the Systat temp
> file.
>
> Use sched3.t
> Save sched4.t
> Hold
> By mm dd
> If bof then let week=1
> Else if bog and DOW$=SAT then let week = week + 1
> Run
>
> Thank you,
> Mike

From your code, it seems as if you're assuming that SCHEDULE3 contains all consecutive Saturdays, without skipping any. Is that correct?

/Gustaf
Re: [R] how can I plot the histogram like this using R?
On Fri, Apr 16, 2010 at 10:13 AM, bbslover dlu...@yeah.net wrote:
> Thanks for your reply. I just want to get a figure like y1.jpg using the
> data from y1.txt. From the figure I want to obtain the split point, as in
> y1.jpg, and take 2.5 as the split point. This figure was drawn by other
> people; I just want to draw it using R, but I cannot, so I hope friends
> can help me.
> Best wishes!
> kevin
> http://n4.nabble.com/file/n1965378/y1.jpg
> http://n4.nabble.com/file/n1965378/y1.txt

Hi,

Does this do what you want?

temp <- read.table(url("http://n4.nabble.com/file/n1965378/y1.txt"))
hist(temp$V1, breaks=seq(0,5.1,by=0.1))
abline(v=2.5, lty=2, lwd=2, col="red")

Regards,
Gustaf
Re: [R] Simplifying particular piece of code
How about this (not tested, since you did not provide example data or function code):

SRnames <- paste(colnames.mrets, ".SR", sep="")
AVnames <- paste(colnames.mrets, ".AV120", sep="")
SDnames <- paste(colnames.mrets, ".SD120", sep="")
names.matrix <- cbind(SRnames, AVnames, SDnames)
mrets.list <- apply(names.matrix, 1, function(.names){
  apply(mrets, 1, MyFunc, ret=.names[2], stdev=.names[3])
})
names(mrets.list) <- names.matrix[,1]
mrets <- do.call(merge, mrets.list)

?

/Gustaf

On Wed, Mar 31, 2010 at 12:10 PM, Sergey Goriatchev serg...@gmail.com wrote:
> Hello, everyone
> I have a piece of code that looks like this:
>
> mrets <- merge(mrets, BMM.SR=apply(mrets, 1, MyFunc, ret="BMM.AV120", stdev="BMM.SD120"))
> mrets <- merge(mrets, GM1.SR=apply(mrets, 1, MyFunc, ret="GM1.AV120", stdev="GM1.SD120"))
> mrets <- merge(mrets, IYC.SR=apply(mrets, 1, MyFunc, ret="IYC.AV120", stdev="IYC.SD120"))
> mrets <- merge(mrets, FCA.SR=apply(mrets, 1, MyFunc, ret="FCA.AV120", stdev="FCA.SD120"))
> mrets <- merge(mrets, IMM.SR=apply(mrets, 1, MyFunc, ret="IMM.AV120", stdev="IMM.SD120"))
> mrets <- merge(mrets, BME.SR=apply(mrets, 1, MyFunc, ret="BME.AV120", stdev="BME.SD120"))
> mrets <- merge(mrets, CRT.SR=apply(mrets, 1, MyFunc, ret="CRT.AV120", stdev="CRT.SD120"))
> mrets <- merge(mrets, GTF.SR=apply(mrets, 1, MyFunc, ret="GTF.AV120", stdev="GTF.SD120"))
> mrets <- merge(mrets, ERU.SR=apply(mrets, 1, MyFunc, ret="ERU.AV120", stdev="ERU.SD120"))
> mrets <- merge(mrets, ERE.SR=apply(mrets, 1, MyFunc, ret="ERE.AV120", stdev="ERE.SD120"))
> mrets <- merge(mrets, EPT.SR=apply(mrets, 1, MyFunc, ret="EPT.AV120", stdev="EPT.SD120"))
> mrets <- merge(mrets, EVA.SR=apply(mrets, 1, MyFunc, ret="EVA.AV120", stdev="EVA.SD120"))
> mrets <- merge(mrets, EMT.SR=apply(mrets, 1, MyFunc, ret="EMT.AV120", stdev="EMT.SD120"))
> mrets <- merge(mrets, EMM.SR=apply(mrets, 1, MyFunc, ret="EMM.AV120", stdev="EMM.SD120"))
> mrets <- merge(mrets, EMV.SR=apply(mrets, 1, MyFunc, ret="EMV.AV120", stdev="EMV.SD120"))
> mrets <- merge(mrets, ETM.SR=apply(mrets, 1, MyFunc, ret="ETM.AV120", stdev="ETM.SD120"))
>
> Is there a way to simplify this, some sort of loop?
> mrets is a zoo object. .AV120 and .SD120 are columns in this object.
> I need the exact .SR column names.
> This does not work:
>
> SRnames <- paste(colnames.mrets, ".SR", sep="")
> AVnames <- paste(colnames.mrets, ".AV120", sep="")
> SDnames <- paste(colnames.mrets, ".SD120", sep="")
> for(i in seq(SRnames)){
>   mrets <- merge(mrets, SRnames[i]=apply(mrets, 1, MyFunc, ret=AVnames[i], stdev=SDnames[i]))
> }
>
> Help much appreciated.
> Regards,
> Sergey
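The reason the for loop in the question fails is that an argument name such as `SRnames[i]=` cannot be computed on the fly; the usual way around it is to compute the new columns first and set their names afterwards. Below is a minimal sketch of that pattern. It makes several assumptions that are not in the original posts: it uses a plain matrix instead of a zoo object so that it runs without extra packages, it uses only three of the sixteen prefixes, and `MyFunc` is taken to be a simple ratio of the two named columns.

```r
cnames  <- c("BMM", "GM1", "IYC")            # shortened list, for illustration only
AVnames <- paste0(cnames, ".AV120")
SDnames <- paste0(cnames, ".SD120")
SRnames <- paste0(cnames, ".SR")

# Toy data: a plain matrix standing in for the zoo object
mrets <- cbind(
  matrix(c(160, 150, 140), nrow = 4, ncol = 3, byrow = TRUE,
         dimnames = list(NULL, AVnames)),
  matrix(2, nrow = 4, ncol = 3, dimnames = list(NULL, SDnames))
)

# Assumed form of MyFunc: ratio of the two named elements of one row
MyFunc <- function(x, ret, stdev) unname(x[ret] / x[stdev])

# One new column per prefix; the column names are assigned afterwards
sr <- sapply(seq_along(SRnames), function(i) {
  apply(mrets, 1, MyFunc, ret = AVnames[i], stdev = SDnames[i])
})
colnames(sr) <- SRnames

mrets <- cbind(mrets, sr)
colnames(mrets)
```

With a zoo object the last step would presumably use merge() rather than cbind(), as in the thread.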
Re: [R] Simplifying particular piece of code
On Wed, Mar 31, 2010 at 5:11 PM, Sergey Goriatchev serg...@gmail.com wrote:
> but
> data <- merge(data, data.list)
> works. Neither data nor data.list is a list, so do.call does not work.
> I am very weak on lists, never used them before.
> Best,
> Sergey

Hi Sergey,

OK; I was wondering whether the apply approach would work. It's nice that merge is clever enough to append a matrix. I'm guessing that you've got what you needed, then?

For reference (and for the general list), I had changed the code before Sergey's response, replacing apply() with lapply(). That code follows below.

Best regards,
Gustaf

cnames <- c("BMM", "GM1", "IYC", "FCA", "IMM", "BME", "CRT", "GTF",
            "ERU", "ERE", "EPT", "EVA", "EMT", "EMM", "EMV", "ETM")
SRnames <- paste(cnames, ".SR", sep="")
AVnames <- paste(cnames, ".AV120", sep="")
SDnames <- paste(cnames, ".SD120", sep="")

# Example data: 1000 rows, 16 ".AV120" columns and 16 ".SD120" columns
a <- zoo(matrix(rep(seq(from=160, to=10, by=-10), 1000), ncol=16, byrow=TRUE))
colnames(a) <- AVnames
b <- zoo(matrix(rep(2, 16000), ncol=16))
colnames(b) <- SDnames
data <- merge(a, b)

MyFunc <- function(x, ret, stdev){
  if(any(is.na(c(x[ret], x[stdev])))){
    return(NA)
  }else{
    return(x[ret]/x[stdev])
  }
}

# One column of names per series; keep them as character, not factor
names.df <- data.frame(rbind(SRnames, AVnames, SDnames), stringsAsFactors=FALSE)
func <- function(.names){
  apply(data, 1, MyFunc, ret=.names[2], stdev=.names[3])
}
data.list <- lapply(names.df, func)
mrets <- do.call(merge, c(list(data), data.list))
Re: [R] Adding minutes to 24 hour time
On Wed, Mar 17, 2010 at 2:57 PM, Hosack, Michael mhos...@state.pa.us wrote:
> Hi,
> Does anyone know how to add minutes (up to 100 min) to a 24 hour time, to
> create a new 24 hour time? I can't seem to find any documentation or
> examples explaining how to do this. The variables of interest are
> 'ARRIVE', 'WAIT', and 'DEPART' in the attached partial dataframe. I want
> 'DEPART' to be the sum of 'ARRIVE' and 'WAIT' in 24 hour format. Also, can
> anyone direct me to some relevant documentation?
> Thank you,
> Mike

If you convert all data to a date-and-time object (see ?POSIXlt), you can convert the minutes to seconds and add them with +.

Another way would be something like this:

addTime <- function(timeTxt, mins){
  # Split "HH:MM" into a numeric matrix with columns hours, minutes
  start.time <- strsplit(timeTxt, ":")
  start.time <- do.call("rbind", start.time)
  storage.mode(start.time) <- "numeric"
  hours <- mins %/% 60
  mins.left <- mins %% 60
  end.mins <- (start.time[,2] + mins.left) %% 60
  # Carry overflowing minutes into the hour and wrap around at midnight
  end.hours <- (start.time[,1] + hours + (start.time[,2] + mins.left) %/% 60) %% 24
  end.time <- paste(end.hours, end.mins, sep=":")
  return(end.time)
}
addTime(c("15:23","7:00"), c(70,100))

or this:

addTime2 <- function(timeTxt, mins){
  # Attach an arbitrary date so that the time can be parsed as POSIXct
  orig.date <- as.POSIXct(paste("2001-01-01", timeTxt))
  new.Date <- orig.date + mins*60
  # Keep only the time part of "YYYY-MM-DD HH:MM:SS"
  new.Date <- strsplit(as.character(new.Date), " ")
  new.Time <- (sapply(new.Date, "[", 2))
  return(new.Time)
}
addTime2(c("15:23","7:00"), c(70,100))

Regards,
Gustaf
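A compact variant of the second approach, offered as a sketch rather than a replacement: it fixes the time zone to UTC so that daylight-saving changes cannot interfere, and uses format() to return zero-padded "HH:MM" strings. The function name `add_minutes` and the dummy date are arbitrary choices for this example.

```r
add_minutes <- function(time_txt, mins) {
  # Parse "H:MM" or "HH:MM" on a dummy date, in UTC to avoid DST surprises
  t0 <- as.POSIXct(paste("2001-01-01", time_txt),
                   format = "%Y-%m-%d %H:%M", tz = "UTC")
  # POSIXct arithmetic is in seconds; format() drops the date again
  format(t0 + mins * 60, "%H:%M")
}

add_minutes(c("15:23", "7:00", "23:30"), c(70, 100, 45))
```

The third input shows the wrap-around past midnight.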
Re: [R] two questions for R beginners
On Mon, Mar 1, 2010 at 4:02 PM, Karl Ove Hufthammer k...@huftis.org wrote:
> On Mon, 01 Mar 2010 09:09:11 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:
>
> The reason for the difference is that data.frames are lists organized into
> columns (so the $ handling comes from the list, where it means extract the
> component) whereas a matrix is a single vector displayed in columns.
>
> Sure, I know that. But is there a reason why the '$' can't be overloaded
> to handle the extraction, as a *convenience* to the user?
>
> See the second paragraph of my response.
>
> OK. So I take it that there are no *technical* reasons why '$' can't be
> made to work for matrices and named vectors? I tried redefining it for
> matrices with
>
> `$.matrix`=function(x, name) ... something ...
>
> but I still get an error message when trying to use it.
>
> Of course I agree that 'the idea of a list is so fundamental to R that it
> needs to be something learned pretty early', but is there any harm in
> slightly 'blur[ing] the distinction between dataframes and matrices', as a
> convenience to the user?
>
> Or, in other words, what does one *gain* by having '$' on named matrices
> and vectors give a confusing error message instead of the expected
> results? Distinction for distinction's own sake is of little use.
>
> In case anyone is wondering about the vector case (of which matrices are
> of course only a special case), here is an example:
>
> d=iris[,1:4]
> d1=head(d,1)
> d2=mean(d)
> d1
>   Sepal.Length Sepal.Width Petal.Length Petal.Width
> 1          5.1         3.5          1.4         0.2
> d2
> Sepal.Length  Sepal.Width Petal.Length  Petal.Width
>         5.84     3.057333     3.758000     1.199333
> d1$Sepal.Width
> [1] 3.5
> d2$Sepal.Width
> Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
>
> --
> Karl Ove Hufthammer

As a technical exercise, I wrote the following function:

'%W%' <- function(e1,e2) e1[, which(colnames(e1) %in% e2)]
temp <- matrix(1:6, nrow=2, dimnames=list(a=1:2, b=c("a","b","c")))
temp %W% "b"

I assume that the reason you can't use $.matrix is that $ is a primitive function and doesn't use the UseMethod function.

/Gustaf
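As a hedged footnote to this exchange: as far as I understand it, `$` does dispatch on S3 classes, but only for objects that carry an explicit class attribute, which a bare matrix does not. The sketch below therefore tags a matrix with a made-up class, `named_matrix`, and defines a `$` method for it. The class name and the method are illustrations only, not something proposed in the thread.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("1", "2"), c("a", "b", "c")))

# Give the matrix an explicit class so that S3 dispatch on `$` can happen
class(m) <- c("named_matrix", class(m))

# `$` method: extract a column by name from the underlying matrix
`$.named_matrix` <- function(x, name) unclass(x)[, name]

m$b      # column "b" of the matrix

# The infix operator from the message above, for comparison
`%W%` <- function(e1, e2) e1[, which(colnames(e1) %in% e2)]
unclass(m) %W% "b"
```

Whether blurring the matrix and data-frame interfaces like this is a good idea is exactly the design question debated above; the sketch only shows that it is technically possible for classed objects.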
[R] WHO Anthro growth curve macros and R
Hi all,

I've got a project where I have to calculate weight-for-age Z-scores, preferably using the WHO standards. The WHO has kindly published macros for doing this in Stata, SPSS, SAS and S-Plus formats (see http://www.who.int/childgrowth/software/en/), but for some reason has chosen not to support R, the free alternative to S-Plus.

In the S-Plus zip file there are nine data files with an .sdd file ending, presumably data dumps from S-Plus 7.x. I've tried using data.restore from the foreign package, but that does not work (probably because the data is saved in the newer format).

I'm considering reading in the SPSS files and massaging them to fit the format that the S-Plus macro expects, but I'd prefer to be able to use the S-Plus files directly.

Has anyone on the list tried using the WHO Anthro macros with R, and can tell me how they did it? Alternatively, could some very kind person try to open the S-Plus files and save them in an R-readable format?

I would be extremely grateful for any help on this.

Best regards,
Gustaf Rydevik
Re: [R] String question
On Wed, Dec 23, 2009 at 11:21 AM, Knut Krueger r...@krueger-family.de wrote:
> Hi to all
> I need a string like
> temp <- paste(m1,m2,m3,sep=",")
> But I must know afterwards how many items are in the string.
> The other option would be to use a vector
> temp <- c(m1,m2,m3)
> No problem to get the count of items, but afterwards I must get the string
> m1,m2,m3
> No problem to build the string with a loop, but it should be easier; it
> seems that I am looking at the wrong functions.
> Kind regards
> Knut

Just thought I'd show you a solution from the other direction, in addition to those that everyone else has posted:

temp <- paste(m1,m2,m3,sep=",")      ## Generate string
nchar(gsub("([^,])","",temp))+1      ## Count commas in the string and add 1.

Regards,
Gustaf
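A self-contained sketch that puts both directions side by side. It assumes the items are character strings with no embedded commas; the literal values "m1", "m2", "m3" are placeholders.

```r
items <- c("m1", "m2", "m3")            # placeholder values

# Vector to string: collapse= joins the elements of one vector
temp <- paste(items, collapse = ",")

# Counting the items, three ways
n_from_vector <- length(items)
n_from_split  <- length(strsplit(temp, ",", fixed = TRUE)[[1]])
n_from_commas <- nchar(gsub("[^,]", "", temp)) + 1   # commas + 1, as in the reply above

c(n_from_vector, n_from_split, n_from_commas)
```

Keeping the vector and building the string only when needed is usually the simpler design, since the count then never has to be parsed back out of the string.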
Re: [R] Question About Repeat Random Sampling from a Data Frame
On Mon, Dec 21, 2009 at 4:12 PM, Adam Carr adamlc...@yahoo.com wrote:
> Good Morning:
> I've read many, many posts on the r-help system and I feel compelled to
> quickly admit that I am relatively new to R. I do have several reference
> books around me, but I cannot count myself among the fortunate who seem to
> have strong programming intuition.
>
> I have a data set consisting of 1637 observations of five variables:
> tensile strength, yield strength, elongation, hardness and a character
> indicator with three levels: (Y)es, (N)o, and (F)ail. My objective is to
> randomly sample various subsets from this data set and then evaluate these
> subsets using simple parameters, among them tests for normality, shape and
> skewness. The data set is ordered by the character variable prior to
> sampling, and the samples are weighted to mirror representation in an
> overall, physical process. I am sampling the data set using this code:
>
> sample <- dataset[sample(1:1637, 500, prob=c(rep(163.7/1637,513),rep(245.5/1637,197),rep(1227.8/1637,927)), replace = TRUE),]
>
> What I would like to do is iterate this process to create many (say 500 or
> more) sampled sets of n=500 and then evaluate each set for the parameters
> of interest. I would actually be evaluating each variable within each
> subset for my characteristic of interest. I am familiar with sampling and
> saving single columns of data to do this sort of thing, but I am not sure
> how to accomplish this with a multiple-variable data set. For example, I
> am currently iterating this using a clunky process:
>
> mysamples <- list()
> for (i in 1:10){
>   mysamples[[i]] <- dataset[sample(1:1637, 100, prob=c(rep(163.7/1637,513),rep(245.5/1637,197),rep(1227.8/1637,927)), replace = TRUE), ]
> }
>
> But this leaves me with the additional task of defining each mysample[i]
> iteration and converting it to a form on which I can apply a standard
> statistical test like mean() or skewness() to the variable columns within
> each subset.
>
> I have attempted to iteratively convert these lists using this code:
>
> mat <- matrix(nrow=100, ncol=5)
> for (i in 1:length(mysamples)) {mat[i] <- do.call('rbind', mysamples[i])}
>
> but running the code generates the error message: number of items to
> replace is not a multiple of replacement length. I have tried
> unsuccessfully, by reading many, many helpful r-help emails on this error,
> to understand my probably obvious mistake.
>
> Based on the small amount that I think I know about R, it seems to me that
> sampling the data frame and containing the samples in a list is likely a
> pretty inefficient way to do this task.
>
> Any help that any of you could provide to assist me in iteratively
> sampling the data frame, and storing the samples in a form on which I can
> apply other statistical tests, would be greatly appreciated.
> Thank you very much for taking the time to consider my questions.
> Adam

That's pretty much how I tend to do those things. What you seem to be missing is the apply family (see ?apply and ?lapply):

mysamples.means <- lapply(mysamples, function(x) mean(x[,1]))

Hope that gets you on your way. If you want more help, I'd suggest including an example data set in your follow-up messages.

/Gustaf
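To make the suggestion concrete, here is a small sketch with simulated data. Everything about the data is invented for illustration (50 rows instead of 1637, two numeric columns, arbitrary weights); only the pattern is the point: keep the samples in a list, then let sapply() return one row of summary statistics per sample.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Simulated stand-in for the real data set
dataset <- data.frame(
  tensile = rnorm(50, mean = 60, sd = 5),
  yield   = rnorm(50, mean = 40, sd = 4),
  status  = rep(c("Y", "N", "F"), times = c(20, 20, 10))
)

# Hypothetical sampling weights, one per row, constant within each status group
w <- rep(c(1, 2, 5), times = c(20, 20, 10))

# Ten weighted samples of 30 rows each, stored in a list
mysamples <- lapply(1:10, function(i) {
  dataset[sample(nrow(dataset), 30, replace = TRUE, prob = w), ]
})

# One row per sample, one column per numeric variable
num_cols <- c("tensile", "yield")
means <- t(sapply(mysamples, function(s) colMeans(s[, num_cols])))
sds   <- t(sapply(mysamples, function(s) apply(s[, num_cols], 2, sd)))

head(means)
```

Any function that takes a numeric vector can be swapped in for `sd` in the same way.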
Re: [R] What is the fastest way to see what are in an RData file?
On Thu, Dec 17, 2009 at 4:33 PM, Peng Yu pengyu...@gmail.com wrote:
> On Thu, Dec 17, 2009 at 5:33 AM, Gustaf Rydevik gustaf.ryde...@gmail.com wrote:
>> On Wed, Dec 16, 2009 at 10:13 PM, Peng Yu pengyu...@gmail.com wrote:
>>> Currently, I load the RData file, then ls() and str(). But loading the
>>> file takes too long if the file is big. Most of the time, I am only
>>> interested in what the variables in the file are and in the attributes
>>> of the variables (like whether it is a data.frame or matrix, what the
>>> colnames/rownames are, etc.)
>>> I'm wondering if there is any facility in R to help me avoid loading
>>> the whole file.
>>
>> I thought this was interesting as well, so I did a bit of searching
>> through the R-help list archives and found this answer by Simon Urbanek:
>> https://stat.ethz.ch/pipermail/r-devel/2007-August/046724.html
>> The link to a C routine that does what you want still works, but for
>> future reference I'm pasting the code below.
>
> It doesn't work for the RData file that I saved with
> save(list='test', file='test.RData').
>
> $ rdcopy test.RData
> Format version 3ec, R version = 23813.88.84, release = f9db1dba
> Sorry, this tool supported RXDR version 2 format only

What happens if you remove the version check? I.e. this one:

if (ver != 2) {
  XdrInTerm(d);
  error(_("Sorry, this tool supported RXDR version 2 format only\n"));
}

From what I can read on the help page for ?save, there hasn't been a change in the file format since 1.4.0.

/Gustaf
Re: [R] What is the fastest way to see what are in an RData file?
On Wed, Dec 16, 2009 at 10:13 PM, Peng Yu pengyu...@gmail.com wrote:
> Currently, I load the RData file, then ls() and str(). But loading the
> file takes too long if the file is big. Most of the time, I am only
> interested in what the variables in the file are and in the attributes of
> the variables (like whether it is a data.frame or matrix, what the
> colnames/rownames are, etc.)
> I'm wondering if there is any facility in R to help me avoid loading the
> whole file.

I thought this was interesting as well, so I did a bit of searching through the R-help list archives and found this answer by Simon Urbanek:
https://stat.ethz.ch/pipermail/r-devel/2007-August/046724.html
The link to a C routine that does what you want still works, but for future reference I'm pasting the code below.

Regards,
Gustaf

/* rdcopy v0.1-0 - extract objects or display contents of RData RDX2 files
 *
 * Copyright (C) 2007 Simon Urbanek
 * based in part on src/main/serialize.c and src/main/saveload.c from R:
 * Copyright (C) 1995, 1996 Robert Gentleman and Ross Ihaka
 * Copyright (C) 1997--2007 Robert Gentleman, Ross Ihaka and the
 * R Development Core Team
 * License: GPL v2
 *
 * Although R includes are needed to compile this (for constants),
 * libR does NOT have to be linked.
 */

#include <stdio.h>
#include <rpc/types.h>
#include <rpc/xdr.h>
#include <R.h>
#include <Rinternals.h>

#ifndef _
#define _(X) X
#endif

#undef error
void error(char *fmt, ...) {
  va_list(ap);
  va_start(ap, fmt);
  vprintf(fmt, ap);
  va_end(ap);
  exit(1);
}

/* .RData:
   byte 0..4  XDR2.  - file magic (XDR2\n=XDR ver2)
   byte 5..6  X.     - format (A\n=ASCII, B\n=binary, X\n=XDR)
   byte 7...  RXDR2 stream.

   Note: RXDR2 format is NOT a valid XDR format! Strings and raw bytes
   are not padded and thus cannot be read using XDR alone. */

/* we need to override this so that we don't have to really use libR */
SEXP R_NilValue = 0;

/* those are directly from serialize.c */
#define REFSXP            255
#define NILVALUE_SXP      254
#define GLOBALENV_SXP     253
#define UNBOUNDVALUE_SXP  252
#define MISSINGARG_SXP    251
#define BASENAMESPACE_SXP 250
#define NAMESPACESXP      249
#define PACKAGESXP        248
#define PERSISTSXP        247
#define CLASSREFSXP       246
#define GENERICREFSXP     245
#define BCREPDEF          244
#define BCREPREF          243
#define EMPTYENV_SXP      242
#define BASEENV_SXP       241

/* map type to a name */
static const char *nameSEXP(int type) {
  switch (type) {
  case REFSXP: return "REF";
  case NILVALUE_SXP: return "NULL";
  case GLOBALENV_SXP: return ".GlobalEnv";
  case UNBOUNDVALUE_SXP: return "unbound";
  case MISSINGARG_SXP: return "missing";
  case BASENAMESPACE_SXP: return "base";
  case NAMESPACESXP: return "NAMESPACE";
  case PACKAGESXP: return "PACKAGE";
  case PERSISTSXP: return "PERSIST";
  case CLASSREFSXP: return "CLASSREF";
  case GENERICREFSXP: return "GENERICREF";
  case BCREPDEF: return "BC-REP-DEF";
  case BCREPREF: return "BC-REP-REF";
  case EMPTYENV_SXP: return "empty-env";
  case BASEENV_SXP: return "base-env";
  case NILSXP: return "NIL";
  case SYMSXP: return "SYM";
  case LISTSXP: return "LIST";
  case CLOSXP: return "CLO";
  case ENVSXP: return "ENV";
  case PROMSXP: return "PROM";
  case LANGSXP: return "LANG";
  case SPECIALSXP: return "SPECIAL";
  case BUILTINSXP: return "BUILTIN";
  case CHARSXP: return "CHAR";
  case LGLSXP: return "LGL";
  case INTSXP: return "INT";
  case REALSXP: return "REAL";
  case CPLXSXP: return "CPLX";
  case STRSXP: return "STR";
  case DOTSXP: return "...";
  case ANYSXP: return "ANY";
  case VECSXP: return "VEC";
  case EXPRSXP: return "EXPR";
  case BCODESXP: return "BCODE";
  case EXTPTRSXP: return "EXTPTR";
  case WEAKREFSXP: return "WEAKREF";
  case RAWSXP: return "RAW";
  case S4SXP: return "S4";
  }
  return "?";
}

/* again from serialize.c */
#define IS_OBJECT_BIT_MASK (1 << 8)
#define HAS_ATTR_BIT_MASK (1 << 9)
#define HAS_TAG_BIT_MASK (1 << 10)
#define ENCODE_LEVELS(v) (v << 12)
#define DECODE_LEVELS(v) (v >> 12)
#define DECODE_TYPE(v) (v & 255)

/* this structure is passed across all functions. it encapsulates both
   the reading and book-keeping */
typedef struct {
  XDR xdrs;
  char *buf;
  long bs;
  FILE *f;
  int lev;
  char *flag;
  int refs;
  long *ref;    /* reference offsets */
  int maxrefs;  /* length of the refs vector */
  int verb;
  int mode;
  int flags;
  long target;
  FILE *copyf;
} SaveLoadData;

#define M_Read 0
#define M_NonRefCopy 1
#define M_Copy 2
#define M_NonRefSelect 3

#define F_NOREF 1

/* the following is partially based on src/main/saveload.c from R */
static void XdrInInit(FILE *fp, SaveLoadData *d, long sbsize) {
  xdrstdio_create(&d->xdrs, fp, XDR_DECODE);
  d->buf = (char*) malloc(sbsize);
  if (!(d->buf))
    error(_("cannot allocate memory for a string buffer"));
  d->bs = sbsize;
  d->f = fp;
  d->lev = 0;
  d->flag = 0;
  d->flags = 0;
  d->refs = 0;
  d->maxrefs = 2048;
  d->ref = (long*) malloc(sizeof(long)*d->maxrefs);
Re: [R] How to find the significant digits of a number?
On Wed, Dec 16, 2009 at 10:26 AM, Xiang Wu xiang@gmail.com wrote:
> Is there a function in R that could find the significant digit of a
> specific number? Such as for 3.1415, return '5'?
> Thanks in advance.

Hi,

x <- pi
substr(as.character(x),6,6)

Regards,
Gustaf
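The question is a little ambiguous, so the following is only one possible reading: take the number as R prints it with as.character() and return its last character, plus the number of digits after the decimal point. Both helper names are made up for this sketch, and it assumes ordinary decimal notation; very large or very small numbers are printed in scientific notation by as.character() and would need separate handling.

```r
# Last printed digit of a number, as a character string
last_digit <- function(x) {
  s <- as.character(x)
  substr(s, nchar(s), nchar(s))
}

# Number of digits after the decimal point in the printed representation
n_decimals <- function(x) {
  s <- as.character(x)
  nchar(sub("^[^.]*\\.?", "", s))   # strip everything up to and including the "."
}

last_digit(3.1415)
n_decimals(3.1415)
n_decimals(10)
```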
Re: [R] write.csv and header
On Mon, Dec 14, 2009 at 4:37 PM, Walther, Alexander awaltherm...@googlemail.com wrote:
> Dear list,
> I would like to export a matrix to a TXT file by using write.csv (not
> necessarily). Is there a way to add a header (with additional information
> concerning the project) spanning multiple lines to this file before the
> actual data are listed? It should look like this:
>
> date:
> filename:
> number of permutations:
> data (as a matrix)
>
> Any suggestions? Thanks in advance.
> cheers
> Alex

Hi,

?write.table and the argument append should be of help.

Example:

sink("test.csv")
cat("-")
cat("\n")
cat("This is \n a test of header")
cat("\n")
cat("-")
cat("\n")
sink()
write.table(matrix(rnorm(100),nrow=10), file="test.csv", append=TRUE, sep=",")

Regards,
Gustaf
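The same idea as a round trip, sketched with writeLines() instead of sink() and a temporary file so that nothing is left behind. The header values are made up, and the suppressWarnings() call is there because write.table() warns when column names are appended to an existing file.

```r
path <- tempfile(fileext = ".csv")

# Hypothetical header lines describing the project
header <- c("date: 2009-12-14",
            "filename: example.csv",
            "number of permutations: 1000")
writeLines(header, path)

m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Append the data below the header
suppressWarnings(
  write.table(m, file = path, append = TRUE, sep = ",",
              row.names = FALSE, col.names = TRUE)
)

# Reading it back: skip the header lines, then parse as ordinary CSV
back <- read.csv(path, skip = length(header))
back
```

Storing the number of header lines (here `length(header)`) alongside the file, or using a fixed number, is what makes the file easy to read back.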
Re: [R] Literature analysis
On Fri, Dec 11, 2009 at 3:04 PM, Schwan s.s.hosse...@utwente.nl wrote:
> Thanks, but how should I put the citation inside a data frame? data.frame(first txt file, second txt file...) plot (what should I insert here) type=p
> And how should I load the txt files anyway inside the frame?

Can you give an example of a couple of the text files? Are they in a standardised format (e.g. BibTeX or similar)?

/Gustaf
Re: [R] grep() exclude certain patterns?
Hi,

Just a quick note regarding Google and R: I use www.rseek.org almost exclusively, and it tends to give me the results I need. It is based on Google, but uses a number of smart tricks to ferret out R-relevant information.

/Gustaf
Re: [R] savePlot for Mac and / or Linux?
On Mon, Dec 7, 2009 at 9:53 AM, Christophe Genolini cgeno...@u-paris10.fr wrote:
> Hi all,
> In the package rtlu, I use the function savePlot. It is convenient since it lets the user decide in which graphic format he wants his graph to be exported. But when I run R CMD check, I get the following message:
>
> rtlu(V1,fileOutput="First.tex",textBefore="\\section{Variable 1 to 3}",graphName="V1")
> Error in savePlot(filename = nomBarplot, type = type) :
>   can only copy from 'windows' devices
> Calls: rtlu ... r2lUniv -> r2lUniv.factor -> r2lBarplot -> savePlot
> Execution halted
>
> I guess this is a compatibility problem with Linux/Mac? Is there something close to savePlot for Mac / Linux?
> Christophe

I'm not sure I understand exactly what you want, but for easy changing of the output file type, I've written this small function. Perhaps it can be of help.

Regards,
Gustaf

    ### Function by Gustaf Rydevik, 2009-12-03 gustaf.ryde...@gmail.com
    ## Created to facilitate easy changes in the file format of generated graphs.
    ## Gen.device() generates a device function that is a copy of an existing function, but
    ## with (possibly) new defaults.
    ## Wanted.device can be the name of any device you choose: png(), jpeg(), postscript(), etc.
    ## If fileEnding is missing, the function uses the Wanted.device name as file ending.
    ## I then use My.device in the rest of the script file, meaning that I only have to change the file
    ## format in one location (in the argument of Gen.device()) to do so for all subsequent graphs.
    Gen.device <- function(Wanted.device = "png", fileEnding = NULL, ...) {
      dots <- list(...)                      # defaults stored for later calls
      ending <- Wanted.device
      Wanted.device <- get(Wanted.device)    # look up the device function by name
      if (!is.null(fileEnding)) ending <- fileEnding
      generated.device <- function(File, ...) {
        dots2 <- list(...)
        File <- paste(File, ending, sep = ".")
        dots[which(names(dots) %in% names(dots2))] <- NULL   # per-call arguments override the defaults
        do.call(Wanted.device, c(filename = File, dots, dots2))
      }
      return(generated.device)
    }

    ## example
    My.device <- Gen.device("png", width = 7, height = 7, units = "in", res = Res <- 200)
    My.device(File = "test")
    plot(rnorm(1999))
    dev.off()
Re: [R] R help - IGARCH estimation
On Mon, Oct 19, 2009 at 6:37 AM, Xu, Ke-Li k...@bus.ualberta.ca wrote:
> Hi there,
> Thanks for your previous help on R. Do you know how to estimate an IGARCH (integrated GARCH) model in R? I need it when I estimate the Value at Risk following RiskMetrics methodology.
> regards, Keli

Hi Keli,

I would like to point you to the website www.rseek.org. I don't know anything at all about IGARCH, but a quick search pointed me to http://rgarch.r-forge.r-project.org/, which seems to include the mentioned model.

I would also recommend that you subscribe to R-help at https://stat.ethz.ch/mailman/listinfo/r-help and send questions there instead of directly to me (I am not much of an R expert...).

Best regards, and good luck,
Gustaf
Re: [R] Function to find prime numbers
    library(gmp)
    ?isprime

/Gustaf

On Tue, Oct 13, 2009 at 9:59 AM, AJ83 aljense...@gmail.com wrote:
> I need to create a function to find all the prime numbers in an array. Can anyone point me in the right direction? Thank you.
> AJ
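For readers without the gmp package, here is a minimal base R sketch of the same task using trial division. `is_prime` is a hypothetical helper written for this note; it is only sensible for moderately sized integers, and `gmp::isprime()` remains the better tool for large numbers.

```r
# Hypothetical helper: trial division up to sqrt(n).
is_prime <- function(n) {
  if (n < 2 || n != floor(n)) return(FALSE)   # 0, 1, negatives, non-integers
  if (n < 4) return(TRUE)                     # 2 and 3
  divisors <- 2:floor(sqrt(n))
  all(n %% divisors != 0)
}

x <- c(1, 2, 3, 4, 17, 18, 19, 21, 97)
primes_in_x <- x[vapply(x, is_prime, logical(1))]
primes_in_x
```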
Re: [R] Matching in R
On Sun, Apr 26, 2009 at 6:22 PM, dirk...@gmx.de wrote:
> Dear R users,
> I am trying to do exact matching on a large dataset (500.000 obs), with treatment and control groups of about equal size, with replacement. For the moment I use the Match function of the Matching library. I match on 2 covariates, and all observations in the treatment group have at least one exact counterpart in the control group.
>
> Now I want to introduce observation weights. I set ties=FALSE, as I want exactly one-to-one matching. Is there a way to draw randomly from the individuals in the control group that have the same covariate values as the individual in the treatment group, with the probability of being drawn proportional to the weights of the individuals in the control group?
>
> E.g. I have three individuals which all have the same covariate values as the one observation I want to find a partner for, and the first of the three has a very large weight. When drawing randomly among those three, I want the probability that the first one is drawn to be very large.
>
> I'd really appreciate any suggestions: the weights option does not do the job, it seems to work only when setting ties=TRUE.
> Thanks, Dirk

Hi Dirk,

You don't give a sample dataset, and I've not used the Matching library, so take my comments with a scoop of salt. Looking at the help page for Match, it seems as if the option Weight.matrix is what you're looking for. Creating a weight column in the treatment group with a constant, high value, including weight in the matching, and giving that covariate a high importance might work, no?

/Gustaf

Quote:

Weight.matrix
This matrix denotes the weights the matching algorithm uses when weighting each of the covariates in X (see the Weight option). This square matrix should have as many columns as the number of columns of the X matrix. This matrix is usually provided by a call to the GenMatch function which finds the optimal weight each variable should be given so as to achieve balance on the covariates. For most uses, this matrix has zeros in the off-diagonal cells. This matrix can be used to weight some variables more than others. For example, if X contains three variables and we want to match as best as we can on the first, the following would work well:

    Weight.matrix <- diag(3)
    Weight.matrix[1,1] <- 1000/var(X[,1])
    Weight.matrix[2,2] <- 1/var(X[,2])
    Weight.matrix[3,3] <- 1/var(X[,3])

This code changes the weights implied by the inverse of the variances by multiplying the first variable by a 1000 so that it is highly weighted. In order to enforce exact matching see the exact and caliper options.
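The draw Dirk describes can also be sketched directly in base R, outside the Matching package: collect the controls that match exactly, then sample one with probability proportional to its weight. The data frames, column names and weights below are made up for illustration, and this is not how `Match()` works internally.

```r
set.seed(1)

# Made-up control group: three exact matches, the first with a large weight.
controls <- data.frame(id = 1:3,
                       x1 = c(1, 1, 1),
                       x2 = c(0, 0, 0),
                       w  = c(100, 1, 1))
treated <- data.frame(x1 = 1, x2 = 0)

# Controls with exactly the same covariate values as the treated unit.
candidates <- controls[controls$x1 == treated$x1 &
                       controls$x2 == treated$x2, ]

# Draw one row index with probability proportional to w. sample.int() is used
# on row numbers because sample(x, 1) misbehaves when x has length one.
pick <- candidates$id[sample.int(nrow(candidates), 1, prob = candidates$w)]
pick
```

With these weights the first control is drawn with probability 100/102; repeating the draw per treated unit gives one-to-one matching with weighted tie-breaking.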
[R] Margins in lattice and device resolution
Hi all,

I believe I've run into this before, but I seem to have totally forgotten how I solved it. No headway in the last couple of hours either. How do I make sure that points and margins remain the same absolute size as I change the resolution of a device? (I'm running 2.9.1 patched, on a Win XP machine.)

Many thanks in advance,
Gustaf

PS: As an afterthought, might this behaviour be related to the recent grid bug for text size in lattice when changing resolution?

Example:

    library(Cairo)
    library(lattice)

    ### This gives a totally squished graph, where the actual plotting area is minimal
    CairoPNG("example.png", width = 480, height = 480, dpi = 600, pointsize = 12, bg = "white")
    bwplot(decrease ~ treatment, OrchardSprays, groups = rowpos,
           panel = "panel.superpose",
           panel.groups = "panel.linejoin",
           xlab = "treatment",
           key = list(lines = Rows(trellis.par.get("superpose.line"), c(1:7, 1)),
                      text = list(lab = as.character(unique(OrchardSprays$rowpos))),
                      columns = 4, title = "Row position"))
    dev.off()

    ### This gives a more normal looking graph
    CairoPNG("example2.png", width = 480, height = 480, dpi = 20, pointsize = 12, bg = "white")
    bwplot(decrease ~ treatment, OrchardSprays, groups = rowpos,
           panel = "panel.superpose",
           panel.groups = "panel.linejoin",
           xlab = "treatment",
           key = list(lines = Rows(trellis.par.get("superpose.line"), c(1:7, 1)),
                      text = list(lab = as.character(unique(OrchardSprays$rowpos))),
                      columns = 4, title = "Row position"))
    dev.off()
[R] utils lacking namespace?
Hi all,

A colleague of mine tried to install the package EMV, which had been removed from CRAN. She ran into some kind of trouble, R locked up, and she closed the program. Now when she starts R, utils can't be loaded, which of course creates an unworkable environment. Below I've copy-pasted the error message she gets when starting R. Any ideas on what went wrong and, more importantly, how to fix it?

Many thanks in advance,
Gustaf Rydevik

PS: She's running R on a WinXP box, if that might be of relevance...

    Error : package 'utils' does not have a name space

    R version 2.8.1 (2008-12-22)
    Copyright (C) 2008 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    Warning message:
    package 'methods' in options("defaultPackages") was not found
    Error in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
      'utils' is not a valid package -- installed < 2.0.0?
Re: [R] utils lacking namespace?
On Wed, Apr 15, 2009 at 12:20 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
> Gustaf Rydevik wrote:
>> Hi all,
>> A colleague of mine tried to install the package EMV, which had been removed from CRAN. She ran into some kind of trouble, R locked up, and she closed the program. Now when she starts R, utils can't be loaded, which of course creates an unworkable environment. Below I've copy-pasted the error message she gets when starting R. Any ideas on what went wrong and, more importantly, how to fix it?
>
> No idea of the details of what went wrong, but it looks as though your colleague has some bad startup file (Renviron, Rprofile, etc; see ?Startup for the full list) or has actually damaged her R installation. I'd try re-installing it first, because that's easy, then work through ?Startup and see if there are some bad files or environment variables messing things up.
>
> Duncan Murdoch

Hi, and thanks for the help!

After a bit of searching through the library's file structure, it turned out that the utils directory had somehow been moved into the directory belonging to package NADA. It must have been some installation script (of the EMV package?) that for some reason moved it there, but heaven knows why. Oh well, things got sorted out in the end anyhow, and all's well now!

Regards,
Gustaf
Re: [R] same value in column--delete
On Thu, Mar 26, 2009 at 12:15 PM, Duijvesteijn, Naomi naomi.duijveste...@ipg.nl wrote:
> Hi Readers,
> I have a question. I have a large dataset and want to throw away columns that have the same value throughout the column, and I want to know which columns these were. For example:
>
> x <- data.frame(id=c(1,2,3), snp1=c("A","G","G"), snp2=c("G","G","G"), snp3=c("G","G","A"))
> x
>   id snp1 snp2 snp3
> 1  1    A    G    G
> 2  2    G    G    G
> 3  3    G    G    A
>
> Now I want to know that snp2 is monomorphic (the same value for the whole column), and once I know which columns these are I want to take them out.
> Thanks, Naomi

Another, perhaps slightly more intuitive solution than Jim's would be the following:

    x <- data.frame(id = c(1,2,3), snp1 = c("A","G","G"), snp2 = c("G","G","G"), snp3 = c("G","G","A"))

    is.monovalued <- function(df) {
      sapply(df, function(x) {
        length(unique(x)) == 1
      })
    }

    monovaluedCols <- is.monovalued(x)
    which(monovaluedCols)
    x[!monovaluedCols]

/Gustaf
Re: [R] modifying a built in function from the stats package (fixing arima)
On Thu, Mar 5, 2009 at 10:00 AM, Marc Vinyes mvin...@aleasoft.com wrote:
>> If you ***look at the code*** for arima you will see that ``%+%'' is defined in terms of a call to ``.Call()'' which calls ``R_TSconv''. So apparently R_TSconv is a C or Fortran function or subroutine in a ``shared object library'' or dll upon which arima depends. Hence to do anything with it you'll need to get that shared object library and dynamically load it. (E.g. get the code, SHLIB it, and dynamically load the resulting shared object library.) The code is all available from the R source tarball. If this is a challenge for you then the best advice would be not to mess with it.
>
> Hi Rolf,
> It took me some time to come to the same conclusion (I didn't even know what .Call() was), but I've found an easier way to modify the R file without having to understand how to link dlls. I just downloaded the full R package and Rtools, and followed the instructions in http://cran.r-project.org/doc/manuals/R-admin.html#Building-the-core-files to build it. Then I can modify C:\R\src\library\stats\R\arima.R and run it. It is quite exaggerated that I have to build R in order to modify an R file without messing with dlls, and I think it would be interesting to make this process easier, but for now I'm happy to be productive again.
> Thank you all for your help,
> Best, Marc

Just a quick note on your original question: if you use edit(arima), you have to remember that it returns the modified function, which must then be stored. I.e., use

    arima <- edit(arima)

instead of just edit(arima), and the changes should be stored.

Regards,
Gustaf
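A small, non-interactive illustration of the point about storing the result: a copy of `arima` made by plain assignment keeps the `stats` namespace as its enclosing environment, so the copy can still be called like the original. This sketch only copies the function (it does not call `edit()`, which needs an editor), and `my_arima` is an illustrative name.

```r
my_arima <- stats::arima                  # a copy stored under a new name
environment(my_arima)                     # still the stats namespace

# The copy works like the original; lh ships with the datasets package.
fit <- my_arima(lh, order = c(1, 0, 0))
class(fit)
```

Because the copy lives in the workspace, it masks `stats::arima` only for code that looks the name up from there; functions inside other packages keep using the original.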
Re: [R] Have a function like the _n_ in R ? (Automatic count function )
On Wed, Feb 25, 2009 at 3:30 PM, hadley wickham h.wick...@gmail.com wrote:
> And for completeness here's a function that returns the next integer on each call.
>
> n <- (function(){
>   i <- 0
>   function() {
>     i <<- i + 1
>     i
>   }
> })()
>
> n()
> [1] 1
> n()
> [1] 2
> n()
> [1] 3
> n()
> [1] 4
> n()
> [1] 5
> n()
> [1] 6
>
> ;)
> Hadley

*headache*! I can't wrap my head around this one: the code is too strange! Could someone please give a hint on what's going on? How does i <<- i + 1 modify i permanently, seeing as i is defined as 0 to start with?

/Gustaf
Re: [R] Have a function like the _n_ in R ? (Automatic count function )
On Wed, Feb 25, 2009 at 4:43 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:
> On Wed, 25 Feb 2009, Gustaf Rydevik wrote:
>> On Wed, Feb 25, 2009 at 3:30 PM, hadley wickham h.wick...@gmail.com wrote:
>>> And for completeness here's a function that returns the next integer on each call.
>>>
>>> n <- (function(){
>>>   i <- 0
>>>   function() {
>>>     i <<- i + 1
>>>     i
>>>   }
>>> })()
>>>
>>> n()
>>> [1] 1
>>> n()
>>> [1] 2
>>> n()
>>> [1] 3
>>> n()
>>> [1] 4
>>> n()
>>> [1] 5
>>> n()
>>> [1] 6
>>>
>>> ;)
>>> Hadley
>>
>> *headache*! I can't wrap my head around this one: the code is too strange! Could someone please give a hint on what's going on? How does i <<- i + 1 modify i permanently, seeing as i is defined as 0 to start with?
>
> i is not _defined_ as zero. It is initially _assigned_ the value of zero and is subsequently assigned other values. As for the details of what goes on here, see An Introduction to R, Section 10.7 "Scope", and study the open.account() example there.
>
> HTH,
> Chuck

Thank you. I think I finally understood how that code got parsed. Does the text below describe things correctly?

First, Hadley defines a function that returns another function, like this:

    function(){
      i <- 0
      function() {
        i <<- i + 1
        i
      }
    }

Since the returned function is defined in a local environment, R returns the function together with that local environment, and lexical scoping can work its magic.

Finally, Hadley evaluates the function-returning function defined above, and stores the returned function in n:

    n <- (function(){
      i <- 0
      function() {
        i <<- i + 1
        i
      }
    })()

*Phew* That wasn't too difficult after all :-)

/Gustaf
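The description above can be checked with a named version of the same pattern. `make_counter` is an illustrative name; the sketch shows that each call creates a fresh enclosing environment with its own `i`, and that `<<-` updates that `i` rather than creating a local one.

```r
make_counter <- function() {
  i <- 0                 # lives in the environment created by this call
  function() {
    i <<- i + 1          # assigns to i in the enclosing environment
    i
  }
}

a <- make_counter()
b <- make_counter()

first  <- a()            # a's counter goes to 1
second <- a()            # a's counter goes to 2
other  <- b()            # b has its own i, so this is 1

environment(a)$i         # the value stored alongside a
```

With a plain `<-` inside the inner function, every call would create a new local `i` equal to 1 and the stored value would never change.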
Re: [R] an S idiom for ordering matrix by columns?
On Thu, Feb 19, 2009 at 5:40 PM, Aaron Mackey ajmac...@gmail.com wrote:
> There's got to be a better way to use order() on a matrix than this:
>
> y
>     2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
> 398        1         1         2        2        1         1        2        2
> 857        1         1         2        2        1         2        2        2
> 911        1         1         2        2        1         2        2        2
> 383        1         1         2        2        1         1        2        2
> 639        1         2         2        1        2         2        1        2
> 756        1         2         2        1        2         2        1        2
>     3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
> 398        1        2        2         2     1     2
> 857        1        2        2         2     1     2
> 911        1        2        2         2     1     2
> 383        1        2        2         2     1     2
> 639        2        2        1         2     1     2
> 756        2        2        1         2     1     2
>
> y[order(y[,1],y[,2],y[,3],y[,4],y[,5],y[,6],y[,7],y[,8],y[,9],y[,10],y[,11],y[,12],y[,13],y[,14]),]
>     2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
> 398        1         1         2        2        1         1        2        2
> 383        1         1         2        2        1         1        2        2
> 857        1         1         2        2        1         2        2        2
> 911        1         1         2        2        1         2        2        2
> 639        1         2         2        1        2         2        1        2
> 756        1         2         2        1        2         2        1        2
>     3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
> 398        1        2        2         2     1     2
> 383        1        2        2         2     1     2
> 857        1        2        2         2     1     2
> 911        1        2        2         2     1     2
> 639        2        2        1         2     1     2
> 756        2        2        1         2     1     2
>
> Thanks for any suggestions!
> -Aaron

You mean something like this?

    test <- matrix(sample(1:4, 100, replace = TRUE), ncol = 10)
    test[do.call(order, data.frame(test)), ]

Regards,
Gustaf
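To see why the `do.call()` idiom works: a data frame is a list of columns, so `do.call(order, ...)` passes each column to `order()` as a separate sort key, first column first. A small deterministic sketch with made-up values:

```r
# Made-up 4 x 3 matrix, filled column by column.
m <- matrix(c(2, 1, 2, 1,    # column 1
              1, 2, 1, 1,    # column 2
              3, 3, 1, 2),   # column 3
            ncol = 3)

# Equivalent to order(m[, 1], m[, 2], m[, 3]), for any number of columns.
idx <- do.call(order, as.data.frame(m))
sorted <- m[idx, ]
sorted
```

Ties in the first column are broken by the second, then the third, which is the behaviour of the long explicit `order()` call in the question.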
Re: [R] Alternate to for-loop
On Mon, Feb 16, 2009 at 12:59 PM, megh megh700...@yahoo.com wrote:
> Hi, I am trying to create a vector of length 10 (say), wherein each element will be the average of a random sample of size 100 from a distribution, say Normal. Can anyone please tell me how I can do that without creating a for loop?
> Regards,

As a variant of Patrick Burns' code, you can write:

    rowMeans(matrix(rnorm(1000), ncol = 100))

and substitute another distribution for rnorm if you want.

/Gustaf
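A short sketch of the substitution mentioned above, with the sizes pulled out into variables. `n_means` and `n_obs` are names chosen here, and the exponential rate is an arbitrary example; the `replicate()` line is an equivalent loop-free formulation.

```r
n_means <- 10    # length of the result
n_obs   <- 100   # sample size behind each mean

# One sample per row, then average the rows.
means_norm <- rowMeans(matrix(rnorm(n_means * n_obs), nrow = n_means))
means_exp  <- rowMeans(matrix(rexp(n_means * n_obs, rate = 2), nrow = n_means))

# Same result in distribution, written with replicate().
means_rep <- replicate(n_means, mean(rnorm(n_obs)))

length(means_norm)
```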
Re: [R] Generating Numbers With Certain Distribution in R
On Wed, Feb 11, 2009 at 2:15 PM, Ben Bolker bol...@ufl.edu wrote:
> Bernardo Rangel Tura wrote:
>> I think your routine needs a little fix:
>>
>> x <- rlnorm(1e6,meanlog=1,sdlog=1) ## pick any parameters you like
>> y <- round((x-min(x)/diff(range(x)))*19+1)
>>
>> What do you think?
>
> Yes.

No. Bernardo misplaced the parenthesis around (x-min(x)). The correct version is:

    x <- rlnorm(1e6, meanlog = 1, sdlog = 1) ## pick any parameters you like
    y <- round((x - min(x)) / diff(range(x)) * 19 + 1)

/Gustaf
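The difference between the two versions is easy to check numerically. The sketch below uses a smaller sample and a fixed seed (both choices made here, not in the thread): the corrected expression maps the data onto 1 to 20, while the misplaced parenthesis only subtracts a small constant from x before scaling.

```r
set.seed(1)
x <- rlnorm(1e4, meanlog = 1, sdlog = 1)

# Misplaced parenthesis: divides min(x) by the range, then subtracts.
y_wrong <- round((x - min(x) / diff(range(x))) * 19 + 1)

# Corrected: rescale x to [0, 1], stretch to [0, 19], shift to [1, 20].
y_right <- round((x - min(x)) / diff(range(x)) * 19 + 1)

range(y_right)
range(y_wrong)
```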
Re: [R] How to comment in R
On Wed, Feb 11, 2009 at 2:15 PM, baptiste auguie ba...@exeter.ac.uk wrote:
> A somewhat twisted approach that has not been mentioned is to consider everything a comment unless it is enclosed in special tags, as done in the brew package. For example,
>
> brew(textConnection(
> "You won't see this R output, but it will run. <% foo <- 'bar' %>
> Now foo is <%=foo%> and today is <%=format(Sys.time(),'%B %d, %Y')%>."
> ))
>
> gives,
>
> You won't see this R output, but it will run.
> Now foo is bar and today is February 11, 2009.
>
> I'd love to see an editor with a brew mode that acts as a notebook: you type in your text in whatever language without worrying about the syntax (R syntax, I mean!), and when you want to do a calculation you just enclose it in such tags, which behave like an inverted block comment.
> Just a thought,
> baptiste

Isn't this almost exactly what ?Sweave does (and odfWeave)? Granted, you have to deal with LaTeX code to get nice output, but LaTeX is a GoodThing (tm).

/Gustaf
Re: [R] Beginner-how to down size a large sample
On Wed, Feb 11, 2009 at 3:15 PM, pramil cheriyath drpra...@gmail.com wrote:
> I have this large data set with an outcome variable 0 and 1, I want to randomly pick 100 from each group and create another data set.

    ## example data; the size must be large enough to give at least 100 rows per group
    dataSet <- data.frame(group = sample(c(1, 0), 1000, replace = TRUE), data = rnorm(1000))
    dataSet.1 <- dataSet[dataSet$group == 1, ]
    dataSet.0 <- dataSet[dataSet$group == 0, ]
    sampled.1 <- dataSet.1[sample(1:nrow(dataSet.1), 100), ]
    sampled.0 <- dataSet.0[sample(1:nrow(dataSet.0), 100), ]
    newdataSet <- rbind(sampled.1, sampled.0)

/Gustaf

(A "please" would have been nice.)
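The same steps can be written once for any number of groups with `split()` and `lapply()`. This is a sketch on simulated data; the data frame `d`, the seed and `per_group` are choices made here, and the code assumes every group has at least `per_group` rows.

```r
set.seed(42)
d <- data.frame(group = sample(c(0, 1), 1000, replace = TRUE),
                data  = rnorm(1000))
per_group <- 100

# Split by group, draw per_group rows from each piece, and stack the pieces.
pieces  <- lapply(split(d, d$group),
                  function(g) g[sample.int(nrow(g), per_group), ])
sampled <- do.call(rbind, pieces)

table(sampled$group)
```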
Re: [R] Counting session days
On Mon, Feb 9, 2009 at 4:57 PM, stefan.peters...@inizio.se wrote:
> hi,
> I have some session data in a dataframe, where each session is recorded with a start and a stop date. Like this:
>
> session_start  session_stop
> ===========================
> 2009-01-03     2009-01-04
> 2009-01-01     2009-01-05
> 2009-01-02     2009-01-09
>
> A session is at least one day long. Now I want a dataframe with 'active sessions' per date. Like this:
>
> date        active_sessions
> ===========================
> 2009-01-01  1
> 2009-01-02  2
> 2009-01-03  3
> 2009-01-04  3
> 2009-01-05  2
> 2009-01-06  1
> 2009-01-07  1
> 2009-01-08  1
> 2009-01-09  1
>
> How do I do that? I've searched the usual sources, but my newbie status and language barrier left me with nothing. So please, anyone?

Hej Stefan,

The following should do. It's a bit convoluted though, so someone else might be able to come up with a better solution.

    test
           start       stop
    1 2009-01-03 2009-01-04
    2 2009-01-01 2009-01-05
    3 2009-01-02 2009-01-09

    activeDaysPerSession <- apply(test, MARGIN = 1, FUN = function(x)
      seq(from = as.Date(x["start"]), to = as.Date(x["stop"]), by = 1))
    ## unlist() drops the Date class, so the origin is given explicitly
    ActiveDays <- as.Date(unlist(activeDaysPerSession), origin = "1970-01-01")
    as.data.frame(table(ActiveDays))

Regards,
Gustaf
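An alternative sketch counts, for every calendar day in the overall range, how many sessions cover that day. It uses the three sessions from the question; the object names are chosen here. Unlike the `table()` approach, days with no active session would appear with a count of zero.

```r
sessions <- data.frame(
  start = as.Date(c("2009-01-03", "2009-01-01", "2009-01-02")),
  stop  = as.Date(c("2009-01-04", "2009-01-05", "2009-01-09"))
)

# Every day from the earliest start to the latest stop.
days <- seq(min(sessions$start), max(sessions$stop), by = "day")

# For each day, count the sessions whose interval contains it.
active <- vapply(seq_along(days), function(k) {
  sum(sessions$start <= days[k] & sessions$stop >= days[k])
}, integer(1))

result <- data.frame(date = days, active_sessions = active)
result
```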
Re: [R] Selectively Removing objects
On Mon, Feb 2, 2009 at 2:16 PM, Paulo Grahl pgr...@gmail.com wrote:
> Dear list members,
> Does anyone know how to use rm() to remove only variables but not declared functions from the environment? I understand I could name all the functions with, let's say, f_something, make sure that all variables do not start with f_, and then remove all BUT objects starting with f_. However, I have already defined all the functions and it would be troublesome to change all of them to a new name. Any hint?
> Thanks
> Paulo Gustavo Grahl, CFA

[Note to Paulo: I changed the code slightly: defining Nonfunctions separately messed things up.]

Hi Paulo,

The following should do it.

    test <- function(x) x^2
    test2 <- 5
    test3 <- 77
    ls()
    ## is.function() is used rather than comparing class() to "function",
    ## since class() can return more than one string
    rm(list = ls()[
      sapply(ls(), function(x) {
        !is.function(get(x))
      })
    ])
    ls()

Regards,
Gustaf
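The same idea is sketched below inside a scratch environment, so it can be tried without touching the real workspace. The environment `e` and the object names are made up for the example; in the global workspace you would drop the `envir` arguments.

```r
e <- new.env()
assign("f", function(x) x^2, envir = e)    # a function, to be kept
assign("a", 5, envir = e)                  # a variable, to be removed
assign("m", matrix(1:4, 2), envir = e)     # an object whose class() has length 2

objs    <- ls(e)
is_fun  <- vapply(objs, function(nm) is.function(get(nm, envir = e)), logical(1))
rm(list = objs[!is_fun], envir = e)

ls(e)
```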
Re: [R] ifelse help?
On Mon, Jan 19, 2009 at 9:08 PM, rkevinbur...@charter.net wrote:
> Sorry I didn't give the proper initialization of j. But you are right, j should also be an array of 5. So x[j + 5] would return 5 values. So if the array returned from 'ifelse' is the same dimension as test (h), then are all the values of h being tested? So since h, as you say, has no dimensions, is the test only testing h[1]? Again, it seems that if all of the elements of h are tested (there are 5 elements) and each element produces an array of 5, the resulting array should be 25.
> Kevin

ifelse returns values element by element, so to speak. In this case, it will return the vector

    c(x[j+2][1], x[j+2][2], x[j+2][3], x[j+2][4], x[j+2][5])

If you instead write

    h <- numeric(5)
    j <- 1:5
    p <- 1:5
    x <- 1:1000
    ifelse(h == 0, list(x[j+2]), 1:5)

you get what you expected, since ifelse recycles the second argument if necessary.

Regards,
Gustaf
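A short sketch of the element-by-element behaviour, using the same `h`, `j` and `x` as above plus a mixed test vector `h2` (added here for illustration): position k of the result comes from position k of the "yes" or the "no" argument, depending on position k of the test.

```r
h <- numeric(5)
j <- 1:5
x <- 1:1000

# All tests are TRUE, so the result is x[j + 2] taken element by element.
res_all <- ifelse(h == 0, x[j + 2], 1:5)

# Mixed test: positions 2 and 4 are FALSE and take their values from 1:5.
h2 <- c(0, 1, 0, 1, 0)
res_mixed <- ifelse(h2 == 0, x[j + 2], 1:5)

res_all
res_mixed
```

The result always has the length of the test, which is why five tests never produce 25 values unless the "yes" argument is a list as in the reply above.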
[R] two-sample test of multinomial proportion
Hi all,

This is perhaps more a statistics question than an R question, but I hope it's OK anyhow. I have some data (see below) with the number of tests positive for subtype H1 of a virus, the number of tests positive for subtype H3, and the total number of tests. This is for two different groups, and the two subtypes are mutually exclusive.

What is the best way to test if the proportion of H1 tests to all positive tests differs between the two groups? I could run prop.test() on just the H1 and H3 part of the data, ignoring the total number of tests. But this seems to skip some information regarding the variance of H1/H3 in the two groups, so I don't think it is correct. I've tried using a bootstrap approach on the ratio of the two proportions, but there must be a smarter way.

Any help is much appreciated!

Best regards,
Gustaf Rydevik

    ### data and bootstrap attempt ###
    multi.data <- data.frame(
      group = c("a", "b"),
      H1 = c(2, 12),
      H3 = c(21, 46),
      tests = c(189, 411)
    )
    multi.ind <- data.frame(
      Type = rep(c("H1", "H3", "Neg"), c(2+12, 21+46, 189+411-2-12-21-46)),
      group = rep(c("a", "b", "a", "b", "a", "b"), c(2, 12, 21, 46, 189-2-21, 411-12-46))
    )
    props1 <- vector(mode = "numeric", length = 1000)
    props2 <- vector(mode = "numeric", length = 1000)
    for (i in 1:1000) {
      sub.tab <- t(table(Subtyp.orig[sample(1:nrow(Subtyp.orig), nrow(Subtyp.orig), replace = TRUE), ]))
      props1[i] <- sub.tab[1,1] / (sub.tab[1,1] + sub.tab[1,2])
      props2[i] <- sub.tab[2,1] / (sub.tab[2,1] + sub.tab[2,2])
    }
    sub.kvot <- props1 / props2
    sort(sub.kvot)[50]
    sort(sub.kvot)[950]
Re: [R] two-sample test of multinomial proportion
On Tue, Jan 20, 2009 at 4:08 PM, Gustaf Rydevik gustaf.ryde...@gmail.com wrote: Hi all, This is perhaps more a statistics question than an R question, but I hope it's OK anyhow. I have some data (see below) with the number of tests positive to subtype H1 of a virus, the number of tests postive to subtype H3, and the total number of tests. This is for two different groups, and the two subtypes are mutually exclusive. What is the best way to test if the proportion of H1 tests to all positive tests differ between the two groups? I could run prop.test() on just the H1 and H3 part of the data, ignoring the total number of tests. But this seem to skip some information regarding variance of H1/H3 in the two groups, so I don't think it is correct. I've tried using a bootstrap approach on the ratio of the two proportions, but there must be a smarter way. Any help is much appreciated! Best regards, Gustaf Rydevik data and bootstrap attempt ### multi.data-data.frame( group=c(a,b), H1=c(2,12), H3=c(21,46), tests=c(189,411) ) multi.ind-data.frame(Type= rep(c(H1,H3,Neg),c(2+12,21+46,189+411-2-12-21-46)), group=rep(c(a,b,a,b,a,b),c(2,12,21,46,189-2-21,411-12-46)) ) props1-vector(mode=numeric,length=1000) props2-vector(mode=numeric,length=1000) for(i in 1:1000){ sub.tab-t(table(Subtyp.orig[sample(1:nrow(Subtyp.orig),nrow(Subtyp.orig),replace=TRUE),])) props1[i]-sub.tab[1,1]/(sub.tab[1,1]+sub.tab[1,2]) props2[i]-sub.tab[2,1]/(sub.tab[2,1]+sub.tab[2,2]) } sub.kvot-props1/props2 sort(sub.kvot)[50] sort(sub.kvot)[950] -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik ooops - forgot to change a name of the bootstrap code. Below is a corrected version. 
/Gustaf

### data and bootstrap attempt ###
multi.data<-data.frame(
  group=c("a","b"),
  H1=c(2,12),
  H3=c(21,46),
  tests=c(189,411)
)
multi.ind<-data.frame(
  Type=rep(c("H1","H3","Neg"),c(2+12,21+46,189+411-2-12-21-46)),
  group=rep(c("a","b","a","b","a","b"),c(2,12,21,46,189-2-21,411-12-46))
)
props1<-vector(mode="numeric",length=1000)
props2<-vector(mode="numeric",length=1000)
for(i in 1:1000){
  sub.tab<-t(table(multi.ind[sample(1:nrow(multi.ind),nrow(multi.ind),replace=TRUE),]))
  props1[i]<-sub.tab[1,1]/(sub.tab[1,1]+sub.tab[1,2])
  props2[i]<-sub.tab[2,1]/(sub.tab[2,1]+sub.tab[2,2])
}
sub.kvot<-props1/props2
sort(sub.kvot)[50]
sort(sub.kvot)[950]
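For comparison with the bootstrap above, here is a minimal sketch of one possible alternative: an exact test on the 2x2 table of H1 versus H3 counts, conditioning on the positive tests only. Whether conditioning on the positives is appropriate depends on how the samples were collected, so this is an illustration rather than a recommendation. The counts are taken from multi.data in the post; the object names pos and ft are made up for this example.

```r
# H1 and H3 counts per group, copied from multi.data in the post
pos <- matrix(c(2, 21,
                12, 46),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("a", "b"),
                              subtype = c("H1", "H3")))

# Fisher's exact test of association between group and subtype,
# using only the positive tests (negatives are ignored)
ft <- fisher.test(pos)

ft$p.value    # p-value for equal H1 share in both groups
ft$estimate   # conditional odds ratio estimate
ft$conf.int   # confidence interval for the odds ratio
```

The odds ratio compares the odds of H1 (versus H3) in group a with those in group b, which is related to, but not the same as, the ratio of proportions computed in the bootstrap.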
[R] data frames with å, ä, and ö (= non-ASCII characters) from Windows to Mac OS X
Hi,

I ran into this issue previously and managed to solve it, but I've forgotten how and am getting frustrated...

I have a data frame (see below) with Scandinavian characters in R (2.7.1) running on a Win XP computer. I save the data frame in an RData file on a USB stick, and load() it in R (2.8.0) running on OS X 10.5. Now the name of the data frame and all factor labels with Scandinavian characters are scrambled. How do I make R on OS X read my data frame?

From what I've managed to find in the list archives and the FAQ, I should either

1) run
Sys.setlocale("LC_ALL","en_US.UTF-8") ### Doesn't change anything

or

2) run
defaults write org.R-project.R force.LANG en_US.UTF-8
in the terminal, which doesn't help either.

I must admit that I couldn't quite follow the documentation I found on locales, so I might have messed up somewhere along the line.

Many thanks in advance for your help!

Regards,
Gustaf

Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län", "Västerbottens län", "Västernorrlands län", "Västmanlands län", "Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")), .Names = c("LANKOD", "Län"), class = "data.frame", row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"))
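One approach that may help with this kind of problem is to re-encode the labels after loading, rather than changing the locale. The sketch below assumes (and this is only an assumption) that the strings were stored in the Windows native encoding, which for Swedish is essentially latin1/CP1252, and converts column names, factor levels and character columns to UTF-8 with iconv(). The helper name fix_encoding and the toy data are hypothetical.

```r
# Hypothetical helper: convert names, factor levels and character
# columns of a data frame from one encoding to another.
fix_encoding <- function(df, from = "latin1", to = "UTF-8") {
  names(df) <- iconv(names(df), from = from, to = to)
  for (col in names(df)) {
    if (is.factor(df[[col]])) {
      levels(df[[col]]) <- iconv(levels(df[[col]]), from = from, to = to)
    } else if (is.character(df[[col]])) {
      df[[col]] <- iconv(df[[col]], from = from, to = to)
    }
  }
  df
}

# Toy example: the bytes 6c e4 6e spell "län" in latin1
lab <- rawToChar(as.raw(c(0x6c, 0xe4, 0x6e)))
Encoding(lab) <- "latin1"
df <- data.frame(kod = "K", lan = factor(lab), stringsAsFactors = FALSE)

fixed <- fix_encoding(df)
levels(fixed$lan)
Encoding(levels(fixed$lan))
```

An object whose own name is scrambled can still be reached via ls() and get(), and then assigned to a new, readable name.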
[R] reshape, direction=long: multiple row names not allowed
Hi all,

For some reason I always get stuck spending hours when trying to use reshape or the Reshape package. Heaven knows why. My latest frustration (in 2.7.1, so ignore if this has been fixed):

test<-data.frame(matrix(rnorm(42*4),ncol=4),rep(1:21,2),rep(c("a","b"),each=21))
reshape(test,varying=list(colnames(test)[1:4]),direction="long")

test<-data.frame(matrix(rnorm(42*4),ncol=4),id=rep(1:21,2),rep(c("a","b"),each=21))
reshape(test,varying=list(colnames(test)[1:4]),direction="long")

The first works, but the second does not. The only information on why is that duplicate row names are not allowed. It took me a fair time before figuring out that it was the id column that caused the problem. Perhaps something to fix, or at least give a more informative error message?

Best regards,
Gustaf
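Some background that may explain the error: reshape() has a default of idvar = "id", so an existing column called id is taken as the identifier, and since its values repeat here, the generated row names collide. A minimal sketch of one workaround, assuming a unique row identifier is acceptable, is to add one and pass it explicitly. The column names grp, row and value are made up for this example.

```r
set.seed(1)
test <- data.frame(matrix(rnorm(42 * 4), ncol = 4),
                   id  = rep(1:21, 2),
                   grp = rep(c("a", "b"), each = 21))

# Add an identifier that is unique per row and tell reshape() to use it,
# instead of the non-unique 'id' column it would pick up by default.
test$row <- seq_len(nrow(test))

long <- reshape(test,
                varying   = list(colnames(test)[1:4]),
                v.names   = "value",
                idvar     = "row",
                direction = "long")

dim(long)
head(long)
```

Each original row now appears four times, once per value of the generated time column.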
Re: [R] reshape, direction=long: multiple row names not allowed
On Wed, Jan 14, 2009 at 3:07 PM, hadley wickham h.wick...@gmail.com wrote:
> On Wed, Jan 14, 2009 at 5:51 AM, Gustaf Rydevik gustaf.ryde...@gmail.com wrote:
>> [quoted original message snipped]
>
> Well there isn't any problem with the reshape package:
>
> test <- data.frame(
>   matrix(rnorm(42 * 4), ncol = 4),
>   A = rep(1:21,2),
>   B = rep(c("a","b"), each = 21)
> )
> library(reshape)
> melt(test, id = c("A", "B"))
>
> but I'm not sure what you're trying to achieve.
>
> Hadley
>
> PS. Usingwhitespacemakesyourcodeeasiertoread!
>
> --
> http://had.co.nz/

Hi,

Sorry, I didn't mean to imply that the Reshape package fails here. Just that for some reason I find it difficult to wrap my head around the syntax of both the reshape command and the Reshape package... Your code was exactly what I was trying to achieve, btw. Thank you!

Regards,
Gustaf

PS: Butitisooeasytojustcodewithoutbotheringaboutmerehumanreadability. :-)
[R] getting ISO week
Hi all,

Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN?

Best regards,
Gustaf

getweek<-function(Y,M=NULL,D=NULL){
  ## inherits() also catches POSIXct, whose first class element is not "POSIXt"
  if(!inherits(Y,c("Date","POSIXt"))) {
    date.posix<-strptime(paste(c(Y,M,D),collapse="-"),"%Y-%m-%d")
  }
  if(inherits(Y,c("Date","POSIXt"))){
    date.posix<-as.POSIXlt(Y)
    Y<-as.numeric(format(date.posix,"%Y"))
    M<-as.numeric(format(date.posix,"%m"))
    D<-as.numeric(format(date.posix,"%d"))
  }
  LY<-(Y%%4==0 & !(Y%%100==0))|(Y%%400==0)
  LY.prev<-((Y-1)%%4==0 & !((Y-1)%%100==0))|((Y-1)%%400==0)
  date.yday<-date.posix$yday+1
  jan1.wday<-strptime(paste(Y,"01-01",sep="-"),"%Y-%m-%d")$wday
  jan1.wday<-ifelse(jan1.wday==0,7,jan1.wday)
  date.wday<-date.posix$wday
  date.wday<-ifelse(date.wday==0,7,date.wday)
  ### If the date is in the beginning, or end of the year,
  ### does it fall into a week of the previous or next year?
  Yn<-ifelse(date.yday<=(8-jan1.wday) & jan1.wday>4,Y-1,Y)
  ## the else branch is Yn, so a previous-year assignment from the line above is kept
  Yn<-ifelse(Yn==Y & ((365+LY-date.yday)<(4-date.wday)),Y+1,Yn)
  ## Set the week differently if the date is in the beginning, middle or end of the year
  Wn<-ifelse(
    Yn==Y-1,
    ifelse((jan1.wday==5|(jan1.wday==6 & LY.prev)),53,52),
    ifelse(Yn==Y+1,1,(date.yday+(7-date.wday)+(jan1.wday-1))/7-(jan1.wday>4))
  )
  return(list(Year=Yn,ISOWeek=Wn))
}
Re: [R] getting ISO week
On Thu, Dec 11, 2008 at 2:10 PM, Prof Brian Ripley [EMAIL PROTECTED] wrote:
> A slightly simpler version is format(Sys.Date(), "%V")
>
> On Thu, 11 Dec 2008, Prof Brian Ripley wrote:
>> strftime(x, "%V")
>>
>> E.g. strftime(as.POSIXlt(Sys.Date()), "%V") is "50", and you might want as.numeric() on it.
>>
>> Note that this is OS-dependent, and AFAIR Windows does not have it.

On Thu, Dec 11, 2008 at 2:15 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> format(d, "%U") and format(d, "%W") give week numbers
> using different conventions. See ?strptime

Thank you both for your replies!

I'm on Windows, so Prof Ripley's solution does not work (why is this OS-dependent?). Regarding Gabor's solution, neither convention follows the ISO 8601 standard, which is used in Europe (and Sweden in particular). See http://en.wikipedia.org/wiki/ISO_8601#Week_dates .

So it seems that my function does fill a hole, however small. I know that for me, working with week numbers, which are used quite heavily in Sweden, has always been a major frustration. Would it be possible to implement something similar to my solution in base, and how should I go about making it fit in with the rest of the date functions?

/Gustaf
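For readers on a current R installation, the "%V" approach from the quoted replies can be combined with "%G" (the ISO week-based year) to get both pieces of an ISO week date. This is a small sketch on a few hand-picked dates; as noted in the thread, support was platform-dependent at the time, so treat the behaviour on older Windows builds as unverified.

```r
# Three dates chosen to cover a mid-December date and two year boundaries
d <- as.Date(c("2008-12-11", "2008-12-29", "2010-01-01"))

iso <- data.frame(
  date     = d,
  iso_year = as.integer(format(d, "%G")),  # ISO 8601 week-based year
  iso_week = as.integer(format(d, "%V"))   # ISO 8601 week number
)
iso
```

The second and third dates show why the year has to be reported along with the week: 29 December 2008 belongs to week 1 of 2009, and 1 January 2010 belongs to week 53 of 2009.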
Re: [R] question involving loops from intro level R programming class
On Fri, Nov 28, 2008 at 8:35 AM, Heidi Wong [EMAIL PROTECTED] wrote:
> a. Write a R function zerdiag.v1(m) using loop to output a square matrix whose diagonal elements are zero and the other elements are filled in by consecutive integers from 1 to m row-wise. For example, zerdiag.v1(6) =
>
> [0, 1, 2]
> [3, 0, 4]
> [5, 6, 0]
>
> This function should have error checking ability. If the input m cannot form a square matrix, then the function will return an error message: "Input number is incorrect."
>
> b. Write a R function zerdiag.v2(m) to produce the same output as in part (a) without using a loop.
>
> c. Test your functions in part (a) and (b) using m=12 and m=14 respectively.
>
> I'd appreciate any help with this problem... I've spent a lot of time staring at it, and I'm still not sure where to start. Thanks!

Hi,

Nice little brain teaser! Not too difficult, but it requires a bit of creative thinking... You might want to have a look at, for example, ?diag, ?uniroot, or ?polyroot.

Regards,
Gustaf
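To make the hint about root finding concrete: an n x n matrix has n^2 - n off-diagonal cells, so m is valid only if n^2 - n = m has a positive integer solution. Below is a rough sketch of a loop-free version built on that idea; it solves the quadratic directly instead of calling polyroot(). The function name zerdiag_sketch is made up, and this is just one of several possible approaches.

```r
zerdiag_sketch <- function(m) {
  if (!is.numeric(m) || length(m) != 1 || m < 0) {
    stop("Input number is incorrect.")
  }
  # Positive root of n^2 - n - m = 0
  n <- (1 + sqrt(1 + 4 * m)) / 2
  if (abs(n - round(n)) > 1e-8) {
    stop("Input number is incorrect.")
  }
  n <- round(n)

  # R fills matrices column by column, so fill the off-diagonal cells
  # of a scratch matrix and transpose it to get a row-wise fill.
  scratch <- matrix(0, n, n)
  off_diag <- row(scratch) != col(scratch)
  scratch[off_diag] <- seq_len(m)
  t(scratch)
}

zerdiag_sketch(6)
zerdiag_sketch(12)
# zerdiag_sketch(14) stops with "Input number is incorrect."
```

The transpose trick works because the off-diagonal pattern is symmetric, so the same logical mask applies before and after transposing.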
Re: [R] Finding Stopping time
Hi Debanjan,

You would be more likely to get a response if your question were clearer. Your code is very difficult to read, and it doesn't help that you don't provide any context, or comment your code with "### This is calculating the average" kind of statements. What are you trying to do?

Anyhow, after quite a bit of effort trying to understand what you've done, I found your (simple!) mistake: the test statistic has to be computed from the number of observations since the previous stop, not from the absolute index k. So you need to change the k in that big quantity you're calculating to (k-N[j-1]), like this:

T[k] <- (((k-N[j-1])/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*smm[k])-((k-N[j-1])*(theta2-theta1)/2)

As an aside, try not to use variables defined outside a function in the function code (in this case your x). It makes the code more difficult to follow, and far more likely to break.

Regards,
Gustaf

On Wed, Nov 26, 2008 at 4:04 PM, Debanjan Bhattacharjee [EMAIL PROTECTED] wrote:
> Can anyone help me to solve the problem in my code? I am actually trying to find the stopping index N. So first I generate random numbers from normals. There is no problem in finding the first stopping index. Now I want to find the second stopping index using observations starting from the one after the first stopping index. E.g. if my first stopping index was 5, I want to set the 6th observation from the generated normal variables as the first random number, and stop at the second stopping index.
>
> This is my code,
>
> alpha <- 0.05
> beta <- 0.07
> a <- log((1-beta)/alpha)
> b <- log(beta/(1-alpha))
> theta1 <- 2
> theta2 <- 3
> cumsm<-function(n) {y<-NULL
> for(i in 1:n) {y[i]=x[i]^2}
> s=sum(y)
> return(s) }
> psum <- function(p,q) {z <- NULL
> for(l in p:q) { z[l-p+1] <- x[l]^2}
> ps <- sum(z)
> return(ps) }
> smm <- NULL
> sm <- NULL
> N <- NULL
> Nout <- NULL
> T <- NULL
> k<-0
> x <- rnorm(100,theta1,theta1)
> for(i in 1:length(x)) {
> sm[i] <- psum(1,i)
> T[i] <- ((i/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*sm[i])-(i*(theta2-theta1)/2)
> if (T[i]<=b | T[i]>=a){N[1]<-i
> break} }
> for(j in 2:200) {
> for(k in (N[j-1]+1):length(x)) {
> smm[k] <- psum((N[j-1]+1),k)
> T[k] <- ((k/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*smm[k])-(k*(theta2-theta1)/2)
> if (T[k]<=b | T[k]>=a){N[j]<-k
> break} } }
>
> But I cannot get the stopping index after the first one.
>
> Thanks
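The restart logic described in the reply can also be written as a function that takes x as an argument and counts observations from the most recent stop. The following is only a sketch of that idea: the statistic is copied from the posted code with k replaced by the count since the last stop, and the function name find_stops is hypothetical.

```r
find_stops <- function(x, theta1, theta2, a, b) {
  stops <- integer(0)
  start <- 1                      # first observation of the current run
  while (start <= length(x)) {
    run <- x[start:length(x)]
    n   <- seq_along(run)         # observations used since the last stop
    s   <- cumsum(run^2)          # running sum of squares within the run
    stat <- (n / 2) * log(theta1 / theta2) +
      ((theta2 - theta1) / (2 * theta1 * theta2)) * s -
      n * (theta2 - theta1) / 2
    hit <- which(stat <= b | stat >= a)
    if (length(hit) == 0) break   # no boundary crossed before data ran out
    stops <- c(stops, start + hit[1] - 1)
    start <- start + hit[1]       # restart right after the stopping index
  }
  stops
}

set.seed(123)
alpha <- 0.05
beta  <- 0.07
a <- log((1 - beta) / alpha)
b <- log(beta / (1 - alpha))
theta1 <- 2
theta2 <- 3
x <- rnorm(100, theta1, theta1)

stops <- find_stops(x, theta1, theta2, a, b)
stops
```

Using cumsum() avoids recomputing the partial sums for every index, and passing x in explicitly follows the advice about not relying on variables defined outside the function.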
Re: [R] Simple rep() question duplicating times and dates.
On Wed, Nov 5, 2008 at 4:02 PM, John Kane [EMAIL PROTECTED] wrote:
> I want to create a data.frame of time and date for a year. I started with the idea of simply producing two vectors (time and date).
>
> The first part (time) is easy.
> rep(1:24, 365)
>
> But how do I get a series of 24 dates for 01 January 2005 and repeat this to 31 December 2005? It should be easy but I don't see it.
>
> Thanks

Hi John,

?Date leads you to (among other things) ?seq.Date. Something like this should work:

time<-rep(1:24, 365)
dates<-seq(as.Date("01012005",format="%d%m%Y"),as.Date("31122005",format="%d%m%Y"),by=1)
TimeFrame<-data.frame(time)
TimeFrame$dates<-rep(dates,each=24)

Regards,
Gustaf
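A small variation on the same idea, shown here as a sketch, derives the number of repetitions from the date sequence instead of hard-coding 365, so it also works for leap years. The column order and ISO-format date strings are choices made for this example only.

```r
# One entry per day of 2005
dates <- seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day")

TimeFrame <- data.frame(
  dates = rep(dates, each = 24),             # every date repeated 24 times
  time  = rep(1:24, times = length(dates))   # hours 1..24 for every date
)

nrow(TimeFrame)
head(TimeFrame, 3)
```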
Re: [R] How to get the duplicated elements from a vector?
On Wed, Oct 29, 2008 at 2:47 PM, Leon Yee [EMAIL PROTECTED] wrote:
> Dear all,
>
> How can I get the duplicated elements from a vector? For example, x <- c("yes", "no", "yes", "yes", "no", "not sure"), how can I filter out all the elements which occurred >= 2 times?
>
> Thanks for any help!
>
> Regards,
> Leon

Hi Leon,

unique(x) or duplicated(x) should work, depending on what you want.

Best,
Gustaf
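To illustrate how the two suggested functions differ on the vector from the question, here is a short sketch. Which result is wanted depends on how "filter out" is read, so three readings are shown; the variable names are made up.

```r
x <- c("yes", "no", "yes", "yes", "no", "not sure")

# duplicated() flags the second and later occurrences of each value
dup_values <- unique(x[duplicated(x)])       # values that occur at least twice

# table() counts occurrences, which generalises to other thresholds
counts <- table(x)
at_least_twice <- names(counts[counts >= 2])

# Keep only the elements whose value occurs once
only_once <- x[!(x %in% dup_values)]

dup_values
at_least_twice
only_once
```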
Re: [R] How to get the duplicated elements from a vector?
On Wed, Oct 29, 2008 at 3:45 PM, Erik Iverson [EMAIL PROTECTED] wrote:
> Leon Yee wrote:
>> Gustaf Rydevik wrote:
>>> Hi Leon,
>>> unique(x) or duplicated(x) should work, depending on what you want.
>>> Best,
>>> Gustaf
>>
>> Hi,
>> Thank you all. Actually, I have a data frame or matrix, whose first column is numerical values, and whose 2nd column is names.
>
> Then you have a data.frame, as matrices in R are of homogeneous type.
>
>> I need those whose names repeated 3 times and get the mean of the 3 values for each repeated names. It sounds that I need some programming work.
>
> Yes, but not much
>
> ## BEGIN R CODE
> ## guarantees there is at least one level with exactly three elements,
> ## which your problem seems to require
> t1 <- data.frame(a = rnorm(10), b = c("D", "D", "D", sample(LETTERS[1:3], 7, replace = TRUE)))
> ## find which names have exactly three elements
> t2 <- subset(t1, b %in% names(which(table(t1$b) == 3)))
> ## note that the elements of the returned value depend on what was
> ## originally in your data set's 'b' column
> tapply(t2$a, t2$b, mean)
> ## END R CODE

I'm always forgetting about the ave function. Using that one, here's another way:

temp<-data.frame(Num=sample(1:1000,100),Names=sample(letters[1:25],100,replace=T))
temp$count<-ave(rep(1,nrow(temp)),temp$Names,FUN=sum)
temp$MeanOfThree[temp$count==3]<-ave(temp$Num[temp$count==3],temp$Names[temp$count==3])

/Gustaf
[R] Automatically adjust text size in plot
Hi all,

I'm writing a function that will automatically generate a report based on answers to a questionnaire. The exact questions and answers to the questionnaire can vary. One of the question types is in a matrix format, where the agreement with several statements can be indicated on a scale. I'm planning to plot this as a multilevel barplot, labeling each bar column only once. However, I'm stuck as to how I should adjust text size and wrapping to fit each column.

Here's an example of what I mean:

barnames<-c("I agree completely", "I agree", "I partly agree", "I do not agree", "I really hate this stupid question, don't you?")
answers<-data.frame(question=paste("Q",1:5,sep=""),S1=sample(1:100,5),S2=sample(1:100,5),S3=sample(1:100,5),S4=sample(1:100,5),S5=sample(1:100,5))
Width<-50
Cex<-1.5
par(mfrow=c(nrow(answers)+1,1),mar=c(0,1,1,1))
plot.new()
plot.window(xlim=c(0,1),ylim=c(0,1))
barnames.plot<-do.call(c,lapply(barnames,function(x)paste(strwrap(x,Width),collapse="\n")))
text(barnames.plot,x= (seq.int(0, 1, length.out = length(barnames)+1)-0.5/length(barnames))[-1],y=0.5,cex=Cex)
for(i in 1:nrow(answers)){
  barheight<-rep(0,length(barnames))
  barheight[as.numeric(names(subQ.tables[[i]]))]<-subQ.tables[[i]]
  barplot(barheight,space=0)
}

The question is, how do I figure out the appropriate Width and Cex parameters as a function of barnames? That is, with varying text lengths of the barnames, varying numbers of alternatives etc., and independent of which device type is used? strwrap uses columns (characters) as units, and I can't really figure out how to convert that to graph units. The same goes for cex.

Many thanks in advance,
Gustaf

PS: As an alternative, if someone could come up with a better way to display this type of data, I'd be all ears. I'm not too happy with my current solution.
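One possible way to connect character counts to graph units is strwidth(), which reports the width of a string in user coordinates for the currently open plot. The sketch below wraps each label, measures the widest line at cex = 1, and scales cex down so it fits the column; it assumes that text width scales roughly linearly with cex, which may not hold exactly on every device. The helper fit_cex and its arguments wrap and max_cex are made up for this illustration.

```r
# Hypothetical helper: largest cex (capped at max_cex) at which the
# wrapped label fits into a column of width col_width (user units).
fit_cex <- function(label, col_width, wrap = 20, max_cex = 1.5) {
  pieces <- strwrap(label, width = wrap)
  widest <- max(strwidth(pieces, units = "user", cex = 1))
  min(max_cex, col_width / widest)
}

barnames <- c("I agree completely", "I agree", "I partly agree",
              "I do not agree",
              "I really hate this stupid question, don't you?")

pdf(NULL)   # throwaway device so the example runs non-interactively
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))

col_width <- 1 / length(barnames)            # one column per label
cex_each  <- sapply(barnames, fit_cex, col_width = col_width)
Cex       <- min(cex_each)                   # one common size for all labels
dev.off()

Cex
```

Since strwidth() measures against whatever device is open, the same call should adapt to screen, PDF or PNG output, as long as it is made after plot.window().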
Re: [R] Combining all possible values of variables into a new...
On Mon, Oct 20, 2008 at 4:10 PM, [EMAIL PROTECTED] wrote:
> I'm trying to create a new column in my data.frame where subjects are categorized depending on values in four other columns. In any other case I would just nest a few ifelse statements; however, in this case I have 4*6*2*3=144 combinations and I get weird 'context overflow' errors. So I wonder if there is a more efficient way of doing this.
>
> For illustrative purposes, let's say I have:
>
> x<-c(1,0,0,1,0,0,1,0,0,1)
> y<-c(1,3,2,3,2,1,2,3,2,3)
> z<-c(1,2,1,2,1,2,1,2,1,2)
> d<-as.data.frame(cbind(x,y,z))
>
> and I do:
>
> d$myvar <- ifelse(d$x == 0 & d$y==1 & d$z==1 , d$myvar <- 1,
> ifelse(d$x == 0 & d$y==1 & d$z==2 , d$myvar <- 2,
> ifelse(d$x == 0 & d$y==2 & d$z==1 , d$myvar <- 3,
> ifelse(d$x == 0 & d$y==2 & d$z==2 , d$myvar <- 4,
> ifelse(d$x == 0 & d$y==3 & d$z==1 , d$myvar <- 5,
> ifelse(d$x == 0 & d$y==3 & d$z==2 , d$myvar <- 6,
> ifelse(d$x == 1 & d$y==1 & d$z==1 , d$myvar <- 7,
> ifelse(d$x == 1 & d$y==1 & d$z==2 , d$myvar <- 8,
> ifelse(d$x == 1 & d$y==2 & d$z==1 , d$myvar <- 9,
> ifelse(d$x == 1 & d$y==2 & d$z==2 , d$myvar <- 10,
> ifelse(d$x == 1 & d$y==3 & d$z==1 , d$myvar <- 11,
> ifelse(d$x == 1 & d$y==3 & d$z==2 , d$myvar <- 12,
> NA))))))))))))
>
> Suggestions?

How about the following?

x<-c(1,0,0,1,0,0,1,0,0,1)
y<-c(1,3,2,3,2,1,2,3,2,3)
z<-c(1,2,1,2,1,2,1,2,1,2)
d<-as.data.frame(cbind(x,y,z))
xyz.comb<-interaction(x,y,z,lex.order=T)
d$myvar<-match(xyz.comb,levels(xyz.comb))

/Gustaf
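As a quick check that the interaction() approach reproduces the numbering of the nested ifelse() chain, the sketch below computes the codes with as.integer() (equivalent to the match() call for a factor) and lists the level order. The numbering depends on which values actually occur in each column, so with other data it is worth inspecting the levels first.

```r
x <- c(1, 0, 0, 1, 0, 0, 1, 0, 0, 1)
y <- c(1, 3, 2, 3, 2, 1, 2, 3, 2, 3)
z <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
d <- data.frame(x, y, z)

# lex.order = TRUE makes the first factor vary slowest, so the levels run
# 0.1.1, 0.1.2, 0.2.1, ..., 1.3.2, matching the order of the ifelse() chain.
xyz.comb <- interaction(d$x, d$y, d$z, lex.order = TRUE)
d$myvar <- as.integer(xyz.comb)

levels(xyz.comb)
d
```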
Re: [R] Reading Data
On Tue, Oct 7, 2008 at 10:36 AM, [EMAIL PROTECTED] wrote:
> Hi,
>
> I have a data set in which the first row is in date format, the first column is in text format, and all the remaining entries are numeric. Whenever I try to read the data using read.table, the whole of my data is converted into text format.
>
> Please suggest what I should do, because from the numeric data, which are prices, I need to calculate the returns, and if these prices are not numeric then calculating the returns will be a problem.
>
> regards
> Rahul Agarwal

Hi,

A single column in a data frame can't contain mixed formats. In the absence of example data, I would guess one of the following could work:

1) read.table("data.txt",skip=1, header=T) ## If you have headers
2) read.table("data.txt", header=T) ## If the date row is supposed to be variable names
3) read.table("data.txt",skip=1) ## If there are no headers, and you want to ignore the date

Regards,
Gustaf
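To show what option 2 looks like in practice, here is a sketch on invented data (the ticker names, dates and prices below are made up, since the original poster did not supply a sample). The date row becomes the column names, the text column becomes the row names, and the remaining columns are read as numeric, so returns can be computed directly.

```r
# Invented example file contents: dates in the first row, names in column 1
txt <- "Stock 2008-10-01 2008-10-02 2008-10-03
AAA 10.0 10.5 10.2
BBB 20.0 19.5 19.9"

prices <- read.table(text = txt, header = TRUE, row.names = 1,
                     check.names = FALSE)   # keep the dates as column names

sapply(prices, class)    # every price column should be numeric

# Simple returns per stock: (p_t - p_{t-1}) / p_{t-1}
returns <- t(apply(prices, 1, function(p) diff(p) / head(p, -1)))
returns
```

If sapply(prices, class) reports character or factor for a column, that column usually contains a stray non-numeric entry that needs cleaning before conversion.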
Re: [R] Problem with Grep Under Loop
On Mon, Oct 6, 2008 at 1:37 PM, Gundala Viswanath [EMAIL PROTECTED] wrote:
> Dear all,
>
> I have no problem with this individual grep command:
>
> > datk <- grep("XM_528056", source$V1)
> > dat2 <- source[datk,]
> > print(dat2)
>              V1      V2   V3 V4      V5      V6         V7
> 35995 XM_528056 panTro2 chr8  + 1775569 1896107 Chimpanzee
>
> BUT, when I run it in a loop it gives this result:
>
> > hm_acc <- c("XM_528056","AB002296")
> > for (i in 1:length(hm_acc)){
> +
> + hm_acc_id <- as.character(hm_acc[i])
> + print(hm_acc_id)
> +
> + hm_allk <- grep(hm_acc_id,source$V1)
> + hm_all <- source[hm_allk,]
> +
> + print(hm_all)
> + }
> [1] "XM_528056"
> [1] V1 V2 V3 V4 V5 V6 V7
> <0 rows> (or 0-length row.names)
> [1] "AB002296"
> [1] V1 V2 V3 V4 V5 V6 V7
> <0 rows> (or 0-length row.names)
>
> What's wrong with my way of using grep? Please advise.
>
> - Gundala Viswanath
> Jakarta - Indonesia

Hi,

Could you give us a small sample of the source data, so that your example is reproducible? From looking at your code, it seems as if you copied something wrong. First you write:

grep("XM_528056", source$V1)

but when you print dat2 it seems as if your ID code (XM_528056) is in V2, not V1.

Regards,
Gustaf
Re: [R] function in R
On Thu, Oct 2, 2008 at 1:34 PM, Alphonse Monkamg [EMAIL PROTECTED] wrote:
> Dear ALL,
>
> Does anyone know how to get the complete code for any built-in function in R? E.g. when I type mean in the R console, I get the following:
>
> > mean
> function (x, ...)
> UseMethod("mean")
> <environment: namespace:base>
>
> but I need the full mean function.
>
> Thanks in advance,
> Alphonse.

Hi Alphonse,

mean is a so-called generic function, which behaves differently depending on the class of its argument. Writing:

?UseMethod

explains a bit of this, and points you to:

?methods

So you can write

methods(mean)

and see which functions exist, for example mean.default or mean.data.frame, whose code you can then have a look at.

An added complication is that these functions call C code by using .Internal. This C code can be found on CRAN (in the R source distribution), but as I don't know C, I've never done more than have a quick look. But it's there if you want it.

Regards,
Gustaf Rydevik
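The lookup steps described in the reply can be run as follows. This is a small sketch using functions that ship with base R and utils; the set of methods listed will vary between R versions (for instance, mean.data.frame is not present in recent releases), so only mean.default is used here.

```r
# List the S3 methods registered for the generic
methods(mean)

# Fetch one method as a function object, even if it is not exported
mean_default <- getS3method("mean", "default")

# Print its R-level source; the heavy lifting is delegated to .Internal()
mean_default

# getAnywhere() also searches namespaces and reports where a name was found
getAnywhere("mean.default")
```

Typing the method name directly (mean.default) works for exported methods, while getS3method() and getAnywhere() are useful for methods hidden inside a package namespace.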
Re: [R] Question about multiple regression
On Mon, Sep 8, 2008 at 7:47 PM, Dimitri Liakhovitski [EMAIL PROTECTED] wrote:
> Thank you everyone for your responses. I'll answer several questions.
>
> 1.
>> Disclaimer: I have **NO IDEA** of the details of what you want to do or why -- but I am willing to bet that there are better ways of doing it than 1.8 mm multiple regressions that take 270 secs each!! (which I find difficult to believe in itself -- are you sure you are doing things right? Something sounds very fishy here: R's regression code is typically very fast).
>
> I probably should not bore everyone, but just to explain where the large number is coming from: I have an experimental design with 7 factors. Each factor has between 3 and 5 levels. Once you cross them all, you end up with 18,000 cells. For each cell, I want to generate a sample of N=100. For each sample I have to analyze the data using 3 different statistical methods of analysis (the goal of the Monte Carlo is to compare those methods). One of the methods requires running up to ~32,000 simple multiple regressions - yes, just for one sample, and it's not a mistake. I test-ran one such analysis for a sample with N=800 and 15 predictors and it took 270 seconds. R was actually very fast - it ran each of the individual regressions in about 0.008 seconds. Still, I need something faster.
>
> 2. Sorry - what was the formula sum(lm.fit(x,y)$residuals^2) for? For example, using it on my data, I got a value of 36,644...
>
> 3. I know that for similarly challenging situations people have used Fortran compilers. So, has anyone heard of a free Fortran library or an efficient piece of code?
>
> Thank you!
> Dimitri

Have you considered the fact that 32000 regressions simply take a lot of time? I don't really have anything to go by, but it sounds unlikely that you will be able to cut computing time by more than, say, ten times, to 27 seconds. That would still leave you with 4 months of running a computer.

Perhaps an alternative approach would be to get access to stronger (super)computers, either at a university or by buying access. A quick googling turns up http://www.clusterondemand.com/ for example.

Anyhow, good luck with your project! I'm sure the R list would be very interested to hear how you solved your problem.

Regards,
Gustaf
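On the question about sum(lm.fit(x,y)$residuals^2): that expression is the residual sum of squares of the least-squares fit, and lm.fit() is the lower-level workhorse behind lm() that skips formula parsing. The sketch below, on simulated data of the size mentioned in the thread (N = 800, 15 predictors), shows that both routes give the same number; the actual speed-up will depend on the machine and is not claimed here.

```r
set.seed(1)
n <- 800
p <- 15

# Simulated design matrix with an explicit intercept column, as lm.fit()
# does not add one itself
x <- cbind(1, matrix(rnorm(n * p), n, p))
y <- rnorm(n)

fit_fast <- lm.fit(x, y)
rss_fast <- sum(fit_fast$residuals^2)   # residual sum of squares

fit_full <- lm(y ~ x - 1)               # same model via the formula interface
rss_full <- deviance(fit_full)

c(rss_fast, rss_full)

# Rough timing comparison over repeated fits
system.time(for (i in 1:200) lm.fit(x, y))
system.time(for (i in 1:200) lm(y ~ x - 1))
```

When only a fit criterion such as the residual sum of squares is needed for each of many candidate models, calling lm.fit() directly avoids building a full lm object every time.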
Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?
On Thu, Jul 31, 2008 at 4:30 PM, Michal Figurski [EMAIL PROTECTED] wrote: Frank and all, The point you were looking for was in a page that was linked from the referenced page - I apologize for confusion. Please take a look at the two last paragraphs here: http://people.revoledu.com/kardi/tutorial/Bootstrap/examples.htm Though, possibly it's my ignorance, maybe it's yours, but you actually missed the important point again. It is that you just don't estimate mean, or CI, or variance on PK profile data! It is as if you were trying to estimate mean, CI and variance of a Toccata__Fugue_in_D_minor.wav file. What for? The point is in the music! Would the mean or CI or variance tell you anything about that? Besides, everybody knows the variance (or variability?) is there and can estimate it without spending time on calculations. What I am trying to do is comparable to compressing a wave into mp3 - to predict the wave using as few data points as possible. I have a bunch of similar waves and I'm trying to find a common equation to predict them all. I am *not* looking for the variance of the mean! I could be wrong (though it seems less and less likely), but you keep talking about the same irrelevant parameters (CI, variance) on and on. Well, yes - we are at a standstill, but not because of Davison Hinkley's book. I can try reading it, though as I stated above, it is not even remotely related to what I am trying to do. I'll skip it then - life is too short. Nevertheless I thank you (all) for relevant criticism on the procedure (in the points where it was relevant). I plan to use this methodology further, and it was good to find out that it withstood your criticism. I will look into the penalized methods, though. -- Michal J. Figurski I take it you mean the sentence: For example, in here, the statistical estimator is the sample mean. Using bootstrap sampling, you can do beyond your statistical estimators. 
You can now get even the distribution of your estimator and the statistics (such as confidence interval, variance) of your estimator. Again you are misinterpreting text. The phrase about doing beyond your statistical estimators, is explained in the next sentence, where he says that using bootstrap gives you information about the mean *estimator* (and not more information about the population mean). And since you're not interested in this information, in your case bootstrap/resampling is not useful at all. As another example of misinterpretation: In your email from a week ago, it sounds like you believe that the authors of the original paper are trying to improve on a fixed model Figurski: Regarding the multiple stepwise regression - according to the cited SPSS manual, there are 5 options to select from. I don't think they used 'stepwise selection' option, because their models were already pre-defined. Variables were pre-selected based on knowledge of pharmacokinetics of this drug and other factors. I think this part I understand pretty well. This paragraph is wrong. Sorry, no way around it. Quoting from the paper Pawinski etal: *__Twenty-six(!)* 1-, 2-, or 3-sample estimation models were fit (r2 0.341– 0.862) to a randomly selected subset of the profiles using linear regression and were used to estimate AUC0–12h for the profiles not included in the regression fit, comparing those estimates with the corresponding AUC0–12h values, calculated with the linear trapezoidal rule, including all 12 timed MPA concentrations. The 3-sample models were constrained to include no samples past 2 h. (emph. mine) They clearly state that they are choosing among 26 different models by using their bootstrap-like procedure, not improving on a single, predefined model. This procedure is statistically sound (more or less at least), and not controversial. However, (again) what you are wanting to do is *not* what they did in their paper! 
Resampling cannot improve on the performance of a pre-specified model. This is intuitively obvious, but moreover it's mathematically provable! That's why we're so certain of our standpoint. If you really wish, I (or someone else) could write out a proof, but I'm unsure if you would be able to follow. In the end, it doesn't really matter. What you are doing amounts to doing a regression 50 times, when once would suffice. No big harm done, just a bit of unnecessary work. And proof to a statistically competent reviewer that you don't really understand what you're doing. The better option would be to either study some more statistics yourself, or find a statistician who can do your analysis for you, and trust him to do it right. Anyhow, good luck with your research. Best regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
Re: [R] Simple... but...
On Wed, Jul 23, 2008 at 3:23 PM, Doran, Harold [EMAIL PROTECTED] wrote:

Shubha, I'm confused. Your first post said the result should be c(1,2,3,4,5,6) when x and y are combined. The code I sent does that. But here you say your result should be c(4,1,2,5,2,3). What do you want your result to actually be?

-Original Message- From: Shubha Vishwanath Karanth [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 23, 2008 9:17 AM To: Doran, Harold; [EMAIL PROTECTED] Subject: RE: [R] Simple... but...

OK, let x=c(4,2,2) and y=c(1,5,3). My result should be c(4,1,2,5,2,3). Thanks, Shubha

There should be nicer ways, but this does it:

    x <- c(4,2,2)
    y <- c(1,5,3)
    c(matrix(c(x,y), byrow=TRUE, nrow=2))

/Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
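One of the "nicer ways" hinted at above might be to bind the two vectors as rows and let c() read the result column by column. This is only a sketch of an alternative, not something proposed in the thread; it assumes x and y have the same length.

```r
x <- c(4, 2, 2)
y <- c(1, 5, 3)

# rbind() stacks x and y as the two rows of a matrix;
# c() then flattens that matrix column by column, which interleaves the values.
interleaved <- c(rbind(x, y))
interleaved
```

The matrix(..., byrow=TRUE) version and this one build the same 2-row matrix, so they should agree for any pair of equal-length vectors.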
Re: [R] [Fwd: Re: Coefficients of Logistic Regression from bootstrap - how to get them?]
On Wed, Jul 23, 2008 at 3:14 PM, Michal Figurski [EMAIL PROTECTED] wrote: I think the argument supporting the use of bootstrap to determine coefficients, as opposed to just running linear regression on the whole dataset, is the comparison of Rsq and prediction errors between these two approaches - page 1502. There's a substantial difference in favor of the bootstrap approach. -- Michal J. Figurski Are you talking about this passage? A commonly used approach for establishing estimation models is to perform a multiple stepwise linear regression on the total set of full AUCs (19 ). When we used that approach, we obtained a r2 value of 0.74 and a prediction error of 7.6% 26.7%, (median, 6.5%; 95% CI, 51.9% to 67.5%), and the model estimated MPA AUC to within 15% of the full value in 56% of the profiles. Our estimation model using the repeated cross-validation approach was significantly better, with a r2 value of 0.862, prediction error of 6.1% 19%, (median, 3.0%; 95% CI, 33.1% to 32%), and estimation of MPA AUC to within 15% of the value (when all 12 samples are used to calculate MPA AUC) in 82% of the profiles. As far as I can tell, they are talking about the disadvantage using stepwise regression to determine the optimal variables in the regression, versus the bootstrap/CV-approach. And this might well be true. It is the following part in the methods description that seem unmotivated to me: Once the general model (of the 26) was selected, the proposed regression coefficients were taken as the median of the distribution of regression coefficient values described in step 2. I.e, after having decided upon the model that uses C0, C0.5 and C2 , using a median of the bootstrap estimates (which is what the R-code I wrote does, more or less) , instead of fitting that model on the entire data set. I don't see how this could be better, since we can't get any more information from the data other than what's there from the beginning. 
And I believe that this is what all the other people on the list are trying to tell you: that it's a step without purpose. You have to distinguish between finding out which model is best, which the bootstrap can be useful for, and estimating the parameters of the final, chosen model, where bootstrapping several regressions and taking the median is most likely no better than standard regression. Best regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
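The distinction can be made concrete with a small simulation. The sketch below uses simulated data (not the MPA profiles from the paper), a made-up three-coefficient linear model, and 500 bootstrap replicates; all of these are assumptions chosen only for illustration. It fits the fixed model once on the full data and compares that with the median of the bootstrap coefficient estimates.

```r
set.seed(42)
n <- 100

# Simulated data, purely illustrative; the true coefficients are 1, 2 and -0.5.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + rnorm(n)

# One fit of the pre-specified model on the entire data set.
full_fit <- coef(lm(y ~ x1 + x2, data = d))

# Refit the same model on 500 bootstrap resamples (rows drawn with replacement).
boot_coefs <- replicate(500, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y ~ x1 + x2, data = d[idx, ]))
})

# Median of each coefficient across the resamples.
boot_median <- apply(boot_coefs, 1, median)

round(rbind(full_fit, boot_median), 3)
```

In runs like this the two rows tend to be very close, which is the point made above: once the model is fixed, the resampling step adds computation but no new information.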
Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?
On Wed, Jul 23, 2008 at 4:08 PM, Michal Figurski [EMAIL PROTECTED] wrote: Gustaf, I am sorry, but I don't get the point. Let's just focus on predictive performance from the cited passage, that is the number of values predicted within 15% of the original value. So, the predictive performance from the model fit on entire dataset was 56% of profiles, while from bootstrapped model it was 82% of profiles. Well - I see a stunning purpose in the bootstrap step here: it turns an useless equation into a clinically applicable model! Honestly, I also can't see how this can be better than fitting on entire dataset, but here you have a proof that it is. I think that another argument supporting this approach is model validation. If you fit model on entire data, you have no data left to validate its predictions. On the other hand, I agree with you that the passage in methods section looks awkward. In my work on a similar problem, that is going to appear in August in Ther Drug Monit, I used medians since beginning and all the comparisons were done based on models with median coefficients. I think this is what the authors of that paper did, though they might just have had a problem with describing it correctly, and unfortunately it passed through review process unchanged. Hi, I believe that you misunderstand the passage. Do you know what multiple stepwise regression is? Since they used SPSS, I copied from http://www.visualstatistics.net/SPSS%20workbook/stepwise_multiple_regression.htm Stepwise selection is a combination of forward and backward procedures. Step 1 The first predictor variable is selected in the same way as in forward selection. If the probability associated with the test of significance is less than or equal to the default .05, the predictor variable with the largest correlation with the criterion variable enters the equation first. Step 2 The second variable is selected based on the highest partial correlation. 
If it can pass the entry requirement (PIN=.05), it also enters the equation. Step 3 From this point, stepwise selection differs from forward selection: the variables already in the equation are examined for removal according to the removal criterion (POUT=.10) as in backward elimination. Step 4 Variables not in the equation are examined for entry. Variable selection ends when no more variables meet entry and removal criteria.
---
It is the outcome of this *entire process*, steps 1-4, that they compare with the outcome of their *entire bootstrap/cross-validation/selection process*, steps 1-4 in the methods section, and they find that their approach gives a better result. What you are doing is only step 4 in the article's methods section: estimating the parameters of a model *when you already know which variables to include*. It is the way this step is conducted that I am sceptical about. Regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
Re: [R] Non-normal data issues in PhD software engineering experiment
On Thu, Jul 10, 2008 at 5:15 PM, Andrew Jackson [EMAIL PROTECTED] wrote: Hi All,

Hi Andrew, The main questions here are not R-related but statistical modelling questions, and much too broad for the R list. They are things you'd ask a (paid) statistical consultant. I would suggest contacting your own university's statistical support unit, http://www.insightsc.ie/statistics_clinic.htm , to discuss the best approach for the analysis. Rummaging around in R, looking for tests that you can squeeze your data into, *really* isn't the best approach (and ?friedman.test clearly states that it's for unreplicated designs only). Some things I'd want to know if I were your statistical consultant:
- What are you doing with your research? What's your goal?
- What exactly are sensitivity (,coverage,execution,infection,propogation) measuring? Looking at your data, it seems as if sensitivity is making discrete jumps. Why is this?
- What's your actual hypothesis? That the mean sensitivity values for the two paradigms differ by a constant no matter the version? Or that they differ by a fraction?
- If this is measured on test persons, I assume that you used each person several times. Is that so?
Answers to the above questions might be good to bring to your meeting with the statistics faculty. Good luck with your research, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
[R] Question: Beginner stuck in a R cycle
On Tue, Jul 8, 2008 at 3:18 PM, Daniela Ottaviani [EMAIL PROTECTED] wrote:

Dear All, I have a database of 200 observations named myD. In the data frame there is a column named code (with codes varying from 1 to 77), a column named prevalence in which some quantitative measurements are given, and a column named Pr_mean, with no values. I would like to set up a loop to compute the average of the prevalence values for each different code and store the averages in the empty field Pr_mean. This is what I wrote:

    # Set a cycle
    for (i in 1:nrow(myD)) {
      mycode = myD$code[i]
      mymean[i] = mean(prevalence)
      myD$Pr_mean[i] = mymean[i]
    }

With the above loop I am able to compute the average of all 200 observations, which is then written in every cell. I understand that a condition is missing, one indicating that the average has to be computed among the observations showing the same code value. Could you please help me? D.

The easiest thing to do is to use ?by:

    myD <- data.frame(code=sample(letters[1:5],200,replace=TRUE), value=rnorm(200))
    by(myD$value, myD$code, mean)

but that won't get you the group means in the empty column without some more lines of code. Another way is to use ?lapply and ?unlist:

    myD$Pr_mean <- unlist(lapply(as.character(myD$code), function(x) mean(myD$value[myD$code==x])))

Regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
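A shorter route to the same column, not mentioned in the thread, may be the base function ave(), which returns a vector as long as its input in which each element is replaced by a group summary (the mean by default). The sketch below reuses the simulated myD from the reply, with the column name value standing in for prevalence.

```r
set.seed(1)
# Simulated stand-in for the poster's data, as in the reply above.
myD <- data.frame(code  = sample(letters[1:5], 200, replace = TRUE),
                  value = rnorm(200))

# ave() computes the mean of 'value' within each level of 'code'
# and writes it back in the original row order.
myD$Pr_mean <- ave(myD$value, myD$code)

head(myD)
```

Each row then carries the mean of its own code group, which is what the original loop was meant to produce.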
Re: [R] Migrating from S-Plus to R - Exporting Tables
On Thu, Jul 3, 2008 at 2:17 AM, jim holtman [EMAIL PROTECTED] wrote:

Does something like this get you close:

    x <- list()
    keys <- LETTERS[1:6]
    # create
    for (i in keys) {
      x[[i]] <- data.frame(a=1:5, b=1:5, c=1:5)
    }
    # output
    output <- file('tempxx.txt', 'w')
    for (i in keys) {
      write.table(i, row.names=FALSE, col.names=FALSE, file=output, quote=FALSE)
      write.table(x[[i]], file=output, quote=FALSE)
    }
    close(output)

In order to get row.names written above the row names, I think you have to cheat a bit (modifying Jim's code):

    x <- list()
    keys <- LETTERS[1:6]
    # create
    for (i in keys) {
      x[[i]] <- data.frame(a=1:5, b=1:5, c=1:5)
    }
    # output
    output <- file('tempxx.txt', 'w')
    for (i in keys) {
      write.table(i, row.names=FALSE, col.names=FALSE, file=output, quote=FALSE)
      # excluding the actual row names, adding them as a column instead
      write.table(data.frame(RowNames=row.names(x[[i]]), x[[i]]),
                  file=output, quote=FALSE, row.names=FALSE)
    }
    close(output)

It seems as if you can't get it to write row.names, since that is a restricted name in a data frame, but hopefully RowNames is good enough. /Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
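If an unlabelled but correctly aligned header is acceptable, write.table() also has a documented option for this: with col.names = NA (and row names switched on) it writes an empty header cell above the row-name column. The sketch below is an alternative under that assumption, not the approach from the thread; the tab separator and the temporary file are choices made here for illustration.

```r
x <- data.frame(a = 1:5, b = 1:5, c = 1:5)
out <- tempfile(fileext = ".txt")  # hypothetical output location

# col.names = NA adds a blank column name for the row-name column,
# so the header has as many fields as each data row.
write.table(x, file = out, quote = FALSE, sep = "\t", col.names = NA)

# Split the first line of the file to inspect the header fields.
header <- strsplit(readLines(out, n = 1), "\t")[[1]]
header
```

This keeps the columns lined up for spreadsheet import, but it does not put a visible label such as RowNames over the first column; for that, the data.frame() trick above is still needed.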
[R] tiff()-bug (was re:Preparing high quality figures with tiff as end result)
Hi all, A while back I sent a message concerning working with tiff files, and mentioned that I encountered a bug in 2.7.0. This bug still occurs in 2.7.1, and is reproducible on a separate computer (both running WinXP Professional):

    tiff()
    plot(1:1000)
    dev.off()

This causes R to show the window "R GUI has encountered a problem and needs to close". Can anyone else out there reproduce this, so I can file a bug report? Best, Gustaf Rydevik

---
sessionInfo()
R version 2.7.1 (2008-06-23)
i386-pc-mingw32
locale: LC_COLLATE=Swedish_Sweden.1252;LC_CTYPE=Swedish_Sweden.1252;LC_MONETARY=Swedish_Sweden.1252;LC_NUMERIC=C;LC_TIME=Swedish_Sweden.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] RWinEdt_1.8-0
loaded via a namespace (and not attached): [1] tools_2.7.1

-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
Re: [R] tiff()-bug (was re:Preparing high quality figures with tiff as end result)
On Wed, Jun 25, 2008 at 10:16 AM, Uwe Ligges [EMAIL PROTECTED] wrote: Gustaf Rydevik wrote: Hi all, A while back I sent a message concerning working with tiff-files, and mentioned that I encountered a bug in 2.7.0. This bug still occurs in 2.7.1, and is reproducable on a separate computer (both running WinXP professional): tiff() plot(1:1000) dev.off() This causes R to show the window R GUI has encountered a problem and needs to close. Can anyone else out there reproduce this, so I can file a bug report? Yes. Confirmed. Uwe Ligges Thank you. Bug report submitted. /Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tiff()-bug (was re:Preparing high quality figures with tiff as end result)
A short update that may be of help: The snippet of code does not crash R if i run under vanilla, nor if I change R to MDI-mode. It does crash R infallibly if I set it to SDI-mode in the Rprofile file. Strange... /Gustaf On Wed, Jun 25, 2008 at 3:16 PM, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Wed, 25 Jun 2008, Peng Jiang wrote: Hi , Gustaf i don't know why but it works pretty well on a mac. with completely different code. Gustaf Rydevik has mentioned this before -- it never fails for me on Windows and hence one would not expect there to be a change in 2.7.1. Only if someone can reproduce it under a debugger have we a chance of tracking it down. regards . On 2008-6-25, at 下午4:16, Uwe Ligges wrote: Gustaf Rydevik wrote: Hi all, A while back I sent a message concerning working with tiff-files, and mentioned that I encountered a bug in 2.7.0. This bug still occurs in 2.7.1, and is reproducable on a separate computer (both running WinXP professional): tiff() plot(1:1000) dev.off() This causes R to show the window R GUI has encountered a problem and needs to close. Can anyone else out there reproduce this, so I can file a bug report? Yes. Confirmed. Uwe Ligges Best, Gustaf Rydevik --- sessionInfo() R version 2.7.1 (2008-06-23) i386-pc-mingw32 locale: LC_COLLATE=Swedish_Sweden.1252;LC_CTYPE=Swedish_Sweden.1252;LC_MONETARY=Swedish_Sweden.1252;LC_NUMERIC=C;LC_TIME=Swedish_Sweden.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] RWinEdt_1.8-0 loaded via a namespace (and not attached): [1] tools_2.7.1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peng Jiang 江鹏 Ph.D. 
Candidate Antai College of Economics Management 安泰经济管理学院 Department of Mathematics 数学系 Shanghai Jiaotong University (Minhang Campus) 800 Dongchuan Road 200240 Shanghai P. R. China __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
Dear Xu, does

    library(urca)
    example(ur.ers)
    ers.gnp
    str(ers.gnp)
    [EMAIL PROTECTED]

do what you want? (This reminds me that I have to learn S4 sometime.) Best, Gustaf Rydevik

On Tue, Jun 24, 2008 at 3:52 AM, Xu, Ke-Li [EMAIL PROTECTED] wrote:

Dear Sir/Madam, I found your email address and your correspondence with R-users. I hope you could help me with this question about the function ur.ers in the package urca. It is an improved unit root test (Elliott et al. 1996, Econometrica). Do you know how to extract the value of the test statistic from the output? The only thing I can get is the print-out of all results including the test statistic. But I am wondering whether the value is saved somewhere, like g$coef will give you the estimated coefficient, where g is a linear model lm object. Thank you very much. Best regards, Keli Xu

-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
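Since the reply points towards S4, it may help to see how values are pulled out of an S4 object in general: slots are listed with slotNames() and read with @ (or slot()), in the same way that $ reads components of an lm fit. The class below is a hypothetical toy defined only for this sketch; its slot names are assumptions and are not taken from the urca documentation.

```r
library(methods)

# Hypothetical S4 class, standing in for the object a test function might return.
setClass("ToyTest",
         representation(teststat = "numeric", test.name = "character"))

res <- new("ToyTest", teststat = -1.23, test.name = "toy unit root test")

slotNames(res)          # which slots does the object have?
res@teststat            # read one slot directly with @
slot(res, "teststat")   # the same, with the slot name given as a string
```

For a real result object, running slotNames() or str() on it first shows which slot holds the statistic.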
Re: [R] ifelse and & vs &&
On Wed, Jun 18, 2008 at 3:10 PM, Christos Argyropoulos [EMAIL PROTECTED] wrote:

Hi, I wondered whether someone could explain why & and && behave differently in data frame transformations. Consider the following:

    a <- data.frame(r=c(0,0,2,3), g=c(0,2,0,2.1))

Then:

    transform(a, R=ifelse(r>0 && g>0, log(r/g), NA))
      r   g  R
    1 0 0.0 NA
    2 0 2.0 NA
    3 2 0.0 NA
    4 3 2.1 NA

but

    transform(a, R=ifelse(r>0 & g>0, log(r/g), NA))
      r   g         R
    1 0 0.0        NA
    2 0 2.0        NA
    3 2 0.0        NA
    4 3 2.1 0.3566749

If my understanding of the differences between & and && and of how 'transform' works is accurate, both statements should produce the same output. I got the same behaviour in Windows XP Pro 32-bit (running R v 2.7) and Ubuntu Hardy (running the same version of R). Thanks, Christos Argyropoulos, University of Pittsburgh Medical Center

From ?"&": The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Thus,

    a$r & a$g
    [1] FALSE FALSE FALSE  TRUE
    a$r && a$g
    [1] FALSE

ifelse takes a vector as its test argument. Since && only gives a single value, ifelse(r>0 && g>0, log(r/g), NA) will only return NA, which is then recycled by transform. When using &, ifelse returns a vector, and this vector is appended to the data frame. /Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
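A compact way to see the difference is sketched below. One caveat for anyone rerunning the thread's code today: from R 4.3.0 onwards, calling && with an operand longer than one is an error rather than a silent use of the first element, so the sketch only applies && to single values.

```r
a <- data.frame(r = c(0, 0, 2, 3), g = c(0, 2, 0, 2.1))

# Elementwise: one TRUE/FALSE per row, which is what ifelse() needs as its test.
both_positive <- a$r > 0 & a$g > 0
both_positive

# Scalar: && is meant for single conditions, e.g. inside if().
first_row_ok <- a$r[1] > 0 && a$g[1] > 0
first_row_ok

# With the elementwise test, ifelse() returns one value per row.
res <- transform(a, R = ifelse(r > 0 & g > 0, log(r / g), NA))
res
```

Only the fourth row has both r and g positive, so only that row gets log(3/2.1); the others are NA.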
[R] Preparing high quality figures with tiff as end result
Hi all, I'm currently preparing some figures that will be submitted to PLoS One. In their guidelines they state that they will only accept figures in tiff or eps format, with the warning that eps figures will be converted to tiff format (see http://www.plosone.org/static/figureGuidelines.action ). Because of this conversion, I figured I'd generate tiff-format figures from the beginning. However, a number of issues cropped up:

1) Using

    library(Cairo)
    CairoTIFF("test.tif")

I get "Sorry, this Cairo was compiled without tiff support." I tried finding out how to recompile Cairo, but got lost in a lot of confusing talk about GTK+, downloaded dll files that I didn't know how to use, etc. So I turned to plain tiff(), and

2) R started crashing on me. The following code

    tiff("test.tif")
    plot(rnorm(100))
    dev.off()

crashes R (i.e. "R GUI has encountered a problem and needs to close...") every third time or so. When it does work, the resulting output is not too pretty.

So I turned to using postscript files. However, PLoS One requires that fonts be embedded into the figure. embedFonts() works for this, but the result is that text becomes low-res bitmaps, and I don't know how to solve this.

So basically my question is: How should I go about generating graphics that will look as nice as possible given the above constraints? Many thanks in advance, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
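For readers with a current R installation, the "not too pretty" default output is often a matter of resolution: tiff() accepts a physical size, a resolution and a compression scheme. The sketch below uses the arguments of tiff() as found in current versions of R; whether they behaved the same way in R 2.7 on Windows is not something the thread establishes, and the 6 x 4 inch size, 300 dpi and LZW compression are illustrative assumptions, not journal requirements quoted from anywhere.

```r
out <- file.path(tempdir(), "figure1.tif")  # hypothetical output path

# Only attempt the plot if this R build has TIFF support.
if (capabilities("tiff")) {
  tiff(out, width = 6, height = 4, units = "in",
       res = 300, compression = "lzw")
  plot(rnorm(100), main = "Example figure")
  dev.off()
}

file.exists(out)
```

Setting the size in inches together with res fixes the pixel dimensions up front, so text and symbols are drawn at the final resolution rather than scaled afterwards.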
Re: [R] Preparing high quality figures with tiff as end result
On Fri, May 23, 2008 at 4:40 PM, Gustaf Rydevik [EMAIL PROTECTED] wrote: Hi all, I'm currently preparing some figures that will be submitted to PloS One. In their guidelines they state that they will only accept figures in tiff or eps format, with the warning that eps figures will be converted to tiff format ( see http://www.plosone.org/static/figureGuidelines.action ). Because of this conversion, I figured I'd generate tiff-format figures from the beginning. However, a number of issues cropped up: 1) using library(Cairo) CairoTIFF(test.tif) I get Sorry, this Cairo was compiled without tiff support.. I tried finding out how to recompile Cairo, but got lost in a lot of confusing talk about GTK+, downloaded dll files that I didn't know how to use etc. so I turned to plain tiff(), and 2) R started crashing on me.The following code tiff(test.tif) plot(rnorm(100)) dev.off() ,crashes R (i.e R GUI has encountered a problem and needs to close...) every third time or so. When it does work, the resulting output is not too pretty. So I turned to using postscript files. However, Plos One requires that fonts be embedded into the figure. embedFonts() works for this, but the result is that text becomes low-res bitmaps, and I don't know how to solve this. So basically my question is: How should I go about generating graphics that will look as nice as possible given the above constraints? Many thanks in advance, Gustaf Oh, and before anyone bites my head of, I forgot the following: sessionInfo() R version 2.7.0 (2008-04-22) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] RWinEdt_1.8-0 Cairo_1.4-2 , and I'm using windows XP professional again, many thanks in advance /Gustaf -- Gustaf Rydevik, M.Sci. 
tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Change the position of panel strips in a lattice plot.
Hi all, In lattice plots, is there any option to position the panel strips with text below each subgraph, instead of above? I.e. in:

    Depth <- equal.count(quakes$depth, number=8, overlap=.1)
    xyplot(lat ~ long | Depth, data = quakes)

is there any way to make Depth appear below the subgraphs, instead of above? I've been looking through the lattice documentation and the list archive but have not found such a thing. Many thanks in advance, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
Re: [R] Loop for in R to generate several variables
On Mon, Apr 7, 2008 at 11:31 AM, arpino [EMAIL PROTECTED] wrote:

Hi everybody, I have to create several variables of this form: Yind = L0 + L1*X1 + L2*X2 + L3*X3 + K*Cind + n, where ind varies in {1,...,10}. I thought of this for loop, but it does not work:

    for (ind in 1:10) {
      Yind = L0 + L1*X1 + L2*X2 + L3*X3 + K*Cind + n
    }

Any suggestions? Thank you.

Look up ?assign and ?get, i.e.:

    for (ind in 1:10) {
      assign(paste("Y", ind, sep=""),
             L0 + L1*X1 + L2*X2 + L3*X3 + K*get(paste("C", ind, sep="")) + n)
    }

Regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
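An alternative that avoids creating ten separate variables is to keep the C and Y series in lists. This is a sketch of that design, not what the thread recommends; every numeric value below is a made-up placeholder so that the example runs on its own.

```r
set.seed(1)
n_obs <- 5

# Hypothetical inputs, standing in for the poster's L0..L3, K, X1..X3 and n.
L0 <- 1; L1 <- 0.5; L2 <- -0.2; L3 <- 0.1; K <- 2
X1 <- rnorm(n_obs); X2 <- rnorm(n_obs); X3 <- rnorm(n_obs)
n  <- rnorm(n_obs)                               # noise term, named n as in the question
C  <- lapply(1:10, function(ind) rnorm(n_obs))   # stands in for C1, ..., C10

# One Y vector per C vector, collected in a named list.
Y <- lapply(C, function(Cind) L0 + L1*X1 + L2*X2 + L3*X3 + K*Cind + n)
names(Y) <- paste("Y", 1:10, sep = "")

Y$Y3
```

With a list, later steps can loop or lapply() over Y directly instead of rebuilding variable names with paste() and get().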
Re: [R] Make plots with GNUplot. Have anyone tried that?
On Fri, Feb 29, 2008 at 11:12 PM, Louise Hoffman [EMAIL PROTECTED] wrote: [snip] Seriously. Be specific if you have a problem. (read the posting guide). R can also plot. If you don't like R's plots (which I could not understand) you can export data and import them to gnuplot. So what? Okay, my post was not very good. The reason (I think) I need GNUplot, is that I would like to include the plots from R in a Latex report, where I would like to have all the text and equations in the plots with the same font as used in Latex. So when I read about opening and closing dev for making a pdf I figured that the plots that R produces are like the once Matlab makes; shows what they ought to, nothing more, nothing less. So I was wondering if anyone know of an GNUplot friendly format and the code that would produce that text file. I am new to both R and GNUplot, so I am pure ears if someone knows how to make such plots in R. Hi Louise, In addition to what Paul Murrell linked to regarding latex fonts, take a look at demo(plotmath). I really don't think you have to go outside of R to do what you want. In addition, if you aim to end up with a latex report I strongly encourage you to try out ?Sweave. It has certainly helped to streamline my workflow. Regards, Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding overplotting of text.
On Mon, Feb 25, 2008 at 10:36 PM, hadley wickham [EMAIL PROTECTED] wrote:

I am plotting some data, and use text() to get variable names next to points on the graph. What is the best way to make sure that these text labels are readable and not overlapping when two datapoints are close? I've tried using jitter(), but the effect is random and doesn't always give a good result. Any suggestions would be most appreciated.

Have a look at pointLabel in maptools - http://finzi.psych.upenn.edu/R/library/maptools/html/pointLabel.html -- http://had.co.nz/

Thank you, Hadley. That was a very good tool to find! In conjunction with the regular tricks mentioned by Rickard Cotton, and thigmophobe by Jim Lemon, the problem turned out to be fairly easy. This seems like one of those tasks that is needed fairly frequently, but which is rarely bothered with. Would it be possible to add one of these algorithms as an option to the regular text()? Regards, Gustaf
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
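To give a flavour of what such an option might do, here is a base-R sketch of one simple heuristic: put each label on the side of its point that faces away from the nearest neighbouring point. This is a rough approximation written for this note, with a hypothetical helper name (away_pos); it is not the algorithm used by pointLabel or thigmophobe, and it will still produce overlaps in dense clusters.

```r
# Choose a text() position (1 = below, 2 = left, 3 = above, 4 = right)
# for each point, pointing away from that point's nearest neighbour.
away_pos <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                      # a point is not its own neighbour
  nn <- apply(d, 1, which.min)        # index of the nearest neighbour
  dx <- x - x[nn]
  dy <- y - y[nn]
  ifelse(abs(dx) > abs(dy),
         ifelse(dx > 0, 4, 2),        # neighbour mostly left/right of the point
         ifelse(dy > 0, 3, 1))        # neighbour mostly below/above the point
}

set.seed(1)
x <- rnorm(20)
y <- rnorm(20)
labs <- replicate(20, paste(sample(LETTERS, 5, replace = TRUE), collapse = ""))

pos <- away_pos(x, y)
plot(x, y)
text(x, y, labs, pos = pos)
```

For two points side by side, the left one gets its label on the left and the right one on the right, so the pair no longer prints over each other.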
[R] Avoiding overplotting of text.
Hi all, I am plotting some data, and use text() to get variable names next to points on the graph. What is the best way to make sure that these text labels are readable and not overlapping when two datapoints are close? I've tried using jitter(), but the effect is random and doesn't always give a good result. Any suggestions would be most appreciated. Best regards, Gustaf

Example:
--
    x <- rnorm(20)
    x.labels <- vector(length=length(x))
    for (i in 1:length(x)) x.labels[i] <- paste(sample(LETTERS,5,replace=TRUE), collapse="")
    y <- rnorm(length(x))
    plot(x, y)
    text(x, y, x.labels)
    ### Most of the time some of the labels end up unreadable.
---
-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik
Re: [R] Cox model
On Feb 13, 2008 2:37 PM, Matthias Gondan [EMAIL PROTECTED] wrote: Hi Eleni, The problem of this approach is easily explained: Under the Null hypothesis, the P values of a significance test are random variables, uniformly distributed in the interval [0, 1]. It is easily seen that the lowest of these P values is not any 'better' than the highest of the P values. Best wishes, Matthias Correct me if I'm wrong, but isn't that the point? I assume that the hypothesis is that one or more of these genes are true predictors, i.e. for these genes the p-value should be significant. For all the other genes, the p-value is uniformly distributed. Using a significance level of 0.01, and an a priori knowledge that there are significant genes, you will end up with on the order of 20 genes, some of which are the true predictors, and the rest being false positives. this set of 20 genes can then be further analysed. A much smaller and easier problem to solve, no? /Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cox model
On Feb 13, 2008 3:06 PM, Gustaf Rydevik [EMAIL PROTECTED] wrote:
> On Feb 13, 2008 2:37 PM, Matthias Gondan [EMAIL PROTECTED] wrote:
> > Hi Eleni,
> >
> > The problem of this approach is easily explained: Under the Null
> > hypothesis, the P values of a significance test are random variables,
> > uniformly distributed in the interval [0, 1]. It is easily seen that
> > the lowest of these P values is not any 'better' than the highest of
> > the P values.
> >
> > Best wishes,
> > Matthias
>
> Correct me if I'm wrong, but isn't that the point? I assume that the
> hypothesis is that one or more of these genes are true predictors,
> i.e. for these genes the p-value should be significant. For all the
> other genes, the p-value is uniformly distributed. Using a
> significance level of 0.01, and a priori knowledge that there are
> significant genes, you will end up with on the order of 20 genes,
> some of which are the true predictors, the rest being false
> positives. This set of 20 genes can then be analysed further. A much
> smaller and easier problem to solve, no?
>
> /Gustaf

Sorry, it should say 200 genes instead of 20.
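The arithmetic behind the corrected figure can be checked with a small simulation. This is only a sketch: the thread does not state how many genes were tested, so `n.genes <- 20000` is an assumption (it is the count that gives 200 expected hits at a 0.01 level), and every gene is simulated under the null hypothesis.

```r
set.seed(1)

n.genes <- 20000   # assumed number of genes; not stated in the thread
alpha   <- 0.01    # significance level used in the reply

# Under the null hypothesis, p-values are uniform on [0, 1]
p.null <- runif(n.genes)

# Expected and simulated number of genes passing the threshold by chance
expected.hits  <- n.genes * alpha
simulated.hits <- sum(p.null < alpha)

expected.hits
simulated.hits
```

Any true predictors would be selected on top of these chance hits, which is why the screened set is a mixture of real signals and false positives.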
Re: [R] How does do.call() work??
On Jan 25, 2008 11:27 AM, Sergey Goriatchev [EMAIL PROTECTED] wrote:
> Dear members of R forum,
>
> Say I have a list:
>
> L <- list(1:3, 1:3, 1:3)
>
> that I want to turn into a matrix. I wonder why if I do:
>
> do.call(cbind, L)
>
> I get the matrix I want, but if I do
>
> cbind(L)
>
> I get something different from what I want. Why is that? How does
> do.call() actually work? I've read in do.call() help file this
> sentence: "The behavior of some functions, such as substitute, will
> not be the same for functions evaluated using do.call as if they were
> evaluated from the interpreter. The precise semantics are currently
> undefined and subject to change."
>
> Thanks for help!
> Sergey

Try

cbind(L[[1]], L[[2]], L[[3]])

which is equivalent to do.call(cbind, L). do.call takes a list of arguments and feeds each element of that list to the function as a separate argument. cbind takes two or more vectors or matrices as separate arguments, not a single list of them.

/Gustaf
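The difference described in the reply can be seen directly on the list from the question. A minimal sketch (the variable names `via.do.call`, `by.hand` and `direct` are introduced here for illustration only):

```r
L <- list(1:3, 1:3, 1:3)

# do.call() passes each list element as its own argument to cbind()
via.do.call <- do.call(cbind, L)

# ... which is the same as spelling the arguments out by hand
by.hand <- cbind(L[[1]], L[[2]], L[[3]])
identical(via.do.call, by.hand)

# cbind(L) receives one argument, the list itself, so the result is a
# one-column matrix whose cells each hold one list element
direct <- cbind(L)
dim(via.do.call)
dim(direct)
```

The same pattern works with any function that takes its inputs as separate arguments, for example do.call(rbind, L).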
Re: [R] histogram with NAs
On Jan 18, 2008 4:49 PM, [EMAIL PROTECTED] wrote:
> Dear list,
>
> I have a categorical variable in a data.frame that I would like to
> plot using a histogram to show number of events. Values are 0, 1 and
> some NAs. I can't get the hist() function to
> 1) include a column with the number of NAs
> 2) have the x axis be categorical; I always get 0, 0.2, 0.4, ... 1
>    divisions
>
> Can anyone help me? This is my code. database is my data.frame and
> Event is my variable.
>
> attach(database)
> hist(Event, col = 2, main = "Number of Events")
>
> Thanks in advance,
> David

Please read ?hist, especially the line: "Typical plots with vertical bars are not histograms. Consider barplot or plot(*, type = "h") for such bar plots." But no worries, I've mixed them up myself a number of times.

To get a column of NAs, see the following:

### Example:
sample.data <- as.factor(sample(c(1, 0, NA), 100, replace = TRUE))
sample.data <- as.character(sample.data)
sample.data[is.na(sample.data)] <- "NA"
sample.data <- factor(sample.data)
plot(sample.data)
#

/Gustaf
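The same idea can be written more compactly and drawn with barplot(), as ?hist suggests. This is a sketch only: `event` is a made-up stand-in for the poster's database$Event, and turning NA into the string "NA" follows the approach in the reply above.

```r
# Hypothetical stand-in for database$Event: 0/1 values with some NAs
event <- c(0, 1, 1, NA, 0, 1, NA, 1)

# Recode missing values as the string "NA" so they become a factor level
event.f <- factor(ifelse(is.na(event), "NA", event))

# Count each category, including the "NA" level, and draw one bar per category
counts <- table(event.f)
barplot(counts, col = 2, main = "Number of Events")
```

Because the x axis is built from factor levels, it shows the categories 0, 1 and NA rather than a numeric scale.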
[R] An R is slow-article
Hi all,

Reading the Wikipedia page on R, I stumbled across the following:
http://fluff.info/blog/arch/0172.htm

It does seem interesting that C code runs that much slower when called from R than in a native C program. Could any of the more technically knowledgeable people explain why this is so?

The author also has some thought-provoking opinions on R being no good, and argues that you should write everything in C instead (mainly because R is slow and too good at graphics, which encourages data snooping). See http://fluff.info/blog/arch/0041.htm

While I don't agree (granted, I can't really write C), it was interesting to read something from a very different perspective than I'm used to.

Best regards,
Gustaf

_
Department of Epidemiology, Swedish Institute for Infectious Disease Control
work email: gustaf.rydevik at smi dot ki dot se
skype: gustaf_rydevik
Re: [R] R help
On Jan 9, 2008 5:47 PM, [EMAIL PROTECTED] wrote:
> Is there a number I can call to get started with R? I have some really
> basic questions that won't take more than 10 minutes.
>
> Sitadri

Try writing your questions down and sending them to this mailing list, and you're bound to get answers.

/Gustaf
Re: [R] Tutorial for Basic Stats
On Dec 10, 2007 5:43 AM, Kapoor, Bharat [EMAIL PROTECTED] wrote:
> Thanks in advance - am looking for a Tutorial for doing basic stats. I
> have already looked/looking at the R-intro.pdf at the R site.
>
> Regards
> BK

Google "introductory statistics R", and you'll find a nice PDF by J. Verzani.

/Gustaf
Re: [R] Extracting clusters from Data Frame
On Dec 10, 2007 2:28 PM, Johannes Graumann [EMAIL PROTECTED] wrote:
> Hello,
>
> I have a large data frame (1006222 rows), which I subject to a crude
> clustering attempt that results in a vector stating whether the
> datapoint represented by a row belongs to a cluster or not.
> Conceptually this looks something like this:
>
> Value   Cluster?
> 0.01    FALSE
> 0.03    TRUE
> 0.04    TRUE
> 0.05    TRUE
> 0.07    FALSE
> ...
>
> What I'm looking for is an efficient strategy to extract all
> consecutive rows associated with TRUE as a single cluster (data.frame
> representation?) without cluttering memory with thousands of
> data.frames. I was thinking of an independent data.frame that would
> contain a column of lists that reference all indexes from the big one
> which are contained in one cluster ...
>
> Can anyone kindly nudge me and let me know how to deal with this
> efficiently?
>
> Joh

How about:

orig.data <- sample(c(TRUE, FALSE), 100, replace = TRUE)
Cluster <- data.frame(c.ndx = cumsum(rle(orig.data)$lengths),
                      c.size = rle(orig.data)$lengths,
                      c.type = rle(orig.data)$values)
Cluster <- Cluster[Cluster$c.type == TRUE, ]
## Then, to get all original data belonging to cluster three:
orig.data[rev(Cluster[3, "c.ndx"] - seq(length.out = Cluster[3, "c.size"]) + 1)]

Here c.ndx is the index of the last element of each run and c.size is the run length. Not the neatest solution, but I'm sure someone here can improve on it.

/Gustaf
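One possible tidier variant of the same rle() idea stores a start and an end index per cluster, so that rows can be pulled from the large data frame on demand instead of being copied up front. This is a sketch under assumptions: `in.cluster` is a small made-up flag vector standing in for the poster's Cluster? column, and the names `clusters` and `idx` are introduced for illustration.

```r
# Hypothetical cluster flags, one per row of the large data frame
in.cluster <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Run-length encode once, then derive first/last row of every run
runs      <- rle(in.cluster)
run.end   <- cumsum(runs$lengths)
run.start <- run.end - runs$lengths + 1

# Keep only the runs whose value is TRUE: one row per cluster
clusters <- data.frame(start = run.start, end = run.end)[runs$values, ]

# Row indices of the second cluster, usable as big.df[idx, ]
idx <- seq(clusters$start[2], clusters$end[2])
idx
```

Only two integers per cluster are stored, which keeps memory use small even with many clusters.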