Below is some working code that, generally speaking, accomplishes what I want, but I am looking for a necessary improvement in the final step. The code scrapes data from a website (thousands of pages, actually) and organizes athletes' scores in a data frame. The final variable, called Workout05 in the original data, is a timed event. So I use strsplit() to pull out the data I want in that column and format it using as.POSIXct(), as shown in the code at the end of this message (a regular expression would, I'm sure, be a better way to pull those data out of the column, but that is not my primary question).
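For what it's worth, a regular-expression version of the extraction step might look roughly like the sketch below. It assumes each Workout05 cell has the form "rank (m:ss)"; the sample strings are made up for illustration and are not real leaderboard values.

```r
# Hypothetical sample of Workout05 cells, assumed to look like "rank (m:ss)".
# These values are invented for illustration only.
workout05 <- c("1 (5:35)", "27 (6:02)", "310 (12:48)", "--")

# Capture the m:ss (or mm:ss) text between the parentheses.
pattern <- "^.*\\(([0-9]{1,2}:[0-9]{2})\\).*$"
has_time <- grepl(pattern, workout05)

# Keep NA where no time was recorded, rather than passing the raw cell through.
time_txt <- ifelse(has_time, sub(pattern, "\\1", workout05), NA_character_)

# Same conversion as in my code; note that as.POSIXct() fills in today's date.
score5 <- as.POSIXct(time_txt, format = "%M:%S")
```

Cells that do not match the pattern come out as NA instead of producing a malformed time.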
After I have all the data, I want to find the empirical CDF, so I use ecdf() on those data just as I would on other variables. The main issue I'm interested in is the final step, where you plug in a specific time to find its percentile:

  ## These are below in context of the real problem as well
  fn <- ecdf(dat$score5)
  fn(dat$score5[1])

This works, but not in the way I want. What I want is for a user to be able to enter their time in "lay" terms, such as 5:35, and have it return the percentile rank. So I'd like something like the following to work:

  fn(5:35)

The larger context for why I want this can be seen if you visit my web app built using shiny. I've built a site where athletes can build customized reports based on their performance on certain events by entering data. This specific issue arises on the "get my percentile" tab, where a user can type their time into a text input box in the way humans typically write it; the time then gets passed to the R fn() function that runs in the background and builds the plot for them.

  https://hdoran.shinyapps.io/openAnalysis/

So, my question is: how can I structure this so that a time can be entered simply as minutes:seconds (e.g., 4:52) in a text box and still return a percentile rank as I've described here?
Thanks

  library(XML)

  i = 1; j = 0; division = 1

  # Build the leaderboard URL for page i and region j
  url <- paste(paste('http://games.crossfit.com/scores/leaderboard.php?stage=5&sort=0&page=', i, sep=''),
               paste('&division=1&region=', j, sep=''),
               '&numberperpage=100&competition=0&frontpage=0&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1&=&is_mobile=0',
               sep='')

  # Read the first HTML table on the page
  tmp <- try(readHTMLTable(readLines(url), which=1, header=TRUE))

  if(!is.null(dim(tmp))){ # new part here
    # Strip newlines and runs of spaces from the column names and cells
    names(tmp) <- gsub("\\n", "", names(tmp))
    names(tmp) <- gsub(" +", "", names(tmp))
    tmp[] <- lapply(tmp, function(x) gsub("\\n", "", x))
    tmp$region <- j
  }

  dat <- tmp

  # Pull the time out of the parentheses in Workout05
  aa <- strsplit(dat$Workout05, split = '\\(')
  bb <- sapply(aa, function(x) x[2])
  aa <- strsplit(bb, split = '\\)')
  dat$score5 <- as.character(sapply(aa, function(x) x[1]))
  dat$score5 <- as.POSIXct(dat$score5, format="%M:%S")

  fn <- ecdf(dat$score5)
  fn(dat$score5[1])
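To make the goal concrete, here is a rough sketch of the kind of wrapper I have in mind. It is only one possible approach, not a tested solution: it stores the times as plain seconds instead of POSIXct (since as.POSIXct() with format "%M:%S" attaches the current date, times parsed on different days would not be comparable), and the names to_seconds(), fn_sec and percentile_for() are hypothetical, as are the sample times.

```r
# Convert "m:ss" text to total seconds; returns NA for malformed input.
to_seconds <- function(txt) {
  txt <- trimws(txt)
  ok <- grepl("^[0-9]+:[0-5][0-9]$", txt)
  out <- rep(NA_real_, length(txt))
  parts <- strsplit(txt[ok], ":", fixed = TRUE)
  out[ok] <- vapply(parts,
                    function(p) 60 * as.numeric(p[1]) + as.numeric(p[2]),
                    numeric(1))
  out
}

# Made-up times standing in for the scraped Workout05 column.
times <- c("4:52", "5:35", "6:10", "7:45", "9:03")

# ecdf() drops NA values, so malformed cells do not affect the result.
fn_sec <- ecdf(to_seconds(times))

# Wrapper that a text input box could call with the user's string.
percentile_for <- function(txt) fn_sec(to_seconds(txt))

percentile_for("5:35")  # 0.4 with the made-up times above
```

Note that ecdf() returns the proportion of times less than or equal to the one entered. For a timed event, where lower is better, 1 - percentile_for("5:35") may be the more natural number to report.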