Below is some working code that, generally speaking, accomplishes what I want, 
but I am looking for a necessary improvement in the final step. The code below 
scrapes data from a website (thousands of pages actually) and organizes 
athletes' scores in a data frame. The final variable, called Workout05 in the 
original data, is a timed event. So, I use strsplit() to pull out the data I 
want in that column and format it using as.POSIXct(), as you can see in the 
code below (a regular expression would, I'm sure, improve on how to pull out 
those data in the column, but that is not my primary question).
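
As an aside on the regular expression mentioned above, here is a minimal sketch 
of how the time could be pulled out in one step. The sample values are made up, 
and it assumes each Workout05 entry holds some text followed by the time in 
parentheses, which is only inferred from the strsplit() calls in the code.

```r
# Hypothetical sample values; the real Workout05 format is an assumption.
workout05 <- c("1 (5:35)", "2 (6:02)", "3 (12:47)")

# Capture whatever sits between the first "(" and the next ")".
score_chr <- sub("^[^(]*\\(([^)]*)\\).*$", "\\1", workout05)
score_chr
```

Note that sub() returns an entry unchanged when it contains no parentheses, so 
such rows would need to be checked separately.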

After I have all the data, I want to find the empirical CDF of the data, so I 
use ecdf() on those data just as I would on other variables. Now, the main 
issue I'm interested in is the final step, where you plug in a specific time to 
find its percentile:

## These are below in context of the real problem as well
fn <- ecdf(dat$score5)
fn(dat$score5[1])

This works, but not in the way I want. What I want is for a user to easily be 
able to enter their time in "lay" terms such as 5:35 and from that it would 
return the percentile rank.

So, I'd like something like the following to be able to work:

fn(5:35)

The larger context for this problem, and why I want this, can be seen if you 
visit my web app built using shiny. I've built a site where athletes can build 
customized reports based on their performance on certain events by entering 
data. This specific issue would be found on the "get my percentile" tab, where 
a user can use the text input box to enter their time in a way humans typically 
understand it, and then it gets passed to the R fn() function that runs in the 
background and builds the plot for them.

https://hdoran.shinyapps.io/openAnalysis/

So, my question is how can I structure this such that a time can be expressed 
as simply minutes:seconds (e.g., 4:52) in a text box so that it would still 
work to return a percentile rank as I've described here.
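
One possible way to approach this, offered only as a sketch: convert the 
"minutes:seconds" text to total seconds, build the ecdf on those numbers, and 
convert the user's input the same way before the lookup. The helper name 
to_seconds() and the sample times are hypothetical and not part of the code in 
this post. Working in plain seconds also sidesteps the fact that as.POSIXct() 
with format="%M:%S" fills in the current date, so values parsed on different 
days would not be comparable.

```r
# Hypothetical helper: "4:52" -> 292 seconds; anything not "mm:ss" -> NA.
to_seconds <- function(x) {
  parts <- strsplit(trimws(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2L) return(NA_real_)
    p <- suppressWarnings(as.numeric(p))
    p[1] * 60 + p[2]
  }, numeric(1))
}

# Made-up times standing in for the scraped score column.
times <- c("5:35", "6:02", "4:52", "7:10", "5:35")

fn <- ecdf(to_seconds(times))

# A time typed into a text box is converted the same way before the lookup.
fn(to_seconds("5:35"))
```

Since a lower time is better here, fn() gives the proportion of athletes who 
were at least as fast; 1 - fn() may be closer to the intended percentile, 
depending on how that is defined.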

Thanks



library(XML)

i <- 1; j <- 0; division <- 1

## Build the leaderboard URL for page i and region j
url <- paste0(
  'http://games.crossfit.com/scores/leaderboard.php?stage=5&sort=0&page=', i,
  '&division=1&region=', j,
  '&numberperpage=100&competition=0&frontpage=0&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1&=&is_mobile=0')

tmp <- try(readHTMLTable(readLines(url), which=1, header=TRUE))
if(!is.null(dim(tmp))){ # new part here
  ## Strip newlines and spaces from the column names and newlines from the cells
  names(tmp) <- gsub("\\n", "", names(tmp))
  names(tmp) <- gsub(" +", "", names(tmp))
  tmp[] <- lapply(tmp, function(x) gsub("\\n", "", x))
  tmp$region <- j
}
dat <- tmp

## Pull out the text between the parentheses in Workout05
aa <- strsplit(dat$Workout05, split = '\\(')
bb <- sapply(aa, function(x) x[2])

## Keep the part before the closing parenthesis, then parse it as minutes:seconds
dat$score5 <- as.character(sapply(strsplit(bb, split = '\\)'), function(x) x[1]))
dat$score5 <- as.POSIXct(dat$score5, format="%M:%S")

fn <- ecdf(dat$score5)
fn(dat$score5[1])


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
