[R] Best place to ask questions about non-base-R topics, e.g. dplyr, dbplyr, etc.?
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] getting to a point where packages are installed and "ready to use", without unnecessarily reinstalling packages
I have R packages I want to use.

Q. What is the "best" way to get to a point where all of the packages are installed and "ready to use", AND where I only install or re-install a package if doing so is needed?

I searched the web for insights and found these:
https://hohenfeld.is/posts/check-if-a-package-is-installed-in-r/
https://stackoverflow.com/questions/9341635/check-for-installed-packages-before-running-install-packages

Based on what I read there, I "think" I should use the require function. Here is what I came up with. Is there anything "wrong" with this code, and are there any ways I can improve it?

### START OF REPRODUCIBLE CODE
# install and load packages (list the packages I want in a vector, check if they are available to use, install if needed, load and attach, review)

# create a character vector with the name(s) of the package(s) I want to use
packages_i_want_to_use <- c('RODBC', 'data.table', 'matrixStats', 'plyr', 'MASS', 'dplyr', 'lubridate')
# packages_i_want_to_use <- c("this_pac_does_not_exist", "abcz", "lubridate")

# use the require function to check if the package(s) is (are) available
packages_exist_true_false <- sapply(X = packages_i_want_to_use, FUN = require, character.only = TRUE, quietly = TRUE)

# create a vector with the names of the packages that need to be installed
packages_to_install <- packages_i_want_to_use[packages_exist_true_false == FALSE]

# specify the repo(s), AKA the CRAN mirror, I want to use
myrepo <- 'https://ftp.osuosl.org/pub/cran/'

# install the package(s)
install.packages(pkgs = packages_to_install, repos = myrepo)

# load and attach the packages_i_want_to_use using the library function
sapply(X = packages_i_want_to_use, FUN = library, character.only = TRUE)

# review: use require() again to determine if the packages are available
packages_exist_true_false_review <- sapply(X = packages_i_want_to_use, FUN = require, character.only = TRUE, quietly = TRUE)
packages_exist_true_false_review
### END OF REPRODUCIBLE CODE
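A minimal sketch of one alternative, assuming the only goal is to avoid reinstalling packages that are already present. `requireNamespace()` checks whether a package can be loaded without attaching it, so the "is it installed?" check and the "attach it" step stay separate (in the code above, the first require() call already attaches every package it finds). The helper names `missing_packages()` and `install_and_attach()` are hypothetical; the repo URL is the one from the question.

```r
# Hypothetical helper: return the names of packages that cannot be loaded,
# i.e. that are (probably) not installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  # requireNamespace() loads (but does not attach) a namespace and returns
  # FALSE instead of throwing an error when the package is unavailable
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install only what is missing, then attach everything.
install_and_attach <- function(pkgs, repos = "https://ftp.osuosl.org/pub/cran/") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(pkgs = to_install, repos = repos)
  }
  # library() stops with an error if a package still cannot be attached
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage (commented out because it may download and install packages):
# install_and_attach(c("RODBC", "data.table", "matrixStats", "plyr", "MASS", "dplyr", "lubridate"))
```

Because library() raises an error rather than returning FALSE, a package that fails to install stops the script at that point instead of being silently skipped.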
Re: [R] When using require(), why do I get the error message "Error in if (!loaded) { : the condition has length > 1" ?
Andrew,

Thanks. I reviewed the code for "require" and saw:

"if (!character.only) package <- as.character(substitute(package))"

This helps me better understand what is going on. I am sharing this here because I think it might help others understand.

as.character( substitute("this_pac_does_not_exist") ) # quoted
as.character( substitute( this_pac_does_not_exist ) ) # not quoted
as.character( substitute("this_pac_does_not_exist") ) == as.character( substitute( this_pac_does_not_exist ) )

packages_i_want_to_use <- c("this_pac_does_not_exist", "abcz")
as.character( substitute(packages_i_want_to_use[1] ) )
packages_i_want_to_use[1]
as.character( substitute(packages_i_want_to_use[1] ) ) == packages_i_want_to_use[1]

# To prevent require() from converting the unevaluated expression packages_i_want_to_use[1] with as.character(substitute(...)), we need to set character.only = TRUE.

On Mon, Oct 24, 2022 at 12:53 PM Andrew Simmons wrote:
>
> In the first one, the argument is a character vector of length 1, so the code
> works perfectly fine.
>
> The second is a call, and when coerced to a character vector should look like
>
> c("[", "packages_i_want_to_use", "1")
>
> You can try this yourself with quote(packages_i_want_to_use[1]) which returns
> its first argument, unevaluated.
>
> On Mon, Oct 24, 2022, 12:46 Kelly Thompson wrote:
>>
>> Thanks!
>>
>> # Please, can you help me understand why
>> require( 'base' ) # works, but
>> require( packages_i_want_to_use[1] ) # does not work?
>>
>> # In require( 'base' ), what is the "first argument"?
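A short sketch of the point Andrew makes, reusing the vector from the original question: without `character.only = TRUE`, require() works on the unevaluated expression, and `quote()` shows what that expression becomes when coerced to character. The three-element result is why the internal `if (!loaded)` test fails with "the condition has length > 1" (an error in recent R versions).

```r
packages_i_want_to_use <- c("base", "this_pac_does_not_exist")

# What require() sees when character.only = FALSE (the default): the call
# itself, not its value. Coerced to character it has three elements.
call_as_character <- as.character(quote(packages_i_want_to_use[1]))
print(call_as_character)
# "[" "packages_i_want_to_use" "1"

# With character.only = TRUE the argument is evaluated first, so require()
# receives the single string "base" and returns TRUE (invisibly).
loaded <- require(packages_i_want_to_use[1], character.only = TRUE)
print(loaded)
```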
Re: [R] When using require(), why do I get the error message "Error in if (!loaded) { : the condition has length > 1" ?
Thanks!

# Please, can you help me understand why
require( 'base' ) # works, but
require( packages_i_want_to_use[1] ) # does not work?

# In require( 'base' ), what is the "first argument"?

On Mon, Oct 24, 2022 at 12:29 PM Andrew Simmons wrote:
>
> require(), similarly to library(), does not evaluate its first argument
> UNLESS you add character.only = TRUE
>
> require( packages_i_want_to_use[1], character.only = TRUE)
[R] When using require(), why do I get the error message "Error in if (!loaded) { : the condition has length > 1" ?
# Below, when using require(), why do I get the error message "Error in if (!loaded) { : the condition has length > 1"?

# This is my reproducible code:

# create a vector with the names of the packages I want to use
packages_i_want_to_use <- c('base', 'this_pac_does_not_exist')

# Here I get error messages:
require( packages_i_want_to_use[1] )
# Error in if (!loaded) { : the condition has length > 1

require( packages_i_want_to_use[2] )
# Error in if (!loaded) { : the condition has length > 1

# Here I get what I expect:
require('base')

require('this_pac_does_not_exist')
# Loading required package: this_pac_does_not_exist
# Warning message:
# In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
#   there is no package called ‘this_pac_does_not_exist’
Re: [R] getting data from a "vertical" table into a "2-dimensional" grid
Bert,

Thanks! I'm pretty sure what you provided gets me to what I was looking for, and is much simpler. I really appreciate your help.

A follow-up question: I adjusted the code to not use "hard-coded" column names.

mat2 <- with(data_original, tapply( get(names(data_original)[3]), list( get(names(data_original)[1]), get(names(data_original)[2])), sum ))

Is there any better way to write that?

Thanks again!

-

For clarity, to improve upon what I previously wrote, and so I can practice writing questions like this and asking for help, here's a recap of my question and "reproducible code", and the "better way" you provided:

I have data presented in a 3-column data frame as shown below in "data_original". I want to aggregate the data in column 3, with the "by" argument using the first and second columns of "data_original". I want the results of the aggregation in a matrix, as shown below in "mat1". As my end "result", I want a matrix with one row for each unique value of column 1 of data_original and one column for each unique value of column 2 of data_original.

What I show below seems like one way this can be done.

My question: Are there easier or better ways to do this, especially in Base R, and also in R packages?

# create data
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length = 50), category = sample(1:5, size = 50, replace = TRUE), sales = sample(0:9, size = 50, replace = TRUE) )
dim(data_original)

# remove rows where data_original[,1] == 1990 & data_original[,2] == 5, to ensure there is at least one NA in the desired matrix (this is an "edge" case I want the code to handle correctly)
data_original <- data_original[ (data_original[,1] == 1990 & data_original[,2] == 5) == FALSE, ]
dim(data_original)

# aggregate data
data_aggregate_col3_by_col1_and_col2 <- aggregate(x = data_original[3], by = list(data_original[,1], data_original[,2]), FUN = sum)
colnames(data_aggregate_col3_by_col1_and_col2) <- colnames(data_original)
dim(data_aggregate_col3_by_col1_and_col2)

data_expanded <- expand.grid(unique(data_aggregate_col3_by_col1_and_col2[,1]), unique(data_aggregate_col3_by_col1_and_col2[,2]))
colnames(data_expanded) <- colnames(data_aggregate_col3_by_col1_and_col2)[1:2]
dim(data_expanded)
data_expanded <- merge(data_expanded, data_aggregate_col3_by_col1_and_col2, all = TRUE)
dim(data_expanded)

mat1 <- matrix(data = data_expanded[,3], nrow = length(unique(data_expanded[,1])), ncol = length(unique(data_expanded[,2])), byrow = TRUE, dimnames = list( unique(data_expanded[,1]), unique(data_expanded[,2]) ) )

# this is an easier way, using with and tapply
mat2 <- with(data_original, tapply( get(names(data_original)[3]), list( get(names(data_original)[1]), get(names(data_original)[2])), sum ))

# check that mat1 and mat2 are "nearly equal"
all.equal(mat1, mat2)

Bert Gunter wrote:
>
> "As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category."
>
> If I understand you correctly, no reshaping gymnastics are needed --
> just use ?tapply:
>
> set.seed(1)
> do <- data.frame(year = rep(1990:1999, length = 50),
>                  category = sample(1:5, size = 50, replace = TRUE),
>                  sales = sample(0:9, size = 50, replace = TRUE) )
>
> with(do, tapply(sales, list(year, category), sum))
> ## which gives the matrix:
>
>           1      2      3     4     5
> 1990  13283     NA  55083 87522 64877
> 1991     NA  80963     NA 30100 28277
> 1992   9391 202916     NA 55090    NA
> 1993  29696 167344     NA    NA 17625
> 1994  98015  99521     NA 70536 52252
> 1995 157003     NA  26875    NA 11366
> 1996  32986  88683   6562 79475 95282
> 1997  13601     NA 134757 12398    NA
> 1998  30537  51117  31333 20204    NA
> 1999  39240  87845  62479    NA 98804
>
> If this is not what you wanted, you may need to explain further or
> await a response from someone more insightful than I.
>
> Cheers,
> Bert
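On the follow-up question about avoiding hard-coded names without `get()`: `tapply()` accepts a list of grouping vectors, and a data frame is a list, so the grouping columns can be selected by position directly. A minimal sketch, assuming (as in the thread) that columns 1 and 2 are the grouping columns and column 3 holds the values; one side effect is that the result's dimnames are named after those columns.

```r
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length.out = 50),
                            category = sample(1:5, size = 50, replace = TRUE),
                            sales = sample(0:9, size = 50, replace = TRUE))
# drop one year/category combination so the result contains an NA cell
data_original <- data_original[!(data_original[[1]] == 1990 & data_original[[2]] == 5), ]

# Column 3 is summed within every combination of columns 1 and 2.
# data_original[1:2] is a data frame, i.e. a list of two grouping vectors.
mat3 <- tapply(data_original[[3]], data_original[1:2], sum)

# The same result written with the column names, for comparison
mat_named <- with(data_original,
                  tapply(sales, list(year = year, category = category), sum))

print(mat3)
```

If the grouping columns are known by name, `data_original[c("year", "category")]` works the same way as `data_original[1:2]`.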
Re: [R] getting data from a "vertical" table into a "2-dimensional" grid
As my end result, I want a matrix or data frame, with one row for each year, and one column for each category.
Re: [R] getting data from a "vertical" table into a "2-dimensional" grid
# I think this might be a better example.

# I have data presented in a "vertical" data frame as shown below in data_original.
# I want this data in a matrix or "grid", as shown below.
# What I show below seems like one way this can be done.

# My question: Are there easier or better ways to do this, especially in Base R, and also in R packages?

# create data
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length = 50), category = sample(1:5, size = 50, replace = TRUE), sales = sample(0:9, size = 50, replace = TRUE) )
dim(data_original)

# remove rows where data_original$year == 1990 & data_original$category == 5, to ensure there is at least one NA in the "grid"
data_original <- data_original[ (data_original$year == 1990 & data_original$category == 5) == FALSE, ]
dim(data_original)

# aggregate data
data_aggregate_sum_by_year_and_category <- aggregate(x = data_original$sales, by = list(year = data_original$year, category = data_original$category), FUN = sum)
colnames(data_aggregate_sum_by_year_and_category) <- c('year', 'category', 'sum_of_sales')
dim(data_aggregate_sum_by_year_and_category)

data_expanded <- expand.grid(year = unique(data_aggregate_sum_by_year_and_category$year), category = unique(data_aggregate_sum_by_year_and_category$category))
dim(data_expanded)
data_expanded <- merge(data_expanded, data_aggregate_sum_by_year_and_category, all = TRUE)
dim(data_expanded)

mat <- matrix(data = data_expanded$sum_of_sales, nrow = length(unique(data_expanded$year)), ncol = length(unique(data_expanded$category)), byrow = TRUE, dimnames = list( unique(data_expanded$year), unique(data_expanded$category) ) )

data_original
data_expanded
mat
[R] getting data from a "vertical" table into a "2-dimensional" grid
###
# I have data presented in a "vertical" data frame as shown below in data_original.
# I want this data in a matrix or "grid", as shown below.
# What I show below seems like one way this can be done.

# My question: Are there easier or better ways to do this, especially in Base R, and also in R packages?

# reproducible example

data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) )

data_expanded <- expand.grid(unique(data_original$year), unique(data_original$size), stringsAsFactors = FALSE )
colnames(data_expanded) <- c('year', 'size')
data_expanded <- merge(data_expanded, data_original, all = TRUE)

mat <- matrix(data = data_expanded$n, nrow = length(unique(data_expanded$year)), ncol = length(unique(data_expanded$size)), byrow = TRUE, dimnames = list( unique(data_expanded$year), unique(data_expanded$size) ) )

data_original
data_expanded
mat
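For this small example, a sketch of two base-R alternatives to the expand.grid()/merge()/matrix() steps. `tapply()` leaves NA in cells with no data; `xtabs()` builds a table with the same layout but fills those cells with 0, which may or may not be what is wanted.

```r
data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
                            size = c('s', 'l', 'xl', 'xs'),
                            n = c(99, 33, 3, 4))

# One row per year, one column per size; empty combinations become NA
mat_tapply <- with(data_original, tapply(n, list(year, size), sum))

# Same layout via a formula; empty combinations become 0
mat_xtabs <- xtabs(n ~ year + size, data = data_original)

print(mat_tapply)
print(mat_xtabs)
```

Both functions sort the row and column labels (here "1989", "1990", "1999" and "l", "s", "xl", "xs"), so no separate byrow/dimnames bookkeeping is needed.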
[R] From within R, what are "good" ways to run SQL code contained in a text file?
I am interested in this topic and found this post on StackOverflow:
https://stackoverflow.com/questions/44853322/how-to-read-the-contents-of-an-sql-file-into-an-r-script-to-run-a-query

This response seems especially useful:
https://stackoverflow.com/a/44886192/10816734

I'm curious about the thoughts and insights people here on r-help have about this topic, and whether there are "better" ways than what is suggested in the StackOverflow thread.

Thank you!
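One common base-R pattern, sketched under the assumption that the file holds a single SQL statement: read the file into one string with the newlines preserved, then pass that string to whatever query function the connection uses. Keeping the newlines matters because joining the lines with spaces would turn a `--` line comment into one that swallows the rest of the query. The helper name `read_sql_file()` is hypothetical.

```r
# Hypothetical helper: read a .sql file into a single character string.
read_sql_file <- function(path) {
  sql_lines <- readLines(path, warn = FALSE)
  # Join with "\n" (not " ") so that "--" line comments stay line comments
  paste(sql_lines, collapse = "\n")
}

# Example: write a small query file, then read it back
sql_path <- tempfile(fileext = ".sql")
writeLines(c("-- all rows from my_table", "SELECT *", "FROM my_table;"), sql_path)
sql <- read_sql_file(sql_path)
cat(sql, "\n")
```

The resulting string could then be passed to, for example, `DBI::dbGetQuery(con, sql)` for a DBI connection or `RODBC::sqlQuery(channel, sql)` for an RODBC channel. A file with several semicolon-separated statements may need to be split and sent one statement at a time, since drivers often execute only one statement per call.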
Re: [R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
Thanks. I have a clarification and a follow-up question. I should have asked this in the original post, and I should have provided a better example for the FUN argument; I apologize.

For use in an example, here is a "silly" function that requires arguments such as x and y to be assigned separately:

udf_x_plus_y <- function (x, y) { return ( x + y) }

Q. Is there a way to use by() when FUN is a function that requires arguments such as "x" and "y" to be assigned separately (e.g. udf_x_plus_y(x = my_x, y = my_y)), rather than assigned as a range of columns using brackets (e.g. cor(x)[1,2])?

Something like this perhaps? (This produces an error message.)

by( data = my_df[-1], INDICES = my_df$my_category, FUN = function(x, y) { udf_x_plus_y (x = data$my_x, y = data$my_y) } )

Thanks again.

On Sat, Apr 9, 2022 at 5:32 AM Rui Barradas wrote:
>
> Hello,
>
> Another option is ?by.
>
>
> by(my_df[-1], my_df$my_category, cor)
> by(my_df[-1], my_df$my_category, \(x) cor(x)[1,2])
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 02:26 on 09/04/2022, Kelly Thompson wrote:
> > #Q. How can I "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
> > #I'm using cor() as an example; I'd like to find a way to do this for any function that takes 2 or more vectors as arguments.
> >
> > #create example data
> >
> > my_category <- rep ( c("a","b","c"), 4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If I wanted to get the correlation of x and y grouped by category, I could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in 1:length(my_category_unique) ) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[which(my_criteria_i)]
> >   my_y_i <- my_y[which(my_criteria_i)]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(), aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like 'x'"
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor (my_df$x, my_df$y) } )
> >
> >
> > #If I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
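A sketch for the follow-up question, using the example data from the thread: by() hands FUN one data frame per group, so FUN can take a single argument and pull the two vectors out by column name before calling a two-argument function such as `udf_x_plus_y()` or `cor()`. `split()` plus `sapply()` is an equivalent base-R route that returns a plain named vector.

```r
# Example data from the original question
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) {
  return(x + y)
}

# FUN takes one argument (the group's rows); x and y come from its columns
res_plus <- by(data = my_df, INDICES = my_df$my_category,
               FUN = function(d) udf_x_plus_y(x = d$my_x, y = d$my_y))

# The same idea for cor(), which returns one number per group
res_cor <- by(data = my_df, INDICES = my_df$my_category,
              FUN = function(d) cor(x = d$my_x, y = d$my_y))

# Equivalent split()/sapply() version: a named numeric vector
res_cor2 <- sapply(split(my_df, my_df$my_category),
                   function(d) cor(x = d$my_x, y = d$my_y))

print(res_plus)
print(res_cor2)
```

The error in the attempted version comes from FUN referring to `data`, which is not defined inside FUN; by() only passes the subset data frame as FUN's first argument.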
[R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?
I noticed that I get different results when subsetting using subset() compared to subsetting using "brackets" when the subset criteria have NAs. Here's an example:

#START OF EXAMPLE
my_data <- 1:5
my_data

my_subset_criteria <- c( FALSE, FALSE, TRUE, NA, NA)
my_subset_criteria

#subsetting using brackets returns the data where my_subset_criteria equals TRUE, and also NA where my_subset_criteria is NA
my_data[my_subset_criteria == TRUE]

#subsetting using subset returns the data where my_subset_criteria equals TRUE
subset(my_data, my_subset_criteria == TRUE)
#END OF EXAMPLE

This behavior is also mentioned here: https://statisticaloddsandends.wordpress.com/2018/10/07/subsetting-in-the-presence-of-nas/

Q. Is this the intended behavior when subsetting with brackets?

Thank you!
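A small sketch of the documented behaviour behind the question (see `?Extract` and `?subset`): with `[`, a logical NA index picks an unknown element and so returns NA, whereas subset() treats missing values in the condition as FALSE. Wrapping the condition in `which()`, or using `%in% TRUE`, reproduces the subset() result with brackets.

```r
my_data <- 1:5
my_subset_criteria <- c(FALSE, FALSE, TRUE, NA, NA)

# Brackets: each NA in the logical index yields an NA in the result
with_brackets <- my_data[my_subset_criteria]

# subset(): NA in the condition is treated as FALSE
with_subset <- subset(my_data, my_subset_criteria)

# Bracket equivalents of subset(): which() returns only the TRUE positions,
# and %in% TRUE turns NA into FALSE
with_which <- my_data[which(my_subset_criteria)]
with_in_true <- my_data[my_subset_criteria %in% TRUE]

print(with_brackets)  # 3 NA NA
print(with_subset)    # 3
```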
[R] calculating quintile values of numeric data?
I’d like to take numeric data and calculate numeric “quintiles” with integer values from 1 to 5: values in the lowest 20% get a value of 1, values above 20% and up to 40% get a value of 2, values above 40% and up to 60% get a value of 3, etc. How can I use quantcut, or another function, to do this? Thanks!

Ex.
x <- c(1:10)
I want:
myquintilefunction(x, q = 5, na.rm = T)
to return a vector with values:
1,1,2,2,3,3,4,4,5,5

Thanks!
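A minimal base-R sketch, assuming quantile() with its default type 7 is acceptable for the cut points: compute the 0%, 20%, ..., 100% quantiles and let cut() assign integer group codes. `myquintilefunction` is the hypothetical name from the question. With heavily tied data some quantiles can coincide, and cut() then stops with a "breaks are not unique" error, so this is not a drop-in for every data set.

```r
myquintilefunction <- function(x, q = 5, na.rm = TRUE) {
  # q + 1 break points: 0%, 100/q %, ..., 100%
  breaks <- quantile(x, probs = seq(0, 1, length.out = q + 1), na.rm = na.rm)
  # labels = FALSE returns integer codes 1..q; include.lowest = TRUE keeps the
  # minimum in group 1; intervals are closed on the right, so a value equal to
  # the 20% quantile falls into group 1. NA inputs give NA codes.
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

x <- c(1:10)
myquintilefunction(x, q = 5, na.rm = TRUE)
# 1 1 2 2 3 3 4 4 5 5
```

An alternative with different tie handling is `dplyr::ntile(x, 5)`, which splits the ranked values into groups of (nearly) equal size rather than cutting at quantile values.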