[R] Best place to ask questions about non-base-R topics, e.g. dplyr, dbplyr, etc.?

2022-10-26 Thread Kelly Thompson


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] getting to a point where packages are installed and "ready to use", without unnecessarily reinstalling packages

2022-10-25 Thread Kelly Thompson
I have R packages I want to use.

Q. What is the "best" way to get to a point where all of the packages
are installed and "ready to use", AND where I only install or
re-install a package if doing so is needed?

#I searched the web for insights and found these:
https://hohenfeld.is/posts/check-if-a-package-is-installed-in-r/
https://stackoverflow.com/questions/9341635/check-for-installed-packages-before-running-install-packages

Based on what I read there, I "think" I should use the require function.

Here is what I came up with.

Is there anything "wrong" with this code, and are there any ways I can
improve the code?

### START OF REPRODUCIBLE CODE

#install and load packages (list the packages I want in a vector,
#check if they are available to use, install if needed, load and
#attach, review)

#create a character vector with the name(s) of the
#package(s) I want to use
packages_i_want_to_use <- c('RODBC', 'data.table', 'matrixStats',
'plyr', 'MASS', 'dplyr', 'lubridate')
#packages_i_want_to_use <- c("this_pac_does_not_exist", "abcz", "lubridate")

#use the require function to check if the package(s) is (are) available
packages_exist_true_false <- sapply(X = packages_i_want_to_use, FUN =
require, character.only = TRUE, quietly = TRUE)

# create a vector with the names of the packages that need to be installed
packages_to_install <-
packages_i_want_to_use[packages_exist_true_false == FALSE]

#specify the repo(s) AKA CRAN mirror I want to use
myrepo <- 'https://ftp.osuosl.org/pub/cran/'

#install the package(s), only if any are missing
if (length(packages_to_install) > 0) {
  install.packages(pkgs = packages_to_install, repos = myrepo)
}

#load and attach the packages_i_want_to_use using the library function
sapply(X = packages_i_want_to_use, FUN = library, character.only = TRUE)

#review
#review to determine if the packages are available, using require()
packages_exist_true_false_review <- sapply(X = packages_i_want_to_use,
FUN = require, character.only = TRUE, quietly = TRUE)
packages_exist_true_false_review

### END OF REPRODUCIBLE CODE
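A minimal sketch of one alternative, assuming the goal is only to avoid unnecessary reinstalls: requireNamespace() reports whether each package can be loaded without attaching it (whereas require() attaches every package it finds and warns about each missing one), so only the missing packages are installed before everything is attached with library(). The helper name ensure_packages() is hypothetical, and defaulting repos to getOption("repos") is an assumption; pass myrepo to use the mirror above.

```r
# Hypothetical helper: install only the packages that are missing, then attach all of them
ensure_packages <- function(pkgs, repos = getOption("repos")) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]

  # only call install.packages() when something actually needs installing
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }

  # character.only = TRUE makes library() use the string stored in p
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }

  # return (invisibly) the names of the packages that had to be installed
  invisible(missing)
}
```

Usage with the vector and mirror from the post would be ensure_packages(packages_i_want_to_use, repos = myrepo). Checking with requireNamespace() keeps the "is it installed?" step separate from the "attach it" step, so the review at the end is the only place packages get attached twice.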



Re: [R] When using require(), why do I get the error message "Error in if (!loaded) { : the condition has length > 1" ?

2022-10-24 Thread Kelly Thompson
Andrew,
Thanks. I reviewed the code for "require" and saw:
"if (!character.only)
package <- as.character(substitute(package))"

#This helps me better understand what is going on. I am sharing this
#here because I think it might help others understand.
as.character( substitute("this_pac_does_not_exist") ) #quoted

as.character( substitute( this_pac_does_not_exist ) ) #not quoted

as.character( substitute("this_pac_does_not_exist") ) == as.character(
substitute( this_pac_does_not_exist ) )

#

packages_i_want_to_use <- c("this_pac_does_not_exist", "abcz")
as.character( substitute(packages_i_want_to_use[1] ) )
packages_i_want_to_use[1]
as.character( substitute(packages_i_want_to_use[1] ) ) ==
packages_i_want_to_use[1]

#Without character.only = TRUE, require() replaces its argument with
#as.character(substitute(package)), which turns packages_i_want_to_use[1]
#into c("[", "packages_i_want_to_use", "1") instead of its value. To prevent
#that, we need to set character.only = TRUE
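A short sketch that makes the point above checkable: quote() shows the unevaluated call that require() coerces to a character vector when character.only = FALSE, and character.only = TRUE makes require() use the value of the expression instead. The suppressWarnings() wrapper is only there to silence the expected "there is no package called" warning for the nonexistent package.

```r
packages_i_want_to_use <- c("base", "this_pac_does_not_exist")

# What require() works with when character.only = FALSE:
# the unevaluated call, coerced to a length-3 character vector
what_require_sees <- as.character(quote(packages_i_want_to_use[1]))
print(what_require_sees)

# With character.only = TRUE the argument is evaluated first,
# so require() receives the single string "base"
base_ok <- require(packages_i_want_to_use[1], character.only = TRUE)

# A missing package gives FALSE (plus a warning) rather than an error
missing_ok <- suppressWarnings(
  require(packages_i_want_to_use[2], character.only = TRUE, quietly = TRUE)
)
print(c(base = base_ok, missing = missing_ok))
```

The length-3 vector is what ends up in the if (!loaded) test, which is why the error says the condition has length > 1.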

On Mon, Oct 24, 2022 at 12:53 PM Andrew Simmons  wrote:
>
> In the first one, the argument is a character vector of length 1, so the code 
> works perfectly fine.
>
> The second is a call, and when coerced to a character vector should look like
>
> c("[", "packages_i_want_to_use", "1")
>
> You can try this yourself with quote(packages_i_want_to_use[1]) which returns 
> its first argument, unevaluated.
>
> On Mon, Oct 24, 2022, 12:46 Kelly Thompson  wrote:
>>
>> Thanks!
>>
>> # Please, can you help me understand why
>> require( 'base' ) # works, but
>> require( packages_i_want_to_use[1] ) # does not work?
>>
>> # In require( 'base' ), what is the "first argument"?
>>
>> On Mon, Oct 24, 2022 at 12:29 PM Andrew Simmons  wrote:
>> >
>> > require(), similarly to library(), does not evaluate its first argument 
>> > UNLESS you add character.only = TRUE
>> >
>> > require( packages_i_want_to_use[1], character.only = TRUE)
>> >
>> >
>> > On Mon, Oct 24, 2022, 12:26 Kelly Thompson  wrote:
>> >>
>> >> # Below, when using require(), why do I get the error message "Error
>> >> # in if (!loaded) { : the condition has length > 1" ?
>> >>
>> >> # This is my reproducible code:
>> >>
>> >> #create a vector with the names of the packages I want to use
>> >> packages_i_want_to_use <- c('base', 'this_pac_does_not_exist')
>> >>
>> >> # Here I get error messages:
>> >> require( packages_i_want_to_use[1] )
>> >> #Error in if (!loaded) { : the condition has length > 1
>> >>
>> >> require( packages_i_want_to_use[2] )
>> >> #Error in if (!loaded) { : the condition has length > 1
>> >>
>> >> # Here I get what I expect:
>> >> require('base')
>> >>
>> >> require('this_pac_does_not_exist')
>> >> #Loading required package: this_pac_does_not_exist
>> >> #Warning message:
>> >> #In library(package, lib.loc = lib.loc, character.only = TRUE,
>> >> #logical.return = TRUE,  :
>> >> #  there is no package called ‘this_pac_does_not_exist’
>> >>


Re: [R] getting data from a "vertical" table into a "2-dimensional" grid

2022-10-21 Thread Kelly Thompson
Bert,
Thanks! I'm pretty sure what you provided gets me to what I was
looking for, and is much simpler. I really appreciate your help.

A follow-up question:
I adjusted the code to not use "hard-coded" column names.

mat2 <- with(data_original, tapply( get(names(data_original)[3]),
list( get(names(data_original)[1]), get(names(data_original)[2])), sum
))

Is there any better way to write that?

Thanks again!
-

For clarity and to improve upon what I previously wrote, and so I can
practice writing questions like this and asking for help, here's a
recap of my question and "reproducible code", and the "better way" you
provided:

I have data presented in a 3-column data frame as shown below in
"data_original".

I want to aggregate the data in column 3, with the "by" argument using
the first and second columns of "data_original".

I want the results of the aggregation in a matrix, as shown below in "mat1".

As my end "result", I want a matrix with one row for each unique value
of column 1 of data_original and one column for each unique value of
column 2 of data_original.

What I show below seems like one way this can be done.

My question: Are there easier or better ways to do this, especially in
Base R, and also in R packages?


#create data
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length  = 50),
category = sample(1:5, size = 50, replace = TRUE),  sales =
sample(0:9, size = 50 , replace = TRUE) )
dim(data_original)

#remove rows where data_original[,1] == 1990 & data_original[,2] == 5,
#to ensure there is at least one NA in the desired matrix (this is an
#"edge" case I want the code to "deal with" correctly.)
data_original <- data_original[ (data_original[,1] == 1990 &
data_original[,2] == 5) == FALSE, ]
dim(data_original)

#aggregate data
data_aggregate_col3_by_col1_and_col2 <- aggregate(x =
data_original[3], by = list(data_original[,1], data_original[,2]), FUN
= sum)
colnames(data_aggregate_col3_by_col1_and_col2) <- colnames(data_original)
dim(data_aggregate_col3_by_col1_and_col2)

data_expanded <-
expand.grid(unique(data_aggregate_col3_by_col1_and_col2[,1]),
unique(data_aggregate_col3_by_col1_and_col2[,2]))
colnames(data_expanded) <- colnames(data_aggregate_col3_by_col1_and_col2)[1:2]
dim(data_expanded)

data_expanded <- merge(data_expanded,
data_aggregate_col3_by_col1_and_col2, all = TRUE)
dim(data_expanded)

mat1 <- matrix(data = data_expanded[,3], nrow =
length(unique(data_expanded[,1])), ncol =
length(unique(data_expanded[,2])) , byrow = TRUE, dimnames = list(
unique(data_expanded[,1]), unique(data_expanded[,2]) ) )

#this is an easier way, using with and tapply
mat2 <- with(data_original, tapply( get(names(data_original)[3]),
list( get(names(data_original)[1]), get(names(data_original)[2])), sum
))
#check that mat1 and mat2 are "nearly equal"
all.equal(mat1, mat2)
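As a sketch of one possible answer to the follow-up question (not taken from the thread): because a data frame is a list, the grouping columns can be passed to tapply() by position as data_original[1:2], which removes the need for with() and get(). xtabs() with a formula built by reformulate() is another base R option; note that it fills empty cells with 0 rather than NA, so it only fits if 0 is an acceptable stand-in for "no data".

```r
# Recreate the example data from the recap above
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length.out = 50),
                            category = sample(1:5, size = 50, replace = TRUE),
                            sales = sample(0:9, size = 50, replace = TRUE))
data_original <- data_original[!(data_original[, 1] == 1990 & data_original[, 2] == 5), ]

# tapply(): column 3 summed over every combination of columns 1 and 2;
# a data frame is a list, so data_original[1:2] works as INDEX
mat_tapply <- tapply(data_original[[3]], data_original[1:2], sum)

# xtabs(): same grid, built from the column names, but empty cells are 0
fml <- reformulate(names(data_original)[1:2], response = names(data_original)[3])
mat_xtabs <- xtabs(fml, data = data_original)

mat_tapply
mat_xtabs
```

Both versions refer to columns only by position, so they keep working if the columns are renamed.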



Bert Gunter wrote:
>
> "As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category."
>
> If I understand you correctly, no reshaping gymnastics are needed --
> just use ?tapply:
>
> set.seed(1)
> do <- data.frame(year = rep(1990:1999, length  = 50),
> category = sample(1:5, size = 50, replace = TRUE),
> sales = sample(0:9, size = 50 , replace = TRUE) )
>
>
> with(do, tapply(sales, list(year, category),sum))
>  ## which gives the matrix:
>
>           1      2      3     4     5
> 1990  13283     NA  55083 87522 64877
> 1991     NA  80963     NA 30100 28277
> 1992   9391 202916     NA 55090    NA
> 1993  29696 167344     NA    NA 17625
> 1994  98015  99521     NA 70536 52252
> 1995 157003     NA  26875    NA 11366
> 1996  32986  88683   6562 79475 95282
> 1997  13601     NA 134757 12398    NA
> 1998  30537  51117  31333 20204    NA
> 1999  39240  87845  62479    NA 98804
>
> If this is not what you wanted, you may need to explain further or
> await a response from someone more insightful than I.
>
> Cheers,
> Bert
>

Re: [R] getting data from a "vertical" table into a "2-dimensional" grid

2022-10-21 Thread Kelly Thompson
As my end result, I want a matrix or data frame, with one row for each
year, and one column for each category.

On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson  wrote:
>
> # I think this might be a better example.
>
> # I have data presented in a "vertical" dataframe as shown below in
> # data_original.
> # I want this data in a matrix or "grid", as shown below.
> # What I show below seems like one way this can be done.
>
> # My question: Are there easier or better ways to do this, especially
> # in Base R, and also in R packages?
>
> #create data
> set.seed(1)
> data_original <- data.frame(year = rep(1990:1999, length  = 50),
> category = sample(1:5, size = 50, replace = TRUE),  sales =
> sample(0:9, size = 50 , replace = TRUE) )
> dim(data_original)
>
> #remove rows where data_original$year == 1990 & data_original$category
> #== 5, to ensure there is at least one NA in the "grid"
> data_original <- data_original[ (data_original$year == 1990 &
> data_original$category == 5) == FALSE, ]
> dim(data_original)
>
> #aggregate data
> data_aggregate_sum_by_year_and_category <- aggregate(x =
> data_original$sales, by = list(year = data_original$year, category =
> data_original$category), FUN = sum)
> colnames(data_aggregate_sum_by_year_and_category) <- c('year',
> 'category', 'sum_of_sales')
> dim(data_aggregate_sum_by_year_and_category)
>
> data_expanded <- expand.grid(year =
> unique(data_aggregate_sum_by_year_and_category$year), category =
> unique(data_aggregate_sum_by_year_and_category$category))
> dim(data_expanded)
> data_expanded <- merge(data_expanded,
> data_aggregate_sum_by_year_and_category, all = TRUE)
> dim(data_expanded)
>
> mat <- matrix(data = data_expanded$sum_of_sales, nrow =
> length(unique(data_expanded$year)), ncol =
> length(unique(data_expanded$category)) , byrow = TRUE, dimnames =
> list( unique(data_expanded$year), unique(data_expanded$category) ) )
>
>
> data_original
> data_expanded
> mat
>
> On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson  wrote:
> >
> > ###
> > #I have data presented in a "vertical" data frame as shown below in
> > #data_original.
> > #I want this data in a matrix or "grid", as shown below.
> > #What I show below seems like one way this can be done.
> >
> > #My question: Are there easier or better ways to do this, especially
> > #in Base R, and also in R packages?
> >
> > #reproducible example
> >
> > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
> > size = c('s', 'l', 'xl', 'xs'),  n = c(99, 33, 3, 4) )
> >
> > data_expanded <- expand.grid(unique(data_original$year),
> > unique(data_original$size), stringsAsFactors = FALSE )
> > colnames(data_expanded) <- c('year', 'size')
> > data_expanded <- merge(data_expanded, data_original, all = TRUE)
> >
> > mat <- matrix(data = data_expanded$n, nrow =
> > length(unique(data_expanded$year)), ncol =
> > length(unique(data_expanded$size)) , byrow = TRUE, dimnames = list(
> > unique(data_expanded$year), unique(data_expanded$size) ) )
> >
> > data_original
> > data_expanded
> > mat
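For the small 5:03 PM example, which needs no aggregation because each year/size pair occurs at most once, a sketch using base R's reshape() returns a data frame instead of a matrix, with NA for missing combinations. The column names such as n.s are generated by reshape() from the value column name and the size values.

```r
data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
                            size = c('s', 'l', 'xl', 'xs'),
                            n = c(99, 33, 3, 4))

# One row per year, one "n.<size>" column per size; absent pairs become NA
wide <- reshape(data_original, idvar = "year", timevar = "size",
                direction = "wide")
wide
```

Rows appear in order of first appearance of each year rather than sorted, so ordering by wide$year may be needed if a sorted grid is wanted.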


[R] From within R, what are "good" ways to run SQL code contained in a text file?

2022-05-18 Thread Kelly Thompson
I am interested in this topic and found this post on StackOverflow,
https://stackoverflow.com/questions/44853322/how-to-read-the-contents-of-an-sql-file-into-an-r-script-to-run-a-query

This response seems especially useful,
https://stackoverflow.com/a/44886192/10816734

I'm curious about the thoughts and insights people here in r-help have
about this question and topic, and to learn if there are "better" ways
than what is suggested in the StackOverflow thread.
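A minimal sketch of the approach usually suggested for this: read the whole .sql file into one string with readLines() and paste(), then hand that string to DBI::dbGetQuery(). The file contents, the in-memory SQLite database, and the use of the DBI and RSQLite packages are assumptions for illustration; the database part only runs if both packages are installed. A file holding several semicolon-separated statements would likely need to be split and run one statement at a time, since dbGetQuery() is meant for a single query.

```r
# Write a small example .sql file (hypothetical contents) to a temporary path
sql_file <- tempfile(fileext = ".sql")
writeLines(c("SELECT cyl, COUNT(*) AS n",
             "FROM mtcars",
             "GROUP BY cyl"), sql_file)

# Read the whole file as ONE string, keeping the line breaks
sql <- paste(readLines(sql_file), collapse = "\n")
cat(sql, "\n")

# Run it against an in-memory SQLite copy of mtcars, if DBI/RSQLite are available
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "mtcars", mtcars)
  print(DBI::dbGetQuery(con, sql))
  invisible(DBI::dbDisconnect(con))
}
```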

Thank you!



Re: [R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

2022-04-09 Thread Kelly Thompson
Thanks. I have a clarification and a follow-up question. I should have
asked this in the original post, and I should have provided a better
example for the FUN argument; I apologize.

For use in an example, here is a "silly" example of a function that
requires arguments such as x and y to be "separately assigned" :

udf_x_plus_y <- function (x, y) { return ( x + y) }

Q. Is there a way to use by() when FUN is a function
that requires arguments such as "x" and "y" to be assigned separately
(e.g. udf_x_plus_y(x = my_x, y = my_y)), rather than as a
range of columns using brackets (e.g. cor(x)[1,2])?

Something like this perhaps? (This produces an error message.)
by( data = my_df[-1], INDICES = my_df$my_category,  FUN = function(x,
y) { udf_x_plus_y (x = data$my_x, y = data$my_y) } )

Thanks again.
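A sketch of how this could work, building on Rui's suggestion to use by(): by() passes each group's rows to FUN as a single data frame, so FUN should take one argument (called d here, an arbitrary name) and pull the two vectors out of it with d$my_x and d$my_y. split() plus sapply(), or mapply() over two split vectors, are other base R options. The data are recreated from the original post.

```r
# Example data from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) { return(x + y) }

# by(): FUN receives one data frame per group; take x and y from its columns
sums_by_group <- by(data = my_df[-1], INDICES = my_df$my_category,
                    FUN = function(d) udf_x_plus_y(x = d$my_x, y = d$my_y))
sums_by_group

# split() + sapply(): a named vector of per-group correlations
cor_by_group <- sapply(split(my_df, my_df$my_category),
                       function(d) cor(x = d$my_x, y = d$my_y))

# mapply(): walk two lists of vectors in parallel, one element from each per call
cor_by_group2 <- mapply(function(x, y) cor(x = x, y = y),
                        split(my_df$my_x, my_df$my_category),
                        split(my_df$my_y, my_df$my_category))
cor_by_group
```

The mapply() form works for any function of two vectors, because each call gets the group's x vector and y vector as separate arguments.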

On Sat, Apr 9, 2022 at 5:32 AM Rui Barradas  wrote:
>
> Hello,
>
> Another option is ?by.
>
>
> by(my_df[-1], my_df$my_category, cor)
> by(my_df[-1], my_df$my_category, \(x) cor(x)[1,2])
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 02:26 de 09/04/2022, Kelly Thompson escreveu:
> > #Q. How can I "apply" a function that takes two or more vectors as
> > #arguments, such as cor(x, y), over a "category" or "grouping variable"
> > #or "index"?
> > #I'm using cor() as an example, I'd like to find a way to do this for
> > #any function that takes 2 or more vectors as arguments.
> >
> >
> > #create example data
> >
> > my_category <- rep ( c("a","b","c"),  4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If I wanted to get the correlation of x and y grouped by category, I
> > #could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> >for (i in 1:length(my_category_unique) ) {
> >  my_criteria_i <- my_category == my_category_unique[i]
> >  my_x_i <- my_x[which(my_criteria_i)]
> >  my_y_i <- my_y[which(my_criteria_i)]
> >  my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >  my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(),
> > #aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in
> > #FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in
> > #cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like
> > #'x' "
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor
> > (my_df$x, my_df$y) } )
> >
> >
> > #if I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >


[R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?

2022-04-06 Thread Kelly Thompson
I noticed that I get different results when subsetting using subset()
compared to subsetting using "brackets" [ ] when the subset criteria
contain NAs.

Here's an example

#START OF EXAMPLE
my_data <- 1:5
my_data

my_subset_criteria <- c(FALSE, FALSE, TRUE, NA, NA)
my_subset_criteria

#subsetting using subset returns the data where my_subset_criteria equals TRUE
subset(my_data, my_subset_criteria == TRUE)

#subsetting using brackets returns the data where my_subset_criteria
#equals TRUE, and also NA where my_subset_criteria is NA
my_data[my_subset_criteria == TRUE]

#END OF EXAMPLE

This behavior is also mentioned here
https://statisticaloddsandends.wordpress.com/2018/10/07/subsetting-in-the-presence-of-nas/

Q. Is this the intended behavior when subsetting with brackets?
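For what it's worth, ?Extract (the help page for [ ) documents that an NA index picks an unknown element and so returns NA, and ?subset says missing values in the condition are taken as false, so both results appear to be intended. A sketch of two common ways to get subset()'s behavior with brackets, using the same example data: wrap the criterion in which(), which drops NAs, or combine it with !is.na().

```r
my_data <- 1:5
my_subset_criteria <- c(FALSE, FALSE, TRUE, NA, NA)

# Brackets: an NA in a logical index yields an NA element
with_na <- my_data[my_subset_criteria]

# which() returns only the positions that are TRUE, dropping NAs
via_which <- my_data[which(my_subset_criteria)]

# Treat NA as FALSE explicitly (what subset() does for plain vectors)
via_is_na <- my_data[my_subset_criteria & !is.na(my_subset_criteria)]

via_subset <- subset(my_data, my_subset_criteria)

with_na     # 3 NA NA
via_which   # 3
```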

Thank you!



[R] calculating quintile values of numeric data?

2019-01-22 Thread Kelly Thompson
I’d like to take numeric data and calculate numeric “quintiles” with
integer values from 1 to 5: values in the lowest 20% get a value of 1,
values in the >20% to <=40% range get a value of 2, values in the
>40% to <=60% range get a value of 3, etc.

How can I use quantcut, or another function, to do this?


Thanks!


Ex.

x <- c(1:10)

I want:
myquintilefunction (x, q=5, na.rm=T) to return a vector with values:
1,1,2,2,3,3,4,4,5,5
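A base R sketch, assuming the quintile boundaries come from quantile()'s default method (type 7): cut() the data at the 0%, 20%, ..., 100% quantiles with labels = FALSE to get integer codes 1 to 5. The function name and arguments follow the question; the body is an assumption, not quantcut() itself. NAs in x stay NA. With heavily tied data the quantiles can coincide, and cut() then stops because the breaks are not unique, so that case would need extra handling.

```r
myquintilefunction <- function(x, q = 5, na.rm = TRUE) {
  # q + 1 equally spaced probabilities: 0, 0.2, 0.4, 0.6, 0.8, 1 for q = 5
  breaks <- quantile(x, probs = seq(0, 1, length.out = q + 1), na.rm = na.rm)
  # right-closed intervals, so (20%, 40%] -> 2, etc.;
  # include.lowest = TRUE puts the minimum into group 1
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

x <- c(1:10)
myquintilefunction(x, q = 5, na.rm = TRUE)
# 1 1 2 2 3 3 4 4 5 5
```

The right-closed intervals match the "> 20% to <= 40%" wording of the question.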

Thanks!
