Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Sébastien Lahaie
Thank you all for the responses, these are the insights I was hoping for.
There are many ways to get this right, and I happened to run into one that
has a glitch. I see from Luke's explanation how the strange output came
about. Glad to hear that this bug/behavior is already known.

On Fri, Jun 19, 2020 at 7:04 PM Daniel Nordlund wrote:

> df[sample(nrow(df),3), 'treated'] <- TRUE

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Daniel Nordlund

On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:

> df[sample(nrow(df), 3),]$treated <- TRUE


Sébastien,

You have received good explanations of what is going on with your code.  
I think you can get what you want by making a simple modification of 
your treatment assignment statement. At least it works for me.


df[sample(nrow(df),3), 'treated'] <- TRUE

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA  USA
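For illustration, here is a minimal sketch of why this form is safe. It assumes a hypothetical function pick_rows() that stands in for sample() and counts how often it is called (it is not from the original posts). Because the row and column subscripts sit in a single `[<-` call on df itself, the row expression is evaluated once, so there is no second draw that could disagree with the first.

```r
# Hypothetical stand-in for sample(): counts its calls and returns fixed rows.
calls <- 0L
pick_rows <- function() {
  calls <<- calls + 1L
  c(2L, 5L, 7L)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Single-level assignment: R calls `[<-`(df, pick_rows(), "treated", value = TRUE),
# so the row subscript is evaluated only once.
df[pick_rows(), "treated"] <- TRUE

calls              # 1
which(df$treated)  # 2 5 7
df$unit            # unchanged: 1 2 ... 10
```

Sébastien's first version, which stores the draw in s before assigning, avoids the problem for the same reason: the random draw happens once, outside the assignment.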


Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread William Dunlap via R-help
It is a bug that has been present in R since at least R-2.14.0 (the oldest
that I have installed on my laptop).

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jun 19, 2020 at 10:37 AM Rui Barradas wrote:

> But, why? Is it evaluated once before assignment and a second time when
> the assignment occurs?

Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Rui Barradas

Hello,


Thanks, I hadn't thought of that.

But why? Is it evaluated once before the assignment and a second time when
the assignment occurs?

Tracing both sample and `[<-` shows 2 calls to sample:


trace(sample)
trace(`[<-`)
df[sample(nrow(df), 3),]$treated <- TRUE
trace: sample(nrow(df), 3)
trace: `[<-`(`*tmp*`, sample(nrow(df), 3), , value = list(unit = c(7L,
6L, 8L), treated = c(TRUE, TRUE, TRUE)))
trace: sample(nrow(df), 3)


Regards,

Rui Barradas
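To see how two draws garble the units, here is a hand-written approximation. Following the expansion rule for nested assignments in the R Language Definition (section on subset assignment), df[i, ]$treated <- TRUE is roughly `*tmp*` <- df; df <- `[<-`(`*tmp*`, i, , value = `$<-`(`*tmp*`[i, ], treated, value = TRUE)), so the expression i appears in both the extracting call and the writing call. The sketch assumes a hypothetical next_rows() that returns different rows on its first and second calls, as two unseeded calls to sample() would; the row numbers are borrowed from Bill Dunlap's set.seed(2020) example. It mirrors the mechanism only, not R's internal code.

```r
# Hypothetical index function: different rows on each call, like two successive
# calls to sample().
calls <- 0L
next_rows <- function() {
  calls <<- calls + 1L
  if (calls == 1L) c(7L, 6L, 8L) else c(1L, 10L, 4L)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Hand-written approximation of  df[next_rows(), ]$treated <- TRUE
tmp <- df[next_rows(), ]   # getter: 1st evaluation, reads rows 7, 6, 8
tmp$treated <- TRUE        # `$<-` on the extracted rows
df[next_rows(), ] <- tmp   # setter: 2nd evaluation, writes into rows 1, 10, 4

calls              # 2
df$unit            # 7 2 3 8 5 6 7 8 9 6  (units 1, 4 and 10 are gone)
which(df$treated)  # 1 4 10
```

The data frame is replaced by position, so rows 1, 10 and 4 receive the units of rows 7, 6 and 8, which is the same kind of duplication Sébastien saw.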


On 19/06/2020 at 17:20, William Dunlap wrote:

> The first subscript argument is getting evaluated twice.


Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread William Dunlap via R-help
The first subscript argument is getting evaluated twice.
> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1]  1 10  4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1]  1 10  4

Bill Dunlap
TIBCO Software
wdunlap tibco.com
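The double evaluation only does damage when the two evaluations can return different rows, as two calls to sample() do. As a quick check, here is a sketch that assumes a hypothetical fixed_rows() which always returns the same rows but counts its calls. The original nested statement then gives the intended result however many times the subscript is evaluated, and the counter reports how often R evaluated it (the traces in this thread show two evaluations in the R versions discussed here).

```r
# Hypothetical pure index function: same rows every call, but counts the calls.
calls <- 0L
fixed_rows <- function() {
  calls <<- calls + 1L
  c(2L, 5L, 7L)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# The nested form from the original post.
df[fixed_rows(), ]$treated <- TRUE

calls              # how many times the subscript was evaluated
which(df$treated)  # 2 5 7
df$unit            # still 1 2 ... 10, because both evaluations agree
```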


On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas wrote:

> I don't have an answer on the reason why this happens but it seems like
> a bug. Where?

Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Rui Barradas

Hello,

I don't have an answer as to why this happens, but it seems like
a bug. Where?

Is it in `[<-.data.frame` or in `[<-.default`?

One workaround is to index the column vector itself and assign into it:


set.seed(2020)
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE

df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
#   unit treated
#1     1   FALSE
#2     2   FALSE
#3     3   FALSE
#4     4   FALSE
#5     5   FALSE
#6     6    TRUE
#7     7    TRUE
#8     8    TRUE
#9     9   FALSE
#10   10   FALSE


Or


set.seed(2020)
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE

df3[sample(nrow(df3), 3), "treated"] <- TRUE
df3
# result as expected


Hope this helps,

Rui  Barradas
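Why the first workaround is safe, as far as the expansion rule goes: in df2$treated[i] <- TRUE the subscript belongs to the outermost `[`, so R extracts df2$treated, assigns into that vector with `[<-` (evaluating i once), and stores the vector back with `$<-`. A sketch with a hypothetical counting function pick_rows() in place of sample():

```r
# Hypothetical stand-in for sample(): counts its calls and returns fixed rows.
calls <- 0L
pick_rows <- function() {
  calls <<- calls + 1L
  c(6L, 7L, 8L)
}

df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE

# Roughly: tmp <- df2$treated; tmp[pick_rows()] <- TRUE; df2$treated <- tmp
df2$treated[pick_rows()] <- TRUE

calls               # 1
which(df2$treated)  # 6 7 8
```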



On 19/06/2020 at 13:49, Sébastien Lahaie wrote:

> df[sample(nrow(df), 3),]$treated <- TRUE


[R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Sébastien Lahaie
I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.

First, let's assign a treatment to 3 out of 10 rows as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But
the units are garbled. Units 1 and 4 have disappeared, for instance, and
there are duplicates for 6 and 9, one assigned to treatment and the other
to control. Why would this happen?

Thanks,
Sebastien
