Re: [R] Confused about using data.table package,

2017-02-21 Thread P Tennant
aggregate(), tapply(), do.call(), rbind() (etc.) are extremely useful 
functions that have been available in R for a long time. They remain 
useful regardless of which plotting approach you use: base graphics, 
lattice or the more recent ggplot2.
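
As a rough illustration of how these base functions fit together, here is a 
minimal sketch on the built-in iris data. The object names (means_tapply, 
means_agg, pieces, summary_df) are made up for the example.

```r
# Mean sepal length per species, computed two ways with base R.
means_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)
means_agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Split by species, summarise each piece, then stack the pieces
# back into one data frame with do.call(rbind, ...).
pieces <- lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1],
             n = nrow(d),
             mean_sl = mean(d$Sepal.Length))
})
summary_df <- do.call(rbind, pieces)
```

Either result can be passed to any of the plotting systems mentioned.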


Philip


On 22/02/2017 8:40 AM, C W wrote:

Hi Carl,

I have not fully learned dplyr, but it seems harder than tapply() and the
?apply() family in general.

Almost every ggplot2 data I have seen is manipulated using dplyr. Something
must be good about dplyr.

aggregate(), tapply(), do.call(), rbind() will be sorely missed! :(

Thanks!

On Tue, Feb 21, 2017 at 4:21 PM, Carl Sutton  wrote:


Hi

I have found that:
A)  Hadley's new book to be wonderful on how to use dplyr, ggplot2 and his
other packages.  Read this and using as a reference saves major frustration.
b)  Data Camps courses on ggplot2 are also wonderful.  GGPLOT2 has more
capability than I have mastered or needed.  To be an expert with ggplot2
will take some effort.  To just get run of the mill helpful, beautiful
plots, no major time needed for that.

I use both of these sources regularly, especially when what is in my grey
matter memory banks is not working.  Refreshers are sometimes needed.

If your data sets are large and available memory limited, then data.table
is the package I use.   I am amazed at the difference of memory usage with
data.table versus other packages.  My laptop has 16gb ram, and tidyr maxed
it but data.table melt used less than 6gb(if I remember correctly) on my
current work.  Since discovering fread and fwrite, read.table, read.csv,
and write have been benched.   Every script I have includes
library(data.table)

Carl Sutton


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Confused about using data.table package,

2017-02-21 Thread C W
Hi Carl,

I have not fully learned dplyr, but it seems harder than tapply() and the
*apply() family in general.

Almost every ggplot2 example I have seen manipulates its data with dplyr, so
there must be something good about dplyr.

aggregate(), tapply(), do.call(), rbind() will be sorely missed! :(

Thanks!

On Tue, Feb 21, 2017 at 4:21 PM, Carl Sutton  wrote:

> Hi
>
> I have found that:
> A)  Hadley's new book to be wonderful on how to use dplyr, ggplot2 and his
> other packages.  Read this and using as a reference saves major frustration.
> b)  Data Camps courses on ggplot2 are also wonderful.  GGPLOT2 has more
> capability than I have mastered or needed.  To be an expert with ggplot2
> will take some effort.  To just get run of the mill helpful, beautiful
> plots, no major time needed for that.
>
> I use both of these sources regularly, especially when what is in my grey
> matter memory banks is not working.  Refreshers are sometimes needed.
>
> If your data sets are large and available memory limited, then data.table
> is the package I use.   I am amazed at the difference of memory usage with
> data.table versus other packages.  My laptop has 16gb ram, and tidyr maxed
> it but data.table melt used less than 6gb(if I remember correctly) on my
> current work.  Since discovering fread and fwrite, read.table, read.csv,
> and write have been benched.   Every script I have includes
> library(data.table)
>
> Carl Sutton
>



[R] Confused about using data.table package,

2017-02-21 Thread Carl Sutton via R-help
Hi

I have found that:
a)  Hadley's new book is wonderful on how to use dplyr, ggplot2 and his other 
packages.  Reading it and using it as a reference saves major frustration.
b)  DataCamp's courses on ggplot2 are also wonderful.  ggplot2 has more 
capability than I have mastered or needed.  Becoming an expert with ggplot2 
will take some effort, but getting run-of-the-mill helpful, beautiful plots 
takes no major time.

I use both of these sources regularly, especially when what is in my grey 
matter memory banks is not working.  Refreshers are sometimes needed. 

If your data sets are large and available memory is limited, then data.table is 
the package I use.  I am amazed at the difference in memory usage between 
data.table and other packages.  My laptop has 16 GB of RAM, and tidyr maxed it 
out, but data.table's melt() used less than 6 GB (if I remember correctly) on 
my current work.  Since discovering fread() and fwrite(), read.table(), 
read.csv(), and write() have been benched.  Every script I have includes 
library(data.table).
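
A minimal sketch of the fread()/fwrite()/melt() workflow, assuming a tiny 
made-up table (wide) standing in for a large data set and a temporary file 
as the CSV path. It does not reproduce the memory figures above.

```r
library(data.table)

# Small made-up table standing in for a large data set.
wide <- data.table(id = 1:3,
                   a = c(1.5, 2.5, 3.5),
                   b = c(10.5, 20.5, 30.5))

# Round-trip through a temporary CSV with fwrite() and fread().
path <- tempfile(fileext = ".csv")
fwrite(wide, path)
back <- fread(path)

# Reshape from wide to long with data.table's melt() method.
long <- melt(back, id.vars = "id", measure.vars = c("a", "b"),
             variable.name = "column", value.name = "value")
```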

Carl Sutton

Re: [R] Confused about using data.table package,

2017-02-21 Thread peter dalgaard
Just. Don't. Do. This. (Hint: Threading mail readers.)

On 21 Feb 2017, at 03:53 , C W  wrote:

> Thanks Hadley!
> 
> While I got your attention, what is a good way to get started on ggplot2? ;)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Confused about using data.table package,

2017-02-21 Thread Jeff Newmiller
I suspect Hadley would recommend reading his new book, R for Data Science 
(r4ds.had.co.nz), in particular Chapter 3. You don't need plyr, but it won't 
take long before you will want to be using dplyr and tidyr, which are covered 
in later chapters.
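
For the iris2 example in the question quoted below, one possible starting 
point is sketched here. It assumes the ggplot2 package is installed; the 
choice of Sepal.Length and Sepal.Width as axes and the seed are arbitrary, 
picked only to make the sketch reproducible.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random grades are reproducible
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))

# One panel per species, with points coloured by grade.
p <- ggplot(iris2, aes(x = Sepal.Length, y = Sepal.Width,
                       colour = factor(grade))) +
  geom_point() +
  facet_wrap(~ Species) +
  labs(colour = "grade")
print(p)
```

No plyr or dplyr is needed for this plot; cbind() from base R is enough to 
prepare the data.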
-- 
Sent from my phone. Please excuse my brevity.

On February 20, 2017 6:53:29 PM PST, C W  wrote:
>Thanks Hadley!
>
>While I got your attention, what is a good way to get started on
>ggplot2? ;)
>
>My impression is that I first need to learn plyr, dplyr, AND THEN
>ggplot2.
>That's A LOT!
>
>Suppose i have this:
>iris
>iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
>iris2
>
>I want to have some kind of graph conditioned on species, by grade .
>What's
>a good lead to learn about plotting this?
>
>Thank you!
>
>
>
>On Mon, Feb 20, 2017 at 11:12 AM, Hadley Wickham 
>wrote:
>
>> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius
>
>> wrote:
>> >
>> >> On Feb 19, 2017, at 11:37 AM, C W  wrote:
>> >>
>> >> Hi R,
>> >>
>> >> I am a little confused by the data.table package.
>> >>
>> >> library(data.table)
>> >>
>> >> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1),
>y=rnorm(20,
>> 10, 1),
>> >> z=rnorm(20, 20, 1))
>> >>
>> >> df <- data.table(df)
>> >
>> >   df <- setDT(df) is preferred.
>>
>> Don't you mean just
>>
>> setDT(df)
>>
>> ?
>>
>> setDT() modifies by reference.
>>
>> >>
>> >> df_3 <- df[, a := x-y] # created new column a using x minus y, why
>are
>> we
>> >> using colon equals?
>> >
>> > You need to do more study of the extensive documentation. The
>behavior
>> of the ":=" function is discussed in detail there.
>>
>> You can get to that documentation with ?":="
>>
>> Hadley
>>
>> --
>> http://hadley.nz
>>
>



Re: [R] Confused about using data.table package,

2017-02-20 Thread C W
Thanks Hadley!

While I have your attention, what is a good way to get started on ggplot2? ;)

My impression is that I first need to learn plyr, dplyr, AND THEN ggplot2.
That's A LOT!

Suppose I have this:
iris
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
iris2

I want to have some kind of graph conditioned on species, by grade. What's
a good lead to learn about plotting this?

Thank you!



On Mon, Feb 20, 2017 at 11:12 AM, Hadley Wickham 
wrote:

> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius 
> wrote:
> >
> >> On Feb 19, 2017, at 11:37 AM, C W  wrote:
> >>
> >> Hi R,
> >>
> >> I am a little confused by the data.table package.
> >>
> >> library(data.table)
> >>
> >> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20,
> 10, 1),
> >> z=rnorm(20, 20, 1))
> >>
> >> df <- data.table(df)
> >
> >   df <- setDT(df) is preferred.
>
> Don't you mean just
>
> setDT(df)
>
> ?
>
> setDT() modifies by reference.
>
> >>
> >> df_3 <- df[, a := x-y] # created new column a using x minus y, why are
> we
> >> using colon equals?
> >
> > You need to do more study of the extensive documentation. The behavior
> of the ":=" function is discussed in detail there.
>
> You can get to that documentation with ?":="
>
> Hadley
>
> --
> http://hadley.nz
>



Re: [R] Confused about using data.table package,

2017-02-20 Thread David Winsemius

> On Feb 20, 2017, at 8:12 AM, Hadley Wickham  wrote:
> 
> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius  
> wrote:
>> 
>>> On Feb 19, 2017, at 11:37 AM, C W  wrote:
>>> 
>>> Hi R,
>>> 
>>> I am a little confused by the data.table package.
>>> 
>>> library(data.table)
>>> 
>>> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 
>>> 1),
>>> z=rnorm(20, 20, 1))
>>> 
>>> df <- data.table(df)
>> 
>>  df <- setDT(df) is preferred.
> 
> Don't you mean just
> 
> setDT(df)
> 
> ?
> 
> setDT() modifies by reference.

Thanks for the correction.


> 
>>> 
>>> df_3 <- df[, a := x-y] # created new column a using x minus y, why are we
>>> using colon equals?
>> 
>> You need to do more study of the extensive documentation. The behavior of 
>> the ":=" function is discussed in detail there.
> 
> You can get to that documentation with ?":="

That's a good place to start reading, but I was thinking of 
data.table::datatable-faq, data.table::datatable-intro which are on the 
Vignettes page from: help(pac=data.table).

> 
> Hadley
> 
> -- 
> http://hadley.nz

David Winsemius
Alameda, CA, USA



Re: [R] Confused about using data.table package,

2017-02-20 Thread Hadley Wickham
On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius  wrote:
>
>> On Feb 19, 2017, at 11:37 AM, C W  wrote:
>>
>> Hi R,
>>
>> I am a little confused by the data.table package.
>>
>> library(data.table)
>>
>> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1),
>> z=rnorm(20, 20, 1))
>>
>> df <- data.table(df)
>
>   df <- setDT(df) is preferred.

Don't you mean just

setDT(df)

?

setDT() modifies by reference.
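
A minimal sketch of what "by reference" means here, using small made-up 
objects (df_plain and dt_copy are names invented for the example):

```r
library(data.table)

df <- data.frame(x = rnorm(5), y = rnorm(5))
class(df)          # a plain data.frame

setDT(df)          # converts df in place; no assignment needed
class(df)          # now includes "data.table"

# as.data.table(), by contrast, builds a new object and leaves
# the original data.frame untouched.
df_plain <- data.frame(x = 1:3)
dt_copy  <- as.data.table(df_plain)
class(df_plain)    # still only "data.frame"
```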

>>
>> df_3 <- df[, a := x-y] # created new column a using x minus y, why are we
>> using colon equals?
>
> You need to do more study of the extensive documentation. The behavior of the 
> ":=" function is discussed in detail there.

You can get to that documentation with ?":="
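
As a short sketch of what that help page describes, assuming a small 
made-up table dt (dt_same and dt_new are also names invented here):

```r
library(data.table)

dt <- data.table(w = 1:3, x = c(5, 6, 7), y = c(1, 2, 3))

# := adds (or overwrites) a column in place, without copying the table.
dt[, a := x - y]

# Assigning NULL with := removes a column, also in place.
dt[, w := NULL]

# Because the update happens by reference, assigning the result to a new
# name does not make an independent copy: dt itself gains column b.
dt_same <- dt[, b := a * 2]
"b" %in% names(dt)        # TRUE

# Use copy() first when the original must stay unchanged.
dt_new <- copy(dt)[, d := 0]
"d" %in% names(dt)        # FALSE
```

This is why df_3 <- df[, a := x-y] in the original question changes df as 
well as creating df_3.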

Hadley

-- 
http://hadley.nz



Re: [R] Confused about using data.table package,

2017-02-19 Thread David Winsemius

> On Feb 19, 2017, at 11:37 AM, C W  wrote:
> 
> Hi R,
> 
> I am a little confused by the data.table package.
> 
> library(data.table)
> 
> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1),
> z=rnorm(20, 20, 1))
> 
> df <- data.table(df)

  df <- setDT(df) is preferred.
> 
> #drop column w
> 
> df_1 <- df[, w := NULL] # I thought you are supposed to do: df_1 <- df[, -w]

Nope. The "[.data.table" function is very different from the "[.data.frame" 
function. An expression in the `j` position of "[.data.table" is evaluated in 
the environment of the data.table object, so unquoted column names refer to 
the columns themselves, and the result of applying any function to them is 
returned. Here it's just a unary minus. 

Actually "nope" on two counts. You cannot use a unary minus with column names 
in `[.data.frame` either. It would have needed to be 
df[ , !colnames(df) %in% "w"]  # logical indexing
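
The difference can be sketched as follows, assuming a small made-up data 
frame; the names df_no_w, neg_w and dt_no_w are invented for the example.

```r
library(data.table)

df <- data.frame(w = 1:3, x = 4:6, y = 7:9)

# data.frame: drop a column by name with logical indexing.
df_no_w <- df[, !colnames(df) %in% "w"]

# data.table: j is evaluated inside the table, so -w is simply the
# negated values of column w, not "all columns except w".
dt <- as.data.table(df)
neg_w <- dt[, -w]          # an integer vector: -1 -2 -3

# To get a new table without w while leaving dt unchanged,
# negate the quoted column name instead.
dt_no_w <- dt[, !"w"]
```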


> 
> df_2 <- df[x 
> df_3 <- df[, a := x-y] # created new column a using x minus y, why are we
> using colon equals?

You need to do more study of the extensive documentation. The behavior of the 
":=" function is discussed in detail there.

> 
> I am a bit confused by this syntax.

It's non-standard for R but many people find the efficiencies of the package 
worth the extra effort to learn what is essentially a different evaluation 
strategy.


> 
> Thanks!
> 
>   [[alternative HTML version deleted]]

Rhelp is a plain text mailing list,

-- 
David
> 

David Winsemius
Alameda, CA, USA



[R] Confused about using data.table package,

2017-02-19 Thread C W
Hi R,

I am a little confused by the data.table package.

library(data.table)

df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1),
z=rnorm(20, 20, 1))

df <- data.table(df)

#drop column w

df_1 <- df[, w := NULL] # I thought you are supposed to do: df_1 <- df[, -w]

df_2 <- df[x