Steve H,

Interesting thread. To add fuel to the fire, have you seen R Inferno?
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Matthew

On Fri, 2011-05-06 at 15:26 -0400, Joseph Voelkel wrote:
> Steve H,
>
> As an R user, I sometimes make fundamental mistakes (like forgetting to
> use collapse with the paste function when I want to collapse).
>
> However, R is a powerful language. It assumes the user knows what he
> or she is doing unless something is almost certainly wrong. (Steve L
> provided some examples. This seems like the 80-90% you mentioned, but
> it’s probably more in the 95%-99% range.) In my opinion, it is
> unrealistic for you to make what are really programming mistakes on
> your part (for what you INTENDED; if you INTENDED something else it
> would not be a mistake) and then expect the software to be able to
> read your INTENT.
>
> I am not a great programmer, but having worked with software that
> prints out too many warnings, or worse, that will not let you do some
> things because the programmers decided a user would be unlikely to
> want to do them, I prefer R’s approach.
>
> Regarding the recycling note recently posted: yes, that may be a nice
> option. (But will you need to have a third option: “don’t print out
> recycling warnings for vectors of length 1”? That’s usually done
> intentionally.)
>
> Regards,
>
> Joe V.
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steve Harman
> Sent: Friday, May 06, 2011 2:05 PM
> To: Steve Lianoglou
> Cc: [email protected]
> Subject: Re: [datatable-help] using paste function while grouping
> gives strange results
>
> Steve,
>
> These are good examples of confusing statements.
> In some cases, people might prefer to use them intentionally for
> certain purposes
> (even in that case, it would detract from the readability or
> maintainability of programs).
> On the other side of the coin, they are masking program errors.
> It is a mistake that R overlooked such usability issues (i.e.,
> programmer usability).
> And, two wrongs will not make a right.
>
> I wouldn't go so far as to say that R should have been
> a typed language, but I do strongly believe that R libraries can be
> made more user or developer friendly (still using the command line).
> In the places where you suspect that, with 80-90% probability, the
> user or programmer might be doing something unexpected, just issue
> an appropriate warning.
>
> On Fri, May 6, 2011 at 10:48 AM, Steve Lianoglou
> <[email protected]> wrote:
>
> Hi Steve,
>
> As (another :-) aside -- make sure you use "reply-all" when replying
> to messages from this (and pretty much all other R-related) mailing
> lists, otherwise your mail goes straight to the person, and not back
> to the list.
>
> Other comments in line:
>
> On Fri, May 6, 2011 at 10:29 AM, Steve Harman <[email protected]>
> wrote:
> > Steve, this works.
>
> Great! Glad to hear it.
>
> > However, this discussion shows that we need some error or
> > at least warning messages in this case.
>
> For this particular case, I'd respectfully have to disagree.
>
> > It is important to pay attention to user (in this case programmer)
> > experience and facilitate recovery from
> > mistakes by providing the user with meaningful and timely messages.
> > thanks for all your help,
>
> I would argue that what happened to you is actually "expected
> behavior."
>
> You'll find that in many contexts, if "R" thinks it can figure out
> what you intended to do with two vectors that aren't the same length,
> it will try to be smart and do it.
>
> For instance, this is similar to what happened to you -- notice how
> TRUE is recycled to be as long as the first column here:
>
> R> data.frame(id=letters[1:5], huh=TRUE)
>   id  huh
> 1  a TRUE
> 2  b TRUE
> 3  c TRUE
> 4  d TRUE
> 5  e TRUE
>
> Perhaps more strangely, but still "R-correct" (note no warning):
>
> R> 1:3 + 1:6 ## == c(1:3,1:3) + 1:6
> [1] 2 4 6 5 7 9
>
> R thinks this is strange, but still does "something" for you (but
> gives a warning, since the length of the 2nd vector isn't a multiple
> of the length of the first):
>
> R> 1:3 + 1:7
> [1] 2 4 6 5 7 9 8
> Warning message:
> In 1:3 + 1:7 :
>   longer object length is not a multiple of shorter object length
>
> Oftentimes I actually take advantage of the situation that happened
> to you to expand a result into several rows (instead of just into 1)
> when doing split/summarize/merge stuff with data.table's [,
> by='something'] mojo.
>
> My 2 cents,
>
> -steve
>
> On Fri, May 6, 2011 at 9:44 AM, Steve Harman <[email protected]>
> wrote:
> >>
> >> Thanks, I'll try it today and let you know.
> >>
> >> On Fri, May 6, 2011 at 12:22 AM, Steve Lianoglou
> >> <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> As an aside -- in the future, please provide some data in a form that
> >>> we can just copy and paste from your email into an R session so that
> >>> we can get a working object up quickly.
> >>>
> >>> For example:
> >>>
> >>> R> dt <- data.table(coursecode=c(NA, NA, NA, 101, 102, 101, 102, 103),
> >>>                     student_id=c(1, 1, 1, 1, 1, 2, 2, 2),
> >>>                     key='student_id')
> >>>
> >>> On Thu, May 5, 2011 at 10:54 PM, Steve Harman <[email protected]>
> >>> wrote:
> >>> > Hello
> >>> >
> >>> > I have a data table called dt in which each student can have multiple
> >>> > records (created using data.table)
> >>> >
> >>> > coursecode        student_id
> >>> > ----------------  ----------------
> >>> > NA                1
> >>> > NA                1
> >>> > NA                1
> >>> > ....              1
> >>> > ....              1
> >>> > NA                2
> >>> > 101               2
> >>> > 102               2
> >>> > NA                2
> >>> > 103               2
> >>> >
> >>> > I am trying to group by student id and concatenate the coursecode
> >>> > strings in student records. This string is mostly NA but it can
> >>> > also be a real course code (because of messy real-life data,
> >>> > coursecode was not always entered).
> >>> > There are 999999 records.
> >>> >
> >>> > So, I thought I would get results like
> >>> >
> >>> > 1 NA NA NA .....
> >>> > 2 NA 101 102 NA 123 ....
> >>>
> >>> What type of object are you expecting that result to be?
> >>>
> >>> > However, as seen below, it brings me a result with 999999 rows
> >>> > and it fails to concatenate the coursecodes.
> >>> >
> >>> >> codes <- dt[,paste(coursecode),by=student_id]
> >>> >> codes
> >>> >       student_id V1
> >>> >  [1,]          1 NA
> >>> >  [2,]          1 NA
> >>> >  [3,]          1 NA
> >>> >  [4,]          1 NA
> >>> >  [5,]          1 NA
> >>> >  [6,]          1 NA
> >>> >  [7,]          1 NA
> >>> >  [8,]          1 NA
> >>> >  [9,]          1 NA
> >>> > [10,]          1 NA
> >>> > First 10 rows of 999999 printed.
> >>> >
> >>> > If I repeat the same example for a numeric attribute and use some math
> >>> > aggregation functions such as sum, mean, etc., then the number of rows
> >>> > returned is correct; it is indeed equal to the number of students.
> >>> >
> >>> > I was wondering if the problem is with NAs or with the use of paste
> >>> > as the aggregation function. I can alternatively use RMySQL with MySQL
> >>> > to concatenate those strings but I would like to use data.table if
> >>> > possible.
> >>>
> >>> What if you try this (using my `dt` example from above):
> >>>
> >>> R> dt[, paste(coursecode, collapse=","), by=student_id]
> >>>      student_id V1
> >>> [1,]          1 NA,NA,NA,101,102
> >>> [2,]          2 101,102,103
> >>>
> >>> Note that each element in the $V1 column is a character vector of
> >>> length 1 and not individual course codes.
> >>>
> >>> Without using the `collapse` argument to your call to paste, you just
> >>> get a character vector which is the same length as you passed in, eg:
> >>>
> >>> R> paste(c('A', 'B', NA, 'C'))
> >>> [1] "A"  "B"  "NA" "C"
> >>>
> >>> vs.
> >>>
> >>> R> paste(c('A', 'B', NA, 'C'), collapse=",")
> >>> [1] "A,B,NA,C"
> >>>
> >>> HTH,
> >>>
> >>> -steve
> >>>
> >>> --
> >>> Steve Lianoglou
> >>> Graduate Student: Computational Systems Biology
> >>>  | Memorial Sloan-Kettering Cancer Center
> >>>  | Weill Medical College of Cornell University
> >>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
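For anyone reading along without data.table installed, the effect of `collapse` on a grouped paste can be sketched in base R. This is only an illustration, not code anyone in the thread ran: the two vectors copy Steve L's `dt` example, `tapply` is used here as a stand-in for data.table's `by=`, and the names `per_record` and `per_student` are made up for this sketch.

```r
# Toy data copied from Steve L's `dt` example, as plain vectors
# (assumption: base R only, no data.table involved).
coursecode <- c(NA, NA, NA, 101, 102, 101, 102, 103)
student_id <- c(1, 1, 1, 1, 1, 2, 2, 2)

# Without `collapse`, paste() returns one string per input element,
# so a per-group call hands back as many values as there are records.
per_record <- paste(coursecode)
length(per_record)  # 8, one per record

# With `collapse`, each group is reduced to a single string.
# tapply() passes `collapse = ","` through to paste() for every group.
per_student <- tapply(coursecode, student_id, paste, collapse = ",")
per_student
```

The result has one entry per student, which matches the row count the original poster was expecting; the NAs still appear as the literal string "NA" because paste() converts them rather than dropping them.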
