Hi Tyler and Lukas,

How about including these considerations in the Instructor Notes
(http://swcarpentry.github.io/r-novice-inflammation/guide/)?

Best,
Marianne

On Thu, Dec 15, 2016 at 11:10 AM, Tyler Smith <[email protected]> wrote:

> Hi Lukas,
>
> I stand corrected!
>
> I have had issues with inconsistent (among functions) type coercion before.
> Some of these issues have been resolved over time, and I assumed this was
> another case of that. However, with some trivial testing, I find that's not
> the case. I found the following situation on R 3.3.2:
>
> - `min()` and `max()` call primitive (i.e., C) code, and work as expected on
>   data frames (and data frame rows, which are actually data frames)
> - `rowMeans()` explicitly converts data frames with `as.matrix()`, and so
>   works as expected
> - `sd()` explicitly converts data frames to `numeric()`, and works as
>   expected
> - `mean()` does *not* do any coercion, and fails with a warning on data
>   frames (and rows)
>
> Which means the message in the lesson is basically sound: sometimes R
> functions will treat data frame rows as vectors, and sometimes they don't,
> and there's no a priori way to know which is which or why!
>
> With that in mind, I'll think about ways to improve the original callout to
> clarify this, if I can.
>
> Best,
>
> Tyler
> --
> plantarum.ca
>
> On Thu, Dec 15, 2016, at 07:59 AM, Lukas Weber wrote:
>
> Hi Tyler,
>
> Thanks for your comment. I added this passage in a pull request about a year
> ago, after we had some problems at a workshop.
>
> I don't remember all the details, but we definitely had problems on multiple
> machines. I think it may have been Windows computers only. We were using the
> current version of R at the time.
>
> There are some more details in this pull request (closed):
> https://github.com/swcarpentry/r-novice-inflammation/pull/177
>
> We included this passage simply to provide an easy fix (convert using
> "as.numeric()") for anyone else who has the same problem.
> I agree it's best not to introduce any unnecessary concepts too early --
> hence we put it in a box and tried to keep it as simple and short as
> possible; while still including it directly in the course materials in case
> other instructors have the same problem. I remember it took us a few minutes
> to find a solution during the workshop, since it wasn't immediately clear
> what was causing the problem.
>
> I tried the example again just now on my Mac, and it worked fine, without
> the fix. As you point out, the sliced row of the data frame should actually
> be automatically coerced when you use max(). Sliced columns are already
> numeric vectors, so no coercion is required there.
>
> Re-working the whole lesson to remove this edge case would be difficult,
> since we would like to keep it consistent with the Python materials,
> especially using the same inflammation data set. Maybe someone else also has
> some views here?
>
> Best regards,
> Lukas
>
> On Wed, Dec 14, 2016 at 4:09 AM, Tyler Smith <[email protected]> wrote:
>
> Hi,
>
> I've been working through lesson one in the r-inflammation lesson. It
> includes the following passage:
>
>> ## Forcing Conversion
>>
>> The code above may give you an error in some R installations,
>> since R does not automatically convert a sliced row of a `data.frame` to a
>> vector.
>> (Confusingly, sliced columns are automatically converted.)
>> If this happens, you can use the `as.numeric` command to convert the row
>> of data to a numeric vector:
>>
>> `patient_1 <- as.numeric(dat[1, ])`
>
> The example data is entirely numeric, with no missing values, and no
> non-numeric columns. In that case, type coercion should work as you
> expect. If it doesn't, I would be very surprised if the results depend
> on a particular R *installation*. It may be the case that older R
> *versions* did different things. But I'm not sure about that.
> Can someone confirm which R versions require the explicit conversion of
> data to numeric in this example?
>
> Coercion in R does have some ugly corner cases. If this is in fact one
> of them, I think it would be a good idea to rework the example so that
> it doesn't require this kind of fix.
>
> Incidentally, columns always work because a column by definition is
> composed of a single vector (which therefore has a single type). Rows
> can include data from different columns, and thus may have different
> types that need to be coerced into the lowest common denominator before
> we can use them. This isn't really confusing when you understand how a
> data frame is constructed, but it's perhaps an issue that we don't need
> to throw at students in lesson 1.
>
> Best,
>
> Tyler
>
> --
> plantarum.ca
> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/listinfo/discuss
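
For anyone adapting this thread into the Instructor Notes, the behaviour reported above for R 3.3.2 can be checked with a short session. This is only a minimal sketch, not code from the lesson: the three-column `dat` below is a hypothetical stand-in for the inflammation data, and results may differ on other R versions.

```r
# Hypothetical stand-in for the lesson's data: all numeric, no missing values.
dat <- data.frame(V1 = c(0, 0, 0), V2 = c(0, 1, 1), V3 = c(1, 2, 1))

# A sliced row is still a data frame; a sliced column is a plain numeric vector.
row_1 <- dat[1, ]
col_1 <- dat[, 1]
is.data.frame(row_1)  # TRUE
is.numeric(col_1)     # TRUE

# max() and min() accept the data frame row directly.
row_max <- max(row_1)

# mean() does no coercion: on a data frame row it warns and returns NA.
# suppressWarnings() is used here only to keep the output quiet.
row_mean <- suppressWarnings(mean(row_1))
is.na(row_mean)       # TRUE

# Explicit conversion, as in the lesson's callout, gives a numeric vector
# that every summary function handles the same way.
patient_1 <- as.numeric(dat[1, ])
patient_mean <- mean(patient_1)
```

Converting once with `as.numeric()` means learners do not have to remember which summary functions coerce a data frame row and which do not.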
