Hi Lukas,


I stand corrected! 



I have had issues before with type coercion that is inconsistent across
functions. Some of these issues have been resolved over time, and I assumed
this was another case of that. However, with some trivial testing, I
find that's not the case. I found the following situation on R 3.3.2:


- `min()` and `max()` call primitive (i.e., C) code, and work as
  expected on data frames (and data frame rows, which are actually
  data frames)
- `rowMeans()` explicitly converts data frames with `as.matrix()`, and
  so works as expected
- `sd()` explicitly converts data frames to numeric with `as.double()`,
  and works as expected
- `mean()` does *not* do any coercion, and fails with a warning on data
  frames (and rows)
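
For anyone who wants to reproduce this, here is a minimal sketch. The small
`dat` below is a made-up stand-in for the lesson's inflammation data (all
numeric, no missing values), not the real file, and I have only checked this
behaviour on the R version mentioned above:

```r
# Hypothetical stand-in for the lesson data: all numeric, no NAs.
dat <- data.frame(V1 = c(0, 0, 1), V2 = c(1, 2, 1), V3 = c(3, 1, 2))

# Slicing a row gives back a one-row data frame, not a vector.
row1 <- dat[1, ]
is.data.frame(row1)   # TRUE

max(row1)             # 3
min(row1)             # 0
rowMeans(row1)        # mean of the row, returned as a named numeric
sd(row1)              # same value as sd(c(0, 1, 3))

# mean() does no coercion: it warns and returns NA.
# suppressWarnings() is used here only to keep the output quiet.
m <- suppressWarnings(mean(row1))
is.na(m)              # TRUE
```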


Which means the message in the lesson is basically sound: sometimes R
functions will treat data frame rows as vectors, and sometimes they
won't, and there's no a priori way to know which is which or why!
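
The row/column difference, and the explicit conversion the callout
recommends, can be sketched like this. Again, `dat` is a hypothetical
stand-in for the lesson data; `patient_1` is the name used in the callout:

```r
# Hypothetical stand-in for the lesson data: all numeric, no NAs.
dat <- data.frame(V1 = c(0, 0, 1), V2 = c(1, 2, 1), V3 = c(3, 1, 2))

col1 <- dat[, 1]   # a sliced column is already a numeric vector
row1 <- dat[1, ]   # a sliced row is still a data frame (a list of columns)

is.numeric(col1)   # TRUE
is.numeric(row1)   # FALSE

# Converting explicitly removes the dependence on each function's
# own coercion rules.
patient_1 <- as.numeric(row1)
mean(patient_1)    # now a plain numeric mean, no warning
```

Converting once up front means the choice of summary function no longer
matters, which is probably the simplest thing to tell learners.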


With that in mind, I'll think about ways to improve the original callout
to clarify this, if I can.


Best,



Tyler

--

plantarum.ca







On Thu, Dec 15, 2016, at 07:59 AM, Lukas Weber wrote:

> Hi Tyler,

> 

> Thanks for your comment. I added this passage in a pull request about
> a year ago, after we had some problems at a workshop.
> 

> I don't remember all the details, but we definitely had problems on
> multiple machines. I think it may have been Windows computers only. We
> were using the current version of R at the time.
> 

> There are some more details in this pull request (closed):
> https://github.com/swcarpentry/r-novice-inflammation/pull/177
> 

> We included this passage simply to provide an easy fix (convert using
> "as.numeric()") for anyone else who has the same problem. I agree it's
> best not to introduce any unnecessary concepts too early -- hence we
> put it in a box and tried to keep it as simple and short as possible;
> while still including it directly in the course materials in case
> other instructors have the same problem. I remember it took us a few
> minutes to find a solution during the workshop, since it wasn't
> immediately clear what was causing the problem.
> 

> I tried the example again just now on my Mac, and it worked fine,
> without the fix. As you point out, the sliced row of the data frame
> should actually be automatically coerced when you use max(). Sliced
> columns are already numeric vectors, so no coercion is required there.
> 

> Re-working the whole lesson to remove this edge case would be
> difficult, since we would like to keep it consistent with the Python
> materials, especially using the same inflammation data set. Maybe
> someone else also has some views here?
> 

> Best regards,

> Lukas

> 

> 

> On Wed, Dec 14, 2016 at 4:09 AM, Tyler Smith
> <[email protected]> wrote:
>> Hi,

>> 

>>  I've been working through lesson one in the r-inflammation lesson.
>>  It includes the following passage:

>> 

>>  > ## Forcing Conversion
>>  >
>>  > The code above may give you an error in some R installations,
>>  > since R does not automatically convert a sliced row of a
>>  > `data.frame` to a vector.
>>  > (Confusingly, sliced columns are automatically converted.)
>>  > If this happens, you can use the `as.numeric` command to convert
>>  > the row of data to a numeric vector:
>>  >
>>  > `patient_1 <- as.numeric(dat[1, ])`

>> 

>>  The example data is entirely numeric, with no missing values, and no
>>  non-numeric columns. In that case, type coercion should work as you
>>  expect. If it doesn't, I would be very surprised if the results
>>  depend on a particular R *installation*. It may be the case that
>>  older R *versions* did different things.  But I'm not sure about
>>  that. Can someone confirm which R versions require the explicit
>>  conversion of data to numeric in this example?

>> 

>>  Coercion in R does have some ugly corner cases. If this is in fact
>>  one of them, I think it would be a good idea to rework the example
>>  so that it doesn't require this kind of fix.

>> 

>>  Incidentally, columns always work because a column by definition is
>>  composed of a single vector (which therefore has a single type).
>>  Rows can include data from different columns, and thus may have
>>  different types that need to be coerced into the lowest common
>>  denominator before we can use them. This isn't really confusing
>>  when you understand how a data frame is constructed, but it's
>>  perhaps an issue that we don't need to throw at students in
>>  lesson 1.

>> 

>>  Best,

>> 

>>  Tyler

>>
>>  --
>> plantarum.ca
>>  _______________________________________________
>>  Discuss mailing list [email protected]
>>  http://lists.software-carpentry.org/listinfo/discuss


