Hello everyone,

For anyone who followed/contributed to this thread and/or may be interested in doing something similar, a write-up of the session is now available [1], including the materials I developed for this purpose [2]. Comments, questions, and the like welcome, of course!

Laura

[1] http://neuroanatody.com/2017/12/oxford-reproducibility-lectures-laura-fortunato/
[2] http://neuroanatody.com/wp-content/uploads/2017/12/Oxford-Reproducibility-School.zip

On 16/10/17 21:15, Laura Fortunato wrote:

Thanks everyone for the input! I didn't get this until after the talk was set up, but there are definitely a number of useful ideas/links/resources here for future reference.

This is what I ended up doing, in case anyone is looking to do something similar --- I'd be happy to provide additional information and/or share the materials I prepared. Basically, the talk included a first part with some background/"motivational" material, and a second part with a demo to illustrate how I would go about doing a simple task manually vs. shell scripting. I emphasised that the second part aimed to provide a mental model of how to go about the task, rather than to teach a specific technique or tool.

The task was based on the "molecules" example/data in the Software Carpentry shell-novice lesson. I introduced it as: "assume that your supervisor gave you a set of text files (e.g. machine output) and asked you to find the file with the smallest number of lines."

Then I demonstrated how I would do this by "pointing-and-clicking", i.e. opening each file individually in a text editor, counting the number of lines, making a note in a separate file, etc. At each step, I emphasised where things could go wrong, e.g. mistakenly reading data for the same file twice, errors in transcribing the line count, etc. I also pointed out how this approach would not scale e.g. beyond the handful of files in the example ("what if instead of 6 output files, I had 600?"), or if new output files were added at a later stage, and so on.

Next, I completed the task in the shell. The presentation was projected on two screens. On one screen I had slides with the commands (heavily commented, so that people could follow along), and on the other I typed the commands at the terminal. I covered basic operations (cd, ls, more, head, wc, sort), redirection, and pipes --- all at the simplest level.
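
For anyone who wants to try this at home, here is a minimal sketch of the kind of pipeline described above. The file names and contents are made up for illustration (they are not the lesson's actual data files), and the intermediate file name "lengths" is likewise just an assumption.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small text files standing in for the machine output (hypothetical data).
printf 'a\nb\nc\n' > first.txt    # 3 lines
printf 'a\n'       > second.txt   # 1 line
printf 'a\nb\n'    > third.txt    # 2 lines

# Redirection: count the lines in each file and save the counts to a file.
wc -l *.txt > lengths

# Pipe: sort the counts numerically and keep only the first (smallest) row.
# The "total" row that wc adds has the largest count, so it sorts last.
sort -n lengths | head -n 1
```

The last command prints the row for second.txt, the file with the fewest lines. The exact spacing of the count column differs between wc implementations.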

I concluded by executing the task with a simple script I had prepared beforehand, which effectively "recapitulated" the commands I had typed at the terminal. I had also prepared a repository beforehand, with the data files and the script under version control. The idea here was simply to show that you can keep track of who did what, when, etc., by printing the log to screen.
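
As a rough illustration of what such a script could look like (this is a sketch, not the script used in the session), the commands can be wrapped in a small function. The function name smallest_file and the run*.txt files are hypothetical, and the sketch assumes file names without whitespace.

```shell
#!/bin/sh
# smallest_file DIR: print the name of the file in DIR with the fewest lines.
# (Hypothetical helper; assumes file names contain no whitespace.)
smallest_file() {
    for f in "$1"/*; do
        # "wc -l < file" prints only the count, so no "total" row appears.
        printf '%s %s\n' "$(wc -l < "$f")" "$(basename "$f")"
    done | sort -n | head -n 1 | awk '{print $2}'
}

# Example run on made-up data.
demo=$(mktemp -d)
printf '1\n2\n3\n' > "$demo/run1.txt"   # 3 lines
printf '1\n2\n'    > "$demo/run2.txt"   # 2 lines
smallest_file "$demo"                   # prints run2.txt

# A file added at a later stage is picked up by simply re-running.
printf '1\n' > "$demo/run3.txt"         # 1 line
smallest_file "$demo"                   # prints run3.txt
```

Re-running the same function after new output arrives is the point of contrast with the manual approach: nothing has to be recounted or retranscribed by hand.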

The session was quite interactive, and from the feedback I got from the students, it seemed that they did appreciate the pitfalls of the GUI-based approach, as well as the potential benefits of the alternative approach. The demo took ~15 mins. Overall, it seemed like a useful thing to do. If I were to expand it, I would include a simple visualization, as suggested by Tracy, Bianca, and others in the thread.

On 29/09/17 09:06, Bianca Peterson wrote:
Hi all,

I agree with Tracy about getting to the visualizations as quickly as possible. Sorry for the delayed reply, Laura, but the following might be useful for future reference.

I've managed to convince a few people to consider using R (or rather RStudio), simply by running 4 commands in RStudio (from the DC R Ecology lesson):

1. download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")
2. surveys <- read.csv('data/portal_data_joined.csv')
3. summary(surveys)
4. plot(surveys$sex)

I usually emphasize the dimensions (or size) of this data and the speed with which R executes the commands, and then ask "How many clicks would it take to get these results in Excel?". They usually then smile and ask when the next Carpentry workshop will be.

Thanks to everyone for sharing great resources and advice!

Bianca

On 29 Sep 2017 15:51, "Tracy Teal" <[email protected]> wrote:

    Hi Laura,

    This is a really neat idea, and I'm sorry, it sounds like it's
    too late already for more ideas for your presentation.
    Let us know how it went! This seems like a generally useful kind
    of presentation to have available though, and these ideas have
    been great.

    A class at UC Davis does an exercise where they have people fill
    out a survey on random things, like how many siblings do you
    have, what is your favorite color, what kind of shoes are you
    wearing, are you a cat person or a dog person? Create the survey
    so it makes intentionally confusing data, for instance leaving
    number of siblings as a fill in the blank rather than as a drop
    down numerical response.

    Then show the data, and show how messy the data is. Then demo how
    to clean it up and do some visualizations. In a half hour (if you
    knew generally what kind of data was going to be produced), you
    could have people fill out the survey, show the data and do a
    clean up and visualization with command line and Python or R. You
    could maybe get version control in there too to show how you
    could change the script. Maybe the messy data part is too much
    for a half hour, but you could have a survey that creates cleaner
    data.

    Getting to visualizations in a short amount of time seems to be
    the thing that really is exciting to people. Especially when they
    don't have a good idea of how they would have approached it in
    something like Excel.

    Best,
    -Tracy

    On Wed, Sep 27, 2017 at 3:06 PM, Moore, Nathan T
    <[email protected]> wrote:

        I haven't tried what you're attempting, but here's an idea.
        Describe the computer/lab notebook side of a data-intensive
        project, estimate the time associated with things like
        clicking and dragging and computing by hand, and then show a
        brief example in which that time is reduced (substantially).
        E.g., tell the story from one of the learner profiles in more
        detail, in a context that the MS students would be familiar
        with.


        I assume you've seen learner profiles?

        https://software-carpentry.org/audience/

        Nathan

        ------------------------------------------------------------------------
        *From:* Discuss <[email protected]> on
        behalf of Laura Fortunato <[email protected]>
        *Sent:* Thursday, September 21, 2017 8:44:14 AM
        *To:* [email protected]
        *Subject:* [Discuss] core concepts for novices in 30 mins

        Hello list,

        I am looking for input on how to introduce core concepts
        about reproducibility, effective research computing, etc to
        complete novices in a 1/2-hour slot. Any
        ideas/suggestions/materials welcome!

        The background: I have been asked to give a talk on effective
        computing for research reproducibility at the Oxford
        Reproducibility School next week. The target audience is a
        group of incoming masters-level students in psychology, most
        of whom I assume will be complete novices.

        Normally, given the format (30-min presentation + 10 mins for
        questions) I would give a "motivational" talk, and then point
        people to various resources (including Carpentry workshops,
        lessons). However, this slot is part of a much longer event,
        including "motivational" talks and talks on
        discipline-specific tools (e.g. open, reproducible
        neuroimaging) by several others.

        Looking at the programme, it seems that what will not be
        covered are the "basic" tools/skills taught in a standard
        Software Carpentry workshop --- shell, version control,
        programming.

        So, one idea I have been toying with is to do a brief
        demonstration of these tools to have the students see them
        "in action". However, I am not sure this is possible in a
        1/2-hour slot.

        Does anyone have experience doing something similar, or can
        anyone point me to resources that do this? If anyone has
        tried and failed, it would also be good to know, of course.

        Thanks for any input!
        Laura

        --
        *Laura Fortunato* || Associate Professor of Evolutionary
        Anthropology | University of Oxford || External Professor |
        Santa Fe Institute ||


_______________________________________________
Discuss mailing list
[email protected]
http://lists.software-carpentry.org/listinfo/discuss