On Thu, May 05, 2016 at 01:38:57AM -0400, Spaced Girl wrote:
> Think I had my first reply vs. reply all mistake. Just commented on Jan's
> Word and Powerpoint 'all wrong' remarks. Now unsure how this is going to be
> filed. Guess I'll see. Hate the complexity of email.
> -S.G.
>
> ---------- Forwarded message ----------
> From: Spaced Girl <[email protected]>
> Date: Thu, May 5, 2016 at 1:33 AM
> Subject: Re: [Discuss] Word and PowerPoint "all wrong"? Was Re: SWC for
> high school (16-18)
> To: Jan T Kim <[email protected]>
>
>
> Re Jan's comment:
> *"Excel, on the other hand, is truly evil as soon as it's used for any data
> analysis. Excel is just one big violation of the principle of separating
> data from code."*
> I completely disagree and am quite amused/curious about your own
> experiences with Excel, because mine have been pretty good.
I have virtually no experience with using Excel for my own work, but
plenty of experience of having to work with materials that others
give me in Excel files. From this perspective, my experience is that
spreadsheets handled by just one person are in good shape -- quite
probably like yours. However, there are others (in my discipline,
the biosciences, anyway) who just can't help themselves and will make
alterations to primary data when left to play / work with a spreadsheet.
Spreadsheets that have passed through the hands / computers of a few
people are almost always messed up. Typically, a single sheet holds
several floating rectangles containing tables of unrelated data; cells,
rows and columns have been added and filled with random remarks or
materials that impede any attempt to process the data programmatically;
and often data have been altered in place in an unsystematic way (e.g.
some columns are centred / normalised in preparation for generating
some plot, while those not selected for the plot are left unaltered),
or they are just plain corrupted (e.g. by sorting only one or some
columns, or by disrupting or deleting parts of those floating rectangles
through inserting / deleting / manipulating rows or columns while the
rectangle is scrolled out of view, etc.).
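To make the "sorting only one column" failure concrete, here is a minimal Python sketch. The sample IDs and values are made up for illustration, and the two helper functions are hypothetical stand-ins for sorting whole rows versus selecting and sorting a single column; this is not a model of Excel's actual sort dialog.

```python
# Hypothetical toy table: each row pairs a sample ID with a measurement.
sample_ids = ["s1", "s2", "s3", "s4"]
values = [0.9, 0.1, 0.5, 0.3]


def sort_rows_together(ids, vals):
    """Sort whole rows by value, keeping each ID paired with its measurement."""
    rows = sorted(zip(ids, vals), key=lambda row: row[1])
    return [r[0] for r in rows], [r[1] for r in rows]


def sort_one_column_only(ids, vals):
    """Hypothetical mimic of selecting only the value column and sorting it."""
    return list(ids), sorted(vals)


good_ids, good_vals = sort_rows_together(sample_ids, values)
bad_ids, bad_vals = sort_one_column_only(sample_ids, values)

original = dict(zip(sample_ids, values))
print(dict(zip(good_ids, good_vals)) == original)  # True: pairings survive
print(dict(zip(bad_ids, bad_vals)) == original)    # False: s1 now reads 0.1
```

Nothing raises an error in the second case; the table simply stops describing the samples, which is why this kind of corruption tends to go unnoticed.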
All these patterns could be considered bad practice, and the argument
can thus be made that it's not the tool's (Excel's) fault if it's
improperly used. I've seen so many cases of such practices, though,
that I've come to think there's much more hope in encouraging people
to use alternative tools that are more "naturally" conducive to
systematic and reproducible analysis than in educating people to use
spreadsheets in ways that are suitable for science.
> For background,
> I was stuck using nothing but Excel for my entire master's, and it's the only
> database/graphing program I've actually had formal lessons in that stuck.
> My thought on Excel is that it's better suited as a data analysis tool for
> small data sets and basic calculations.
True -- and it's part of the "evil", like the witches in "Macbeth", who
feed their victim some small truth in order to lure him onto a path towards
tragedy and doom. Or, from a technical debt perspective: for analysing
a small set of data just once, any tool is ok -- if nobody ever has to
run the thing again, nobody will have to "pay off the debt" either. But
in my experience, anything that's even moderately interesting will prompt
requests for repeats. And that's when spreadsheets really start to become
a liability, because code and data are not separated, so it's not really
possible to run / re-use the same code on a new set of data (and repeatedly
running the same analysis on a data set that changes over time is even
more of a nightmare). Basically, this is my interpretation of the relevance
of the "Nelle" scenario in the SWC shell materials.
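A minimal Python sketch of what separating code from data can look like, assuming the data arrive as CSV with hypothetical columns `sample` and `value`. The analysis is written once as a function, and re-running it on a new or updated data set is just another call. The inline CSV strings are stand-ins for files that would normally be read from disk.

```python
import csv
import io
import statistics


def load_measurements(text):
    """Parse CSV text with hypothetical columns 'sample' and 'value'."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample"]: float(row["value"]) for row in reader}


def summarise(measurements):
    """The analysis itself: defined once, independent of any particular data set."""
    vals = list(measurements.values())
    return {"n": len(vals), "mean": statistics.mean(vals), "max": max(vals)}


# Made-up example data; in practice these would be separate files.
first_batch = "sample,value\ns1,1.0\ns2,2.0\ns3,3.0\n"
second_batch = "sample,value\ns1,2.0\ns2,4.0\ns3,6.0\ns4,8.0\n"

for batch in (first_batch, second_batch):
    print(summarise(load_measurements(batch)))
# {'n': 3, 'mean': 2.0, 'max': 3.0}
# {'n': 4, 'mean': 5.0, 'max': 8.0}
```

The point of the design is that neither data set is ever edited by the analysis; a repeat request means pointing the same code at new input rather than redoing manual steps in a copy of a sheet.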
> Especially love the interface [use of tabs etc.], but one can only do so much
> with Excel or by hand. If I could redo my master's... I'd have started
> learning Python at the beginning of it and made some use of it, but not
> exclusively. I didn't even bother with a programming approach [it would have
> been some other program like Fortran or Matlab], given that learning it
> and checking to ensure I wasn't making any catastrophic errors would have
> taken more time than it was worth.
>
> More than anything... I think the continued value of Word/Excel/PowerPoint
> lies in their long-term use and rather static nature. [I haven't actually
> used the new versions, if any, from the last couple of years.]
As far as I'm aware, general experience suggests that Excel isn't that
static / consistent / reproducible when files are handled on multiple
computers and with multiple versions of Excel. Personally, I've repeatedly
seen weird / unexpected effects resulting from different locales (language
settings etc.); dates are particularly prone to being mangled by that.
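A small Python illustration of why dates are so fragile: the same text `05/04/2016` becomes two different dates depending on whether the day or the month is assumed to come first. This only mimics the decision a locale setting makes; it is not a reproduction of Excel's own date handling. Storing dates as ISO 8601 strings (`YYYY-MM-DD`) avoids the ambiguity.

```python
from datetime import date, datetime

ambiguous = "05/04/2016"

# The same text read under two different conventions.
us_reading = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # month first
uk_reading = datetime.strptime(ambiguous, "%d/%m/%Y").date()  # day first

print(us_reading)  # 2016-05-04
print(uk_reading)  # 2016-04-05

# An unambiguous ISO 8601 string parses to the same date everywhere.
iso_reading = date.fromisoformat("2016-05-04")
print(iso_reading == us_reading)  # True
```

Both readings parse without complaint, so a file moved between machines with different conventions can silently shift every date whose day is 12 or less.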
> Not wrong from my point of view, though definitely wrong for some things, and
> easier to learn than all those other programs scientists/programmers are
> always telling people they ought to use. If you're making substantial,
> complicated use of something, different software might be better suited.
>
> Haven't read the second article yet. Hope to remember to do so. Curious as
> to what problems were presented there.
In addition to the Baggerly & Coombes article on arXiv, there's a
discussion of the case here in the Times Higher Education:
https://www.timeshighereducation.com/features/systems-failure/416000.article?sectioncode=26&storycode=416000
(sorry, paywalled, but they let you read a few articles per month for
free) -- which I think is an indication of growing awareness of the
issues.
Best regards, Jan
> On Wed, May 4, 2016 at 6:37 PM, Jan T Kim <[email protected]> wrote:
>
> > Dear Mike & All,
> >
> > On Wed, May 04, 2016 at 04:33:44PM +0100, Michael J Jackson wrote:
> > > Hi,
> > >
> > > Quoting Dirk Eddelbuettel <[email protected]> on Tue, 3 May 2016 15:53:14 -0500:
> > >
> > > >My two daughters are in that very age bracket. The older one is off to
> > > >college in the fall and just did a year-long research project and, per the
> > > >instructions of her teacher, did it 'all wrong' by our standards: data
> > > >analysis, regression, charts in Excel; write-up in Word and presentations in
> > > >PowerPoint.
> > >
> > > As one who writes everything in MarkDown by preference, are Word and
> > > PowerPoint "all wrong"? Yes, their binary formats don't play as well
> > > with revision control as plain-text formats such as MarkDown or
> > > LaTeX, for example (but sticking them under revision control is
> > > still of great benefit). In other ways they're superior: WYSIWYG
> > > editors, no compilation steps, PDF-generation from within the tool,
> > > and they're ubiquitous. Similarly, for some tasks they allow a user
> > > to "do more in less time with less pain" than the alternatives*.
> >
> > I doubt that this is true in the long run -- it's a typical "short
> > term gain for a long term pain", or a case of technical debt [1],
> > which is a concept that may be considered to come from "hardcore"
> > software engineering but which I think is well worth pondering.
> >
> > Concretely, the pain manifests itself when it comes to further working
> > with the material. For Word, my main concern is the long term viability
> > of the format, i.e. ability to even just print a document. A LaTeX
> > paper is readable as long as the ASCII / Unicode is not lost from
> > humankind's collective knowledge base, but reading a Word document
> > that was never opened / migrated since the turn of the millennium is
> > a lottery game.
> >
> > Issues like playing more or less well with revision management etc.
> > are in my view rather secondary compared to that of long term viability.
> > An issue that irks me personally is that of "PDF generation from within
> > the tool" -- that's just inconsistent with the "do one thing and do
> > that well" approach, but I think that's secondary as well. On the
> > whole, I'm increasingly inclined to think people should use whatever
> > they like as long as it's only for writing and not for processing
> > data in any way. If a group of authors wants to send Word files by
> > email back and forth between them, that seems ok with me so long as
> > (1) they manage to safely store their final agreed version of the
> > manuscript (preferably as a PDF in addition to Word) and (2) they
> > generate figures and tables in a reproducible / re-executable
> > way.
> >
> > Excel, on the other hand, is truly evil as soon as it's used for any
> > data analysis. Excel is just one big violation of the principle of
> > separating data from code. As far as I'm concerned, the issues
> > uncovered by Baggerly & Coombes [2] are quite sufficient to advise
> > against using spreadsheets for handling or analysing any kind of
> > scientific data. Through the lens of technical debt, that debt is paid
> > by the scientific community following after you, and incurring debt
> > like that is something I consider just not acceptable.
> >
> > Best regards, Jan
> >
> >
> > [1] http://martinfowler.com/bliki/TechnicalDebt.html
> > [2] http://arxiv.org/abs/1010.1092
> >
> >
> >
> > > cheers,
> > > mike
> > >
> > > * Having spent more than the 5 minutes it should have taken
> > > yesterday trying (and failing even with Google's help) to put a
> > > hyperlink to a Wikipedia page with multiple underscores in a LaTeX
> > > document and have it clickable in the resulting PDF.
> > >
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
> > >
> > >
> > >
> > > _______________________________________________
> > > Discuss mailing list
> > > [email protected]
> > > http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
> >
> > --
> > +- Jan T. Kim -------------------------------------------------------+
> > | email: [email protected] |
> > | WWW: http://www.jtkim.dreamhosters.com/ |
> > *-----=< hierarchical systems are for files, not for humans >=-----*
> >