Re: [Discuss] Data disassociation (or keeping it real)

Jan Kim Sat, 07 May 2016 08:51:01 -0700

Dear Tom, dear All,

I think work patterns that involve "constant visualisation" are quite
tightly coupled to working with image data. In other fields, such as
sequence analysis, the objects of enquiry don't lend themselves to any
meaningful visualisation, and the predominant modes of working with them
are logical / non-visual. The scenario of Nelle in the shell lesson is a
good example of this -- visually checking thousands of Pacific garbage
gyre protein files isn't really feasible, but programmatic checks such as
`ls *[^AB].txt` or `wc -l *.txt | sort -n | head` are viable and they
scale. Perhaps these could be considered simple "quality metrics".
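
As a minimal sketch of how such checks could be wrapped up for reuse
(the function names are hypothetical, and the convention that valid file
names end in A or B is assumed from the shell lesson's example):

```shell
# Hypothetical helpers, not part of any existing tool.
# Assumption: valid data files end in "A.txt" or "B.txt".

# Print every .txt file in directory $1 whose name does NOT end in A or B.
# [!AB] is the POSIX spelling of the [^AB] pattern that bash also accepts.
odd_names() {
    for f in "$1"/*[!AB].txt; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Print the $2 files (default 5) in directory $1 with the fewest lines,
# shortest first; unusually short files may be truncated.
shortest_files() {
    wc -l "$1"/*.txt | grep -v ' total$' | sort -n | head -n "${2:-5}"
}
```

Calling `odd_names .` and `shortest_files . 5` in a data directory would
then give a quick list of files that deserve a closer look.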


In a more general perspective, it seems to me that becoming more
independent from data is an indicator of progress in scientific
understanding. If we don't understand a thing or phenomenon, all we can
do is gather and record data. But once we have a principled scientific
model, we can predict the phenomenon in question, and deduce from the
model which data is necessary for the prediction. And computing often
has great potential to facilitate such progress. From this perspective,
"dissociation" from data is not necessarily a bad thing.

Best regards, Jan


On Fri, May 06, 2016 at 05:07:24AM +0000, Tom Wright wrote:
> I was inspired to post this by one of the posts in the "word /
> PowerPoint all wrong" thread.
> In my opinion, one of the pitfalls of 'our' programmatic way of working
> with data is that it is easy to move further away from the raw data.
> As a little background, I typically work on biomedical imaging data (optical
> coherence tomography and very high resolution images of the human retina).
> In my own work I am often caught by two traps. The first is garbage in,
> garbage out. I often lack suitable metrics of quality, and when poor-quality
> data is only processed in .CSV format this lack of quality becomes
> invisible. The second trap relates to the unknown nature of disease-induced
> changes. Often the most interesting changes are only observed under careful
> examination of images. While these specific examples relate to imaging
> data, I'm sure the problems are not limited to this modality.
> My approach to addressing these issues is constant visualisation of data,
> something made easier by R and knitr, and, where possible, the development
> and use of quality metrics.
> My question and hope is that other people have addressed these issues. If
> you have any thoughts or suggestions I'd love to hear them.
> 
> Thx.

> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org


-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: [email protected]                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

