PPIG discuss: Re: Empirical Tools

Andrew Walenstein Tue, 11 Jul 2000 06:57:35 +0100 (BST)

> Anyone know of software comprehension experiments which used
> 'programmers' overview descriptions of the system' as a measure of
> comprehension.[..]
> Jim Buckley.

Hi Jim,

First an answer to your question, then some follow-up discussion.

If I understand you correctly, you want to measure a
person's high-level or global understanding of a software
system by "reading" their minds for that knowledge.
The experimental instrument implied is a written statement of
the subjects' knowledge that was developed after experiencing
some experimental condition.  Did I understand you OK?

I can't think of any specific papers using the above-described
method, but Ric Holt has done work in visualizing high-level
system structure, and I believe he has gone around real work
sites asking system experts for drawings of their
understanding of system structures, and also has presented
experts with generated diagrams and asked them for their
opinions on how well they represent the system.  Tim Lethbridge
and Janice Singer also reported instances of asking programmers
to map out their understanding (WESS-98:Singer-Lethbridge).
It seems to me that presenting subjects with pre-drawn system
diagrams as stimulus structures might be a better way of
approaching this type of question if you want to make comparisons
between subjects.  While you're at it, you should also look
at Gail Murphy et al.'s work on the Reflexion Model tool.
She sees people writing out high-level program structure
models and then refining them as they gain a better
understanding of the system.  That might give you some
ideas.  Hope someone else can help out....

Now for the discussion. 

Although it might be very interesting to see what subjects
produce themselves, I can think of at least three important
questions that the prospect raises:

 1.  Reliability and reconstruction.  It is not possible for
     anyone to "dump" their internal representations directly
     onto paper or screen.  The externalization process itself
     frequently changes a person's understanding of the subject.
     What you'd be measuring would more than likely be a
     post-experiment reconstruction.  That's why protocol analysis
     uses !concurrent! verbalization (see Ericsson-Simon); it's
     also why accurate timings of question answering are taken,
     so that priming effects can be used to
     reconstruct internal representations (see
     CogPsy-19:Pennington, or INTERACT-95:Green-Navarro, for
     example).
     
     You might also try hunting in the knowledge engineering
     literature (for the difficulty of knowledge elicitation
     and how to compare expert knowledge bases), or possibly
     in the constructivist education literature (for active
     learning and how to compare things like concept maps
     that express a subject's understanding of some material).
     One danger with certain experimental methods is that
     it may be difficult to separate the question
     "what do they know?" from "and how is it represented?"

 2.  Open-endedness.  The task you'd be asking the subjects to
     perform (externalizing current understanding) is
     more than likely very open ended for any but the most
     trivial programs.  It is quite different from the
     verbatim reconstruction tasks used before (e.g. see the
     Boehm-Davis review in Handbook of HCI).  The variability in the
     output is likely to overwhelm anything but very informal
     analysis.

 3.  What is a high-level understanding?  Is it structural?
     Functional?  Domain?  Design decisions?  Some combination? 
     Is it expressed in patterns, architectures, or >gasp< 
     global goal-plan hierarchies (good luck finding subjects
     that know how to write these out...)?

     Also, are you interested in "meta-knowledge" such as survey
     knowledge?  Programmers may not be able to recall some
     particular bit of information, but might be able to rapidly
     find it by using survey knowledge, perhaps retrieving it
     from episodic memory (CHI-95:Altmann-Larkin-John). 
     Recall-based experiments might miss this fact if not
     designed to take it into account.  Note that survey
     knowledge is likely to be quite important if the subjects
     take a "just in time comprehension" approach to maintaining
     an understanding of their code (WESS-98:Singer-Lethbridge).

One other small note.  It might help you that there are a
reasonable number of case studies and experiments in reverse
engineering.  The goal of the subjects in such studies is
typically to reconstruct high-level documentation from existing
code.  These might generate more naturalistic data sets wherein
the user-generated goals are to understand software at
a high level and also produce representations of that
understanding.  Of course, these studies also make it difficult
to answer questions about, for example, internal representations.

Andrew