FW: RE: PPIG discuss: Similarity and categorization

Hornsby Peter Thu, 18 Oct 2001 05:17:11 -0700

-----Original Message-----
From: Hornsby Peter 
Sent: Thursday, October 18, 2001 1:41 PM
To: 'Derek M Jones'
Subject: THIS MAIL IS UNCLASSIFIED RE: PPIG discuss: Similarity and
categorization


Derek,

> >I'm not convinced that formal analysis is yet sufficiently powerful,
> >given the range of variables to be addressed.  I would suggest that
> >certain types of software, particularly that in narrowly defined domains
> >(e.g. mathematical modelling) may be more suitable for such
> 
> I am happy to solve what appears to be a small problem at the moment.
> In fact it is a large problem and getting worse.
> 
> A typical development platform can have over 20,000 identifiers
> made visible by libraries of one sort or another.  A typical developer
> could easily have to use 500 different external library identifiers
> (functions, macros, structures, etc.) during a year's development.  The
> following year yet more could be added to those required, and some never
> used again.  There is a constant turnover and developers are on a
> never-ending learning cycle.
>
> If I am given the job of writing the library for, say, the widget and tegdiw
> interface I might be tempted to create a third module that contained
> functionality common to both.  Now which identifiers go in the widget
> module, which in tegdiw and which in common?
>
> Since these modules will be used by thousands of developers I want
> to make the interface intuitive (whatever that might be) and put the
> function foo in the module where developers expect to find it.  I want to
> organise my modules so that category membership is easy to learn.
> Developers will like my library (because it is easy to use), they
> recommend it to their friends, who buy a copy and recommend it to
> their friends, ... and I get to buy a Greek island.

It's a nice idea, and a worthy goal (though I would tend towards a
castle somewhere remote).  A major problem is that when given a problem
to address, developers learn about the nature of the problem in the
process of solving it.  This means that the solution (i.e. the guiding
force behind the developer using the retrieval system to identify
reusable modules) will be different at different times, depending on the
developer's state of understanding.  You say that you want to put
function foo in the module where the developer expects to find it; but
this will be determined by the context within which the developer is
working.  Fundamentally, people are different: different developers will
design an identically-specified system in different ways, depending on
factors such as:

- the importance they believe each requirement has
- their experience in the desired technologies
- their desire to incorporate the latest all-singing, all-dancing
technology
- the paradigm within which they are working
- etc. 

This in turn will affect their perceptions of a system and the way in
which they use a retrieval system to access a particular module. 

> >is not just a problem of finding a good classification scheme; it takes
> >time to classify software, to maintain a software library, to understand
> >the classification scheme in sufficient detail to retrieve software
> >effectively.
> 
> Changing a classification scheme once lots of people have coded
> its dependencies into their program is very expensive.
>
> >I agree with the aims; however I do not believe they can be achieved
> >easily.  I don't know Estes' work; but from what you are saying, the
> >classification is being performed against the relatively small number of
> >abstraction possibilities available with shapes.
> 
> The small number is not really an issue.  The problems start to
> occur if they are not independent of each other, and the messy task of
> assigning a similarity measure to each attribute.

Agreed; but the problem with software systems of any size is that
requirements are generally connected to one another.  Actually the
problem is worse than even this suggests, because it implies that
requirements are well understood, which is not always (if ever) the
case.  Attempts have been made to use formal techniques to address this
problem, but it seems to be the nature of humans to deal with things in
fairly general terms, particularly when processing information.  

> >My point about the relationship between classification schemes and
> >abstraction earlier was that the massive amount of information contained
> >within a given piece of software invites multiple classification
> >schemes.
> 
> Software that already exists is a different problem.  It exists, rewriting
> would be very expensive and people have no sensible choice but to go with
> what they are given.

Incidentally, the paper that studied the classification of code by
novices and experts MAY have been one of:

Corritore and Wiedenbeck, 1999, "Mental representation of expert
procedural and OO programmers in a software maintenance task".  In:
International Journal of Human-Computer Studies.

Davies, Gilmore and Green, 1996, "Are objects that important?  Effects
of expertise and familiarity on classification of OO code".  In: HCI, 10,
pp. 227-248.

I don't have this last paper to hand, but I'm sure Thomas will be able
to let us know who the subjects were.  I would also suggest that 

Dvorak and Moher, 1991, "A feasibility study of early class hierarchy
construction in OO development", in ESP 4th Workshop may be useful.  

Pete


