Hello everybody,

 My name is Alberto and I am pursuing a Ph.D. in software engineering. 
Currently we are developing a technique for recognizing structured content in 
natural language documents (for example emails). In practice, we can recognize 
and parse source code, stack traces, etc. that are embedded in mailing lists.

With this information available, we are exploring different ways to exploit 
it. So, I ran our technique on the ArgoUML dev mailing list and extracted 
some information. I would love to discuss this information with you, 
in order to know what you think about it, and whether you find this meaningful 
and useful. That would be very helpful for me, to guide my research, and 
hopefully interesting to you, too.

The first finding is this:
- of all the classes in the ArgoUML source code (ca. 2000), I found information 
in the mailing list about only 44% of them (ca. 800). This is not surprising, 
but _why_ does this happen? To try to understand this I ran another experiment: 
I considered only classes developed (committed) by at least N developers, and I 
counted how many of these are discussed in the mailing list. These are the results:

devs  classes  mentioned_in_emails
   2     1818                  819
   3     1548                  781
   4     1252                  684
   5      956                  565
   6      688                  478
   7      559                  417
   8      439                  353
   9      339                  289
  10      239                  212

We see that, for example, there are 688 classes developed by at least 6 
developers, and 478 of those are mentioned in the mailing list. I stopped at 10 
developers, but the trend is that the more people worked on a class, the more 
likely that class is to be mentioned in the mailing list.
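
To make the trend easier to see, here is a minimal Python sketch that turns each row of the table above into a share of mentioned classes. The numbers are copied from the table; the names `ROWS` and `mention_ratios` are only illustrative and are not part of our tool.

```python
# Rows copied from the table above:
# (minimum number of developers, classes, classes mentioned in emails)
ROWS = [
    (2, 1818, 819),
    (3, 1548, 781),
    (4, 1252, 684),
    (5, 956, 565),
    (6, 688, 478),
    (7, 559, 417),
    (8, 439, 353),
    (9, 339, 289),
    (10, 239, 212),
]


def mention_ratios(rows):
    """Return (min_devs, fraction of classes mentioned) for each row."""
    return [(devs, mentioned / classes) for devs, classes, mentioned in rows]


if __name__ == "__main__":
    for devs, ratio in mention_ratios(ROWS):
        print(f"at least {devs:>2} developers: {ratio:.1%} of classes mentioned")
```

On these numbers the share rises steadily, from about 45% for classes with at least 2 developers to about 89% for classes with at least 10.
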
We have two hypotheses about this observation:
(1) the code is highly modularized, so only entities exposing behavior to other 
modules are discussed in the mailing list,
(2) entities which are under the responsibility of a single or few developers 
are less likely to be discussed with the whole community.
Do you think these hypotheses can be confirmed? If so, I would have a couple of 
questions about them :)

Is it appropriate that some parts of the system, being under the responsibility 
of a few developers, are not discussed with the community? What would be the 
impact if those developers left the project? Our technique is able to 
reveal that a substantial part of the system is never discussed on the official 
channel. Should these parts be discussed more, or documented externally? Do you 
find this information useful?

If you find this interesting I can go on showing you other findings, but I 
don't want to bother you if you think this is not relevant for you :)

Thank you in advance!

Cheers,
 Alberto

