Hello everybody,

My name is Alberto and I am pursuing a Ph.D. in software engineering. We are currently developing a technique for recognizing structured content in natural language documents (for example, emails). In practice, we can recognize and parse source code, stack traces, etc. that are embedded in mailing list messages.
With this information in hand, we are exploring different ways to exploit it. So I ran our technique on the ArgoUML dev mailing list and extracted some data. I would love to discuss it with you, to learn what you think about it and whether you find it meaningful and useful. That would be very helpful for guiding my research, and hopefully interesting to you, too.

The first finding is this: of all the classes in the ArgoUML source code (ca. 2000), I found information in the mailing list about only 44% of them (ca. 800).

This is not surprising, but _why_ does it happen? To try to understand this I ran another experiment: I considered only the classes developed (committed) by at least N developers, and counted how many of these are discussed in the mailing list. These are the results:

devs   classes   mentioned_in_emails
   2      1818                   819
   3      1548                   781
   4      1252                   684
   5       956                   565
   6       688                   478
   7       559                   417
   8       439                   353
   9       339                   289
  10       239                   212

We see that, for example, there are 688 classes developed by at least 6 developers, and 478 of those (about 69%) are mentioned in the mailing list. I stopped at 10 developers, but the trend is clear: the more people have worked on a class, the more likely it is to be mentioned in the mailing list. The share of mentioned classes rises from about 45% at 2 developers to about 89% at 10.

We had two hypotheses about this observation:
(1) the code is highly modularized, so only entities exposing behavior to other modules are discussed in the mailing list;
(2) entities that are under the responsibility of a single developer or a few developers are less likely to be discussed with the whole community.

Do you think these hypotheses can be confirmed? If so, I would have a couple of questions about them :)

Is it appropriate that some parts of the system, being under the responsibility of a few developers, are not discussed with the community? What would be the impact if those developers left the project? Our technique is able to reveal that a relevant part of the system is never discussed on the official channel.
Should these parts be discussed more, or documented externally? Do you find this information useful?

If you find this interesting I can go on and show you other findings, but I don't want to bother you if you think this is not relevant for you :)

Thank you in advance!

Cheers,
Alberto
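
P.S. In case anyone wants to double-check the proportions behind the table above, here is a minimal sketch in Python. The figures are copied from the table; the names ROWS and coverage are only illustrative and are not part of our tool.

```python
# Figures copied from the table in this email:
# (minimum number of developers, classes, classes mentioned in emails)
ROWS = [
    (2, 1818, 819),
    (3, 1548, 781),
    (4, 1252, 684),
    (5, 956, 565),
    (6, 688, 478),
    (7, 559, 417),
    (8, 439, 353),
    (9, 339, 289),
    (10, 239, 212),
]


def coverage(rows):
    """Return (min_devs, share of classes mentioned) for each row."""
    return [(devs, mentioned / classes) for devs, classes, mentioned in rows]


if __name__ == "__main__":
    # Print one line per threshold, with the share as a percentage.
    for devs, share in coverage(ROWS):
        print(f">= {devs:2d} developers: {share:.1%} of classes mentioned")
```

The share grows at every step of the table, which is the trend described above.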
