Hello again;
As it happens, I have been unable to meet my goal of delivering the completed essay this weekend. This was a result of classic scheduling errors - the time-vacuum and the job underestimation. Instead of the complete essay, I have included those sections which *are* ready for critique and review. Please note that these are not the *meat* of the essay. They are merely the qualitative framework from which the quantitative will hang. I will strive to deliver the rest of the essay this week. As my programming skills are poor to minimal, I cannot predict when I will work out the means by which the needed data will be coaxed into a usable form. I am currently punting on the "sleep on the problem" approach to some currently intractable problems. Of course, they are of the "one step forward, two steps back" variety. In any case, enjoy and review.
Thank you all again; JC.
Title: Does Free Software Production in a Bazaar Obey the Law of Diminishing Returns?
Abstract
Free Software is defined. Brooks's Law is introduced as a limiting factor in software production. A parallel is drawn between Brooks's Law and the Law of Diminishing Returns. Eric S. Raymond's "Bazaar" model is introduced as a possible exception to the rule. Statistics gathered from an existing Free Software project are used to demonstrate the nonadherence of "Bazaars" to the Law of Diminishing Returns. A possible flaw in Raymond's hypothesis is identified, and an economic alternative is offered.
Contents
Those items with an asterisk next to them are included in this copy of the essay.
- Abstract *
- Contents *
- Introduction *
- Free Software Defined *
- Brooks's Law and The Law of Diminishing Returns *
- The "Bazaar" Introduced *
- The GNOME Project *
- Examination and Discussion of GNOME Data
- Conclusion
- Endnotes *
- Bibliography
- Appendices
  - Acknowledgements and Thanks
  - The Opensource Definition
  - Caveats
  - Economic Principles in Free Software
Introduction
This essay seeks to examine the rapidly emerging production method of "Free Software" or "Opensource Software". It outlines analogies between economic theory and software engineering in order to bring economic analysis to bear on the area.
Specifically, it aims to provide a quantitative analysis of what has until now been examined primarily in a qualitative way. Free Software has existed in one form or another since the very early days of computing, but very little attention has been paid to it until recently. Many Free Software projects have achieved significant or even dominant positions in their marketplace, and more firms are starting to utilise or release Free Software.
Free Software Defined
Definitions of Free Software (also known as "OSS", for Open Source Software) are typically framed in two perspectives: Ideological and Legal.
The two major ideological principles underlying Free Software are the protection of user/programmer choice, and the belief that the best solutions must be shared.
The first principle arose from Richard M. Stallman's dismay at the rise of proprietary software as the dominant format. Stallman believes that proprietary (also known as closed-source) software is a violation of the individual's right to choose other packages. He argues that access to the sourcecode grants freedoms to modify and augment without being "locked in" to one company's whims. Further, he argues that sourcecode access gives users a choice to go their own way, in defiance of the company's wishes should they be detrimental.
The second principle, that good solutions should be shared, arises from the so-called "Hacker Culture". Within this culture, brainpower is seen as a limited resource, which should not be wasted on unnecessarily reinventing the wheel. It is reasonable, therefore, that all solutions (embodied in sourcecode) should be available for anyone to use. The corollary is that withholding solutions (or source) is effectively evil, inasmuch as it is wasteful of resources.
The legal foundations of Free Software stem from a careful blend of Contract and Copyright laws. Free Software Licenses use the principles of Contract law to create their terms. Typically, these ensure that an author's version of the source is always available and that any modification made by anyone is likewise available under the same terms. Some licenses go so far as to impose these terms onto software where opensource code has been added as an external source.
If the user doesn't agree to the License, the law of Contract renders the terms of the License effectively powerless without punitive terms. However, at this point, standard Copyright laws take effect, and the user is granted no rights whatsoever. There is significant coercion, then, to accept the terms of the License on a contractual basis.
To place Free Software in an economic framework is considerably more difficult, but quite profitable. There is a range of issues and outcomes that emerge naturally from the application of elementary economic thought to a Free Software "economy". The main body of this essay assumes that just such a framework has been established. However, the description of such a framework is long and outside the scope of the main body of this essay, so a short treatise on the elementary economic framework of Free Software can be found in Appendix IV.
Brooks's Law and The Law of Diminishing Returns
According to Fred Brooks's law, adding people to a late project makes it later. It's like adding gas to a fire. New people need time to familiarise ... their training takes up the time of ... [other] people ... and merely increasing the number of people increases the complexity and amount of project communication. Brooks points out that the fact that a woman can have a baby in nine months does not imply that nine women can have a baby in one month.
Managers need to understand ... More workers working doesn't necessarily mean more work will get done.
--Steve McConnell, "Code Complete"
In his essay "The Magic Cauldron", Eric S. Raymond estimates that almost 95% of software development is "in-house". This is the traditional meal-ticket of programmers and software engineers. It is from this heartland that Brooks's Law is drawn.
Brooks's Law - specifically - is that adding people to a late project willy-nilly will only make it later. Brooks derived this law from his own personal experience as a project manager on IBM's original OS/360 project. In his book, The Mythical Man-Month, Brooks pointed out the fallacy of simply throwing more "man-hours" (labour units) at the project in order to deliver it earlier.
According to McConnell, Brooks's analysis of his own law suggests an exception to the rule. " ... if a project's tasks are partitionable, you can divide them further and assign them to ... people who are added late to the project."
In short, we can summarise Brooks's Law in two parts:
- If, in a (late) project, we can further subdivide tasks efficiently, we can add extra programmers or software engineers without penalty.
- Otherwise, if tasks cannot be easily or efficiently subdivided, there will be a penalty for adding extra programmers or software engineers.
Brooks gave several justifications for his law, outlined by McConnell above. One of the easier to seize upon, in economic and mathematical terms, is the complexity problem. It is argued that programming requires a large amount of communication between workers. It can be shown mathematically that if the number of programmers rises linearly, the number of possible communications paths between them rises quadratically. This is illustrated by the diagram below.
{DIAGRAM ILLUSTRATING QUADRATIC COMMS PATHS}
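The quadratic growth can also be checked with a few lines of code. The following is a minimal sketch, assuming that every pair of programmers forms exactly one possible communications path, so that n programmers share n(n-1)/2 paths. The function name is chosen purely for illustration.

```python
def communication_paths(programmers: int) -> int:
    """Number of distinct pairs among the given number of programmers.

    Assumes one possible path per pair, i.e. n * (n - 1) / 2.
    """
    if programmers < 0:
        raise ValueError("the number of programmers cannot be negative")
    return programmers * (programmers - 1) // 2


# Doubling the team roughly quadruples the number of possible paths.
for n in (2, 4, 8, 16):
    print(n, "programmers:", communication_paths(n), "paths")
```

Under this assumption, a team of four has six possible paths, while a team of sixteen has one hundred and twenty.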
But Brooks's Law is not original. In fact, the first famous instances of the principles that Brooks expounded are not found in software engineering - they are found in a rice paddy.
In the classic example of the Law of Diminishing Returns, Adam Smith asked us to imagine a rice paddy. We might start with one worker on this paddy, who is barely able to care for and harvest even a fraction of the paddy. Compared to its neighbours, this paddy is woefully inefficient.
So another worker is added. Productivity rises sharply, as two workers can now work the field. We can measure this rise in productivity in terms of the total output and the change in the output - the marginal output.
We continue to add workers. At first, they replicate each other's work: each, say, takes a quadrant of the paddy and works it. Later, some will specialise. Some will care for the rice, some will harvest it. Productivity continues to rise.
But the trend is not endless. At a certain point, adding more workers no longer causes a rise in productivity. Perhaps these extra workers need to be trained by other workers. Perhaps they get in one another's way, or there are workers standing by, idle, as excess working capacity. In any case, the marginal rate of productivity begins to fall, followed by the average total output.
It is not difficult to draw parallels with software engineering. Indeed, if Adam Smith had been working today, he might have used a software project as his example!
Let us take a project with one programmer. This programmer has begun to code, but the project is large. He is unable to produce many lines of code on his own, having to continuously stop and refer to manuals for unfamiliar areas, even to remind himself of what part of the project he is dealing with.
Let us add another programmer. Suddenly, they can divide the work amongst them, working on two different parts of the program at once. Then we can add more programmers - including specialists. The program is divided and subdivided into smaller units of specific purpose, and the specialists can focus on these parts.
The subdivision of units and the matching of specialists with program components means that the productivity rises.
But again, we come to a certain point where it begins to falter. Programmers are added who need to be trained on the deep secrets of the existing work. They need to be introduced to procedures and their tools, diverting time from existing programmers. They add overhead to communications paths, and some may spend time fallow, dragging down the average total output once more.
In economic terminology, Brooks's Law might be re-summarised thus:
- If, in a production process, we can further subdivide tasks efficiently, we can add extra Labour units without penalty.
- Otherwise, if tasks cannot be easily or efficiently subdivided, there will be a penalty for adding extra Labour units.
{LODR GRAPH}
This is an example of a typical Law of Diminishing Returns graph. The RED? line shows Average Total Output (ATO), and the BLUE? line shows Marginal Output (MO).
The real signature of diminishing returns is the marginal output line. The MO line, in mathematical terms, is the derivative of the ATO line. It shows the rate of change of the ATO at any given "x-value", or number of units of labour.
The classic Law of Diminishing Returns MO is shown. It rises, peaks, and then crosses zero at the point where ATO peaks. It becomes negative, causing the ATO curve to nose over and dive.
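The relationship between the two curves can be sketched numerically. The example below is only an illustration: it reads the output curve as total output for each number of workers, approximates the derivative by the change from one worker to the next, and uses figures invented for the purpose (generated from the arbitrary formula 12n^2 - n^3). None of it is data from a real project.

```python
def marginal_output(total_output):
    """Change in total output contributed by each additional unit of labour."""
    return [after - before for before, after in zip(total_output, total_output[1:])]


# Hypothetical total output for 0 to 10 workers, invented for illustration
# only (generated from 12*n**2 - n**3); not measured from any real project.
total_output = [12 * n**2 - n**3 for n in range(11)]
marginal = marginal_output(total_output)

# Number of workers at which total output is highest.
peak_labour = total_output.index(max(total_output))

for n, change in enumerate(marginal, start=1):
    print(f"worker {n:2d}: total {total_output[n]:3d}, marginal {change:+d}")
print("total output peaks at", peak_labour, "workers")
```

With these invented figures the marginal output rises, peaks, and falls, and the first negative marginal value belongs to the worker added just after total output has peaked.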
Examples from which hard, actual data could be drawn are legion. This pattern generally occurs wherever the Law holds. Indeed, it is the recurrence of this pattern that is often used as proof of the Law's applicability.
{BROOKS'S LAW GRAPH}
This is a graph based on Brooks's Law. In fact, it is adapted from a graph in The Mythical Man-Month, the book in which Brooks's Law is introduced.
The graph is rendered with "programmers" on the x-axis, and "statements" on the y-axis. Again, we can see the classic hill-shape of the Law of Diminishing Returns. And, more importantly, we see the classic Marginal Output derivative repeat its rise and fall.
So it becomes reasonable to assert that the Law of Diminishing Returns and Brooks's Law are roughly equivalent. The terminology differs between them, but the meaning and the consequent graphs are highly similar. And, just as the Law of Diminishing Returns has been demonstrated to appear over and over, so has Brooks's Law. It is not a case of a coincidental match of graphs.
For the rest of this essay, Brooks's Law and the Law of Diminishing Returns will be assumed to be functionally equivalent. This being so, it becomes viable to apply certain basic economic analyses to Brooks's Law. In particular, we investigate the bold claim that Brooks's Law can be broken.
The Bazaar Introduced
In his influential paper, "The Cathedral and the Bazaar", Eric S. Raymond proposes a theory about how Brooks's Law can be overcome. He describes a model of development which he calls the "Bazaar".
The Bazaar concept is somewhat multi-faceted. Some key elements of Bazaar projects include:
- Internet or Internet-like ease of communications
- Open-source licensing environment
- Easily-accessible sourcecode
- Easily contactable author/maintainer
- Transparent processes and decision-making
In essence, Raymond argues that when Bazaar conditions exist, the opposite of Brooks's Law becomes true: more programmers mean higher productivity. He proposes several reasons why this might be so. These arise, almost naturally, from a combination of what Raymond considers the 'Hacker mindset'[4], ease of communication, and the open availability of sourcecode.
- Low Management Overhead: Programmers are able to modify sourcecode freely. They can add new features and fix bugs without being directed to do so, and without the necessity of broad and deep coordination.
- Competition between Solutions: As problems to solve become known in an open-source project, several competing solutions may emerge from multiple sources. The best solution can then be chosen and integrated. If there is disagreement, parties can split off their version of the solution by "forking the tree", in which case, both solutions are implemented.
- Parallel Handling of Vertical Problems: A vertical problem is one which is standalone (it stands up without support: vertically). Because these problems can be solved in isolation (the best example being debugging), the number that can be solved scales easily with additional workers. An open-source environment has no practicable physical limits on how many programmers can participate, which means it excels in solving vertical problems in parallel.
Raymond asserts that, for these reasons, open-source projects can break Brooks's Law. In particular, he points to the parallel nature of open-source development, going so far as to say "Given enough eyeballs, all bugs are shallow", dubbing this "Linus' Law", in honour of Linus Torvalds, the creator of the Linux system.
Previously, we sought to establish a link between Brooks's Law and the Law of Diminishing Returns. The outcome was that Brooks's Law is analogous to the Law of Diminishing Returns, though it was derived as a single case from one field of human endeavour rather than as a general law. The two were shown to be equivalent in describing a process.
By implication, then, Raymond has asserted that the Law of Diminishing Returns can be broken in a Bazaar environment.
The GNOME Project
In order to assess Raymond's claim, this essay uses a high-profile Bazaar-style project. The GNOME project (as with many newer Free Software projects) explicitly adopted Raymond's theory of the Bazaar as its working principles[5].
GNOME has properties which make it an ideal source of data:
- It is a Bazaar project
- There is a wide range of sub-projects, from the small to the large
- Several hundred Developers and Debuggers are involved, making the data-set large enough to be useful
- Activity logs are available at all times through a public access CVS[6] system
It is the last property - the extensive use of CVS - that renders GNOME a particularly useful source of data. The CVS system automatically keeps extensive logs of all programmer activity. For the GNOME project, all of this data is publicly available. It is these logs that form the tables on which this essay is based.
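As a rough indication of how such logs might be coaxed into tables, the sketch below tallies revisions and added lines per author. It is not the method used for this essay. It assumes the log text follows the usual "date: ...;  author: ...;  state: ...;  lines: +N -M" layout printed by the CVS log command, which may vary between CVS versions, and the sample log, the author names and the function name are all invented for illustration.

```python
import re
from collections import Counter

# Matches the per-revision summary line of a CVS log (assumed layout).
# The "lines:" field is optional because initial revisions omit it.
REVISION_LINE = re.compile(
    r"^date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);\s+state: [^;]+;"
    r"(?:\s+lines: \+(?P<added>\d+) -(?P<removed>\d+))?",
    re.MULTILINE,
)


def activity_by_author(log_text):
    """Return (revisions per author, lines added per author) as Counters."""
    commits = Counter()
    lines_added = Counter()
    for match in REVISION_LINE.finditer(log_text):
        author = match.group("author").strip()
        commits[author] += 1
        lines_added[author] += int(match.group("added") or 0)
    return commits, lines_added


# Invented sample in the assumed format; not taken from the GNOME logs.
SAMPLE_LOG = """\
revision 1.3
date: 1999/03/02 10:15:00;  author: alice;  state: Exp;  lines: +10 -2
Fix a typo.
----------------------------
revision 1.2
date: 1999/02/20 09:00:00;  author: alice;  state: Exp;  lines: +3 -1
Tidy up.
----------------------------
revision 1.1
date: 1999/02/01 08:30:00;  author: bob;  state: Exp;
Initial revision
"""

commits, lines_added = activity_by_author(SAMPLE_LOG)
print(commits)
print(lines_added)
```

Counting distinct authors and their added lines per period would give the labour and output columns of a table, though how best to define either is a separate question.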
GNOME also happens to be a quite large project. It is not the largest Free Software project (the largest probably being the Linux operating system), but it is one of the largest with public-access CVS logs. It is its size which allows for the construction of smooth curves.
Endnotes
- Sourcecode is another form of software. It is from sourcecode that the harder-to-modify "executable" is derived.
Sourcecode is the 'recipe' for a task a computer might operate. It tells a computer how the data it is using is structured, how to access it, and what to do with it. This can be expressed in a number of artificial "programming languages". If one has access to the sourcecode, it is possible to intimately understand how a program works. It also becomes possible to expand or modify its functionality; or to reuse pre-existing sourcecode in new programs.
- In more detailed terms, we might define the factors thus:
  - Land: The basic resources which are used as 'ingredients' in the production process. In computer science, these might be algorithms, or previously existing sourcecode.
  - Labour: The people who do the majority of the human work, as directed by the Enterprise. In Free Software, they are called "Developers" and "Debuggers".
  - Capital: Essentially, "things that help make things". Capital typically constitutes machinery. In Free Software, these are the Networks, Hardware and Software tools used in the writing and support of the sourcecode.
  - Enterprise: The factor which brings the other three together to create the final product. In Free Software, the line between Labour and Enterprise is blurry.
- Meaning "Lines Of Code". Probably the most-used and most popular metric in software engineering. It refers to a single line of instructions for the computer that is not blank or a comment intended for human use. Because, typically, only one instruction is placed per line, LOC provides a usable measurement of both the gross size and general complexity of a given project.
- "Hacker" is a term of considerable stigma in mainstream society, usually taken to mean a person who maliciously trespasses on other people's computer systems.
Raymond's use of the term is, in fact, the original computer-world meaning. He takes a hacker to be a person who delights in problem solving (especially in programming), and who believes that once a problem is solved, the solution ought to be shared.
The more common use - intruder - is distinguished by traditional hackers with the word "crackers". Hackers go to great lengths to distance themselves from crackers and their activities.
- Ironically, Miguel de Icaza, in private conversation with the essay's author, said that he did not himself believe that it was a case of "The Cathedral versus the Bazaar". He saw them as two extremes between which there are many compromises - two ends of the same stick.
- Concurrent Versions System. CVS is a change-management system. It provides a central point of work for multiple programmers working on the same sourcecode. Programmers can 'check out' code, work on it on their own computer, and then 'merge' their changes back into the original.
Apart from its usefulness in centralising code storage and management, CVS provides change-tracking capabilities. It keeps 'delta-files', which list every change made to any file, at any time, by any programmer. This is normally used as a sort of super-powered "undo" function.

