Nico Klasens wrote:To get this thread going, I have put together a hacker's guide to mmbase with a lot of help of some other excellent guides on the net.
What format is it in? Can you submit it to the list?
I opened it with a generic text editor. It's just text :-)
---Andr�
If you are contributing code to the MMBase project, please read this first.
============================
HACKER'S GUIDE TO MMBASE
============================TABLE OF CONTENTS
* Participating in the community * What to read * Directory layout * Coding style * Document everything * Error message conventions * Other conventions * Exception handling * Writing log messages * Patch submission guidelines * Filing bugs / issues * Commit access * Release numbering, compatibility, and deprecation * Stabilizing and maintaining releases
Participating in the community ==============================
MMBase is originally build by the VPRO, a dutch broadcasting corporation
(http://www.vpro.nl), and has been opensourced in april 2000. MMbase is a
open-source project under a MPL-style license. A number of developers work for
broadcasting corporations in the Netherlands, some work for other large companies,
and many others are simply excellent volunteers who are interested in building a
better general purpose object oriented content management system.
The community exists mainly through mailing lists and a source repository. To participate:
Go to http://www.mmbase.org/ and
* Join the "developers", "cvs", and "announce" mailing lists. The developers
list, [email protected], is where almost all discussion takes
place. All questions should go there, though you might want to check the
list archives first. The "cvs" list receives automated commit emails.
* Read the Architectural Overview document. This document can be found in
the "Documentation" area of the website in the Documentation->general
section. The architecture will give you a theoretical overview of
MMBase's design.There are many ways to join the project, either by writing code, or by testing and/or helping to manage the bug database. If you'd like to contribute, then look at:
* The bugs/issues/wishes database
http://www.mmbase.org/bugsTo submit code, simply send your patches to [email protected] after you have read the rest of this file.
To help manage the issues database, read over the issue summaries, looking and
testing for issues that are either invalid, or are duplicates of other issues.
Both kinds are very common, the first because bugs often get unknowingly fixed
as side effects of other changes in the code, and the second because people
sometimes file an issue without noticing that it has already been reported. If
you are not sure about an issue, post a question to [EMAIL PROTECTED]
What to read ============
Before you can contribute code, you'll need to familiarize yourself with the existing code base and interfaces.
Check out a copy of MMBase (anonymously, so you can look at the code.
Within 'src/org/mmbase/bridge/' are a bunch of interfaces with doc comments. If you read through these, you'll have a pretty good understanding of the concepts of MMBase.
Directory layout ================
A rough guide to the source tree:
documentation/
User and Developer documentation.
config/
Default MMBase configuration files
html/
MMBase general web pages like editors and admin interface
src/
Source code to MMBase itself (as opposed to external libraries).
tests/
Automated test suite.
applications/
Stuff that works with MMBase, but that MMBase doesn't depend on.
applications/bugtracker
Sources of the bugs/issues database
applications/cloudcontext
Sources of the cloudcontext security implementation
applications/crontab
Sources of scheduling framework
applications/dove
Sources of xml communication interface
applications/editwizard
Sources of editors which can be defined by xml
applications/email
Sources of email application
applications/media
Sources of media framework
applications/packaging
Sources of packaging application
applications/rmmci
Sources of RMI communication
applications/taglib
Sources of taglibrary
applications/xmlimporter
Sources of xml import applicationCoding style ============
To understand how things work, read documentation/backenddevelopers/codingstandards
We're following the Sun coding standards for Java.
Read http://java.sun.com/docs/codeconv/ for a full description of the Sun coding standards; but here is a short example demonstrating the most important formatting guidelines:
/**
* Documentation of method
*
* @param state Description of parameter
* @return expected return values
*/
private static final int foo(int arg1, int arg2) { /* Use visibility modifier first */
try { /* Brace on same line, indent 4 */
if ((some_very_long_condition && arg2)
|| remaining_condition) { /* Generally use the 8-space rule */
arg1 = 10 + some_func (arg1, arg2); /* Use space around operators */
return arg1;
} else { /* Else on same line as close brace */
return 1;
} /* close brace on own line */
} catch (Exception e) { /* Catch on same line as close brace */
//documentation of code
} finally { /* Finally on same line as close brace */
}
return 3;
}
In general, be generous with parentheses even when you're sure about the operator precedence, and be willing to add spaces and newlines to avoid "code crunch". Don't worry too much about vertical density; it's more important to make code readable than to fit that extra line on the screen.
We understamd that formatting is a matter of taste and that many developers
prefer other formmatting styles, but try to stick to the guidelines. We are using
them to keep the code readable. Don't take it personally when someone points you
to the coding standards when you submit a some code. We will accept your code
anyway.
Input validation
The first lines of a method are usually devoted to checking the validity of all arguments. The idea is to fail as quickly as possible in the event of an error. This is particularly important for constructors.
For non-private methods, an Exception is thrown if a test on an argument fails. This is often IllegalArgumentException or NullPointerException. (These are RuntimeExceptions. Checked exceptions may also be thrown Document these exceptions in the @throws clause of the method's javadoc, since they clearly state the method's requirements to the caller (the pre-conditions).
If every object parameter of every method in a class needs to be non-null in
order to avoid throwing NullPointerException, then it is acceptable to state
this once in the general class javadoc, instead of repeating it for each method.
String operations
To build Strings dynamically, one may use either the String concatenation operator + or the StringBuffer class. In the great majority of cases, only a few items are concatenated, and either style may be used freely, according to taste, without concern for performance.
On relatively rare occasions, however, when performing extensive String manipulation, replacing + with StringBuffer.append is recommended. This is because the + operator does not scale well to large numbers of operations, and is much slower than StringBuffer.append under such circumstances.
Cases in which + is very likely acceptable :
* if concatenating only a small number of items together, then the
difference in relative performance is very small, and possibly
not even measurable (the great majority of use cases of the concatenation
operator fall into this category).
* in branches of code which represent a failure in the system - for example,
a lost database connection, or an invalid method parameter. Since these
branches are very rarely exercised, the speed at which the system fails
is usually not important
Portability
Portability is one of the principal advantages of using Java. Guidelines for
ensuring that a Java application remains portable are:
* do not use hard coded file names and paths - use the File class, and
allow file paths to be configured during deployment.
* remember that some systems have case sensitive file names (Unix), while
others do not (Windows)
* do not use hard coded constants
Use the System.getProperty method to refer to items which depend on the
system, such as line terminators and path separators.
* be very wary of Runtime.exec and Method.invoke
* do not rely on thread scheduling to define program logic
* do not use native methodsMinimize ripple effects
Much of object programming is centered on minimizing the ripple effects caused by
changes to a program. This is done simply by keeping details secret (information
hiding or encapsulation).
The principal ways of doing this are
* indirection - named constants replacing "magic numbers", for example
* minimizing visibility - private fields, package-private classes, for example
* generic references (polymorphism) - using high level references (interfaces
or abstract classes) instead of low level references (concrete classes)
All of these techniques accomplish the same thing - they confine knowledge of implementation details to the smallest possible part of a program. That is, they keep a secret of some sort.
Constant and liberal use of the above techniques is recommended. For example :
* fields are almost always private
* Collections, IO streams usually allow use of interface references
@@TODO@@ Add more code standards
Document everything ===================
Every method, whether public or private, must start out with a documentation comment that describes what the method does. The documentation should mention every parameter received by the method, every possible return value, and (if not obvious) the conditions under which the method could return an error. Put the parameter names in the same case in the doc string as in the method signature,
Read over the MMBase code to get an overview of how documentation looks in practice; in particular, see src/org/mmbase/bridge/* for examples.
If you are one of those developers who has a personal issue with documentation then at least document the public methods of your code. Public methods can be used by everyone and should have a clear contract. One exception to this rule are getter and setter methods.
Error message conventions =========================
For error messages the following conventions apply:
* Provide specific error messages with all the relevant information
(variables, exceptions, etc.)
* Messages start with a capital letter.
* Try keeping messages below 70 characters.
* Don't end the error message with a '.'.
* Don't use wildcard characters (* ? \) inside the error string. A search in
the logs for text containing these characters are always difficult.
* Don't include newline characters in error messages.
* Quoting information is done using single quotes ('some info').
* Don't include the name of the method where the error occurs in the error
message. Log systems will provide this information by itself.
* When including path or filenames in the error string, be sure to quote
them. (i.e. "Can't find '/path/to/repos/userfile'")
* Suggestions or other additions can be added after a semi-colon, like this:
"Can't write to 'file': object of same name already exists; remove "
"before retrying"
* Try to stay within the boundaries of these conventions, so please avoid
separating different parts of error messages by other separators such
as '--' and others.
Other conventions =================
In addition to the Sun standards, MMBase uses these conventions:
* Use only spaces for indenting code, never tabs. Tab display width is not
standardized enough, and anyway it's easier to manually adjust indentation
that uses spaces.
* @@TODO@@ Add some more conventions which require special attention (CVS keywords?)
* There are many other unspoken conventions maintained throughout
the code, that are only noticed when someone unintentionally
fails to follow them. Just try to have a sensitive eye for the
way things are done, and when in doubt, ask.
Exception handling ==================
Exceptions in java are used to communicate errors between a callee and a caller.
They aren't supposed to be used for control flow (throw and catch in the same
method). Exception declarations are usually found on the boundaries of an API.
A section on writing error messages can be found elsewhere in this document under the title 'Error message conventions'.
Here are some of the guidelines for exception handling in MMBase.
1 In the throws clause of a method header, be as specific as possible. Do not
group together related exceptions in a generic exception class that would
represent a loss of possibly important information.2 Do not catch or throw Throwable, Error, RuntimeException or Exception.
Narrow types are more informational. Use for example:
IllegalArgumentException or UnsupportedOperation3 Never throw the error upwards, unmodified:
try {
......
} catch(Exception e) {
throw e;
}
This code only pollutes the code base and adds nothing.4. If you *receive* an error, you have two choices:
a) Handle the error yourself. Use either your own code, or just log the
error. Never leave a catch clause empty, log the error on debug or trace.
b) Throw the error upwards, wrapping it in a new exception and including
the error as a "child" argument:
Writing log messages ====================
Certain guidelines should be adhered to when writing log messages for code changes:
Make a log message for every change. The value of the log becomes much less if developers cannot rely on its completeness. Even if you've only changed comments, write a log that says "Doc fix." or something.
Start off the log message with one line indicating the general nature of the change. This helps put developers in the right frame of mind for reading the rest of the log message.
Use full sentences, not sentence fragments. Fragments are more often ambiguous,
and it takes only a few more seconds to write out what you mean. Fragments like
"Doc fix", "New file", or "New mewthod" are acceptable because they are standard
idioms, and all further details should appear in the source code.
The log message should name every affected method, variable, etc, including the names of variables that are being removed in this commit. This helps people searching through the logs later. Don't hide names in wildcards, because the globbed portion may be what someone searches for later.
If your change is related to a specific issue in the issue tracker, then include
a string like "issue #N" in the log message. For example, if a patch resolves
issue 1729, then the log message might be:
Fix issue #1729: Don't crash because of a missing file.
For large changes or change groups, group the log entry into paragraphs separated
by blank lines. Each paragraph should be a set of changes that accomplishes a
single goal, and each group should start with a sentence or two summarizing the
change. Truly independent changes should be made in separate commits, of course.
One should never need the log entries to understand the current code. If you find
yourself writing a significant explanation in the log, you should consider
carefully whether your text doesn't actually belong in a comment, alongside the
code it explains.
There are some common-sense exceptions to the need to name everything that was changed:
* If you have made a change which requires trivial changes throughout the
rest of the program (e.g., renaming a variable), you needn't name all the
functions affected, you can just say "All callers changed".
* If you have rewritten a file completely, the reader understands that
everything in it has changed, so your log entry may simply give the file
name, and say "Rewritten".
* If your change was to multiple files, provide a brief summary of the change
in the log message
In general, there is a tension between making entries easy to find by searching for identifiers, and wasting time or producing unreadable entries by being exhaustive. Use your best judgment --- and be considerate of your fellow developers. (Also, run "cvs log" to see how others have been writing their log entries.)
Log messages for documentation have somewhat looser guidelines. The requirement to name every symbol obviously does not apply, and if the change is just one more increment in a continuous process it's not even necessary to name every file. Just briefly summarize the change, When you finish a distinct stage of work, note it in the log message. Please write your log messages in English, so everybody involved in the project can understand the changes you made.
Patch submission guidelines ===========================
Mail patches to [EMAIL PROTECTED]', with a subject line that contains
the word "PATCH" in all uppercase, for example
Subject: [PATCH] fix for validation bug in editors
A patch submission should contain one logical change; please don't mix N unrelated changes in one submission -- send N separate emails instead.
The email message should start off with a log message, as described in "Writing log messages" above. The patch itself should be in unified diff format (e.g., with "cvs diff"). Send the patch as an attachment, with a mime-type of text/x-diff, text/x-patch, or text/plain. (Most people's mailreaders can display those inline, and having the patch as an attachment allows them to extract the patch from the message conveniently.)
If you can't attach the patch with one of these mime-types, or if the patch is very short, then just include it directly in the body of your message. But if your mailer forces wrapping of long lines, then you must attach the patch, otherwise it is likely to get munged. Never send patches in archived or compressed form (e.g., tar, gzip, zip, bzip2), because that prevents people from reviewing the patch directly in their mailreaders.
If the patch implements a new feature, make sure to describe the feature completely in your mail; if the patch fixes a bug, describe the bug in detail and give a reproduction recipe. An exception to these guidelines is when the patch addresses a specific issue in the issues database -- in that case, just make sure to refer to the issue number in your log message, as described in "Writing log messages".
It is normal for patches to undergo several rounds of feedback and change before
being applied. Don't be discouraged if your patch is not accepted immediately
-- it doesn't mean you goofed, it just means that there are a *lot* of eyes
looking at every code submission, and it's a rare patch that doesn't have at
least a little room for improvement. After reading people's responses to your
patch, make the appropriate changes and resubmit, wait for the next round of
feedback, and lather, rinse, repeat, until some committer applies it.
If you don't get a response for a while, and don't see the patch applied, it may
just mean that people are really busy. Go ahead and repost, and don't hesitate
to point out that you're still waiting for a response. One way to think of it
is that patch management is highly parallizable, and we need you to shoulder
your share of the management as well as the coding. Every patch needs someone
to shepherd it through the process, and the person best qualified to do that is
the original submitter.
The "Patch Manager" Role: -------------------------
MMBase usually has a Patch Manager, whose job is to watch the developers mailing list and make sure that no patches "slip through the cracks".
This means watching every thread containing "[PATCH]" mails, and taking
appropriate action based on the progress of the thread. If the thread resolves
on its own (because the patch gets committed, or because there is consensus that
the patch doesn't need to be applied, or whatever) then no further action need
be taken. But if the thread fades out without any clear decision, then the patch
needs to be saved in the issue tracker. This means that a summary of any
discussion threads around that patch, and links to relevant mailing list
archives, will be added to some issue in the tracker. For a patch which
addresses an existing issue tracker item, the patch is saved to that item.
Otherwise, a new issue of type 'PATCH' is filed, and the patch is saved to that
new issue.
The Patch Manager needs a basic technical understanding of MMBase, and the ability to skim a thread and get a rough understanding of whether consensus has been reached, and if so, of what kind. It does *not* require actual MMBase development experience or commit access. Expertise in using one's mail reading software is optional, but recommended :-).
@@TODO@@ The current patch manager is ??????. @@TODO@@ There is not yet an "PATCH" type in the bugtracker
Filing bugs / issues ====================
If you encounter a situation where MMBase is clearly behaving wrongly, or behaving opposite to what the documentation says, then it's okay to file the issue right away (after searching to make sure it isn't already filed, of course!). But if you're
* Requesting a new feature, or * Having build problems, or * Not sure what the behavior should be, or * Disagreeing with current intended behavior, or * Not TOTALLY sure that others would agree this is a bug, or * For any reason at all not sure this should be filed,
...then please post to the dev list first. You'll get a faster response, and others won't be forced to use the issues database to have the initial real-time conversations.
Nothing is lost this way. If we eventually conclude that it should be in the issue tracker, then we can still file it later, after the description and reproduction recipe have been honed on the dev list.
*Please* be conservative about filing issues. The issues database is physically
much more cumbersome than email. It wastes people's time to have conversations
in the issues database that should be had in email. (This is not a libel against
the issue tracker, it's just a result of the fact that the issues database is
for permanent storage and flow annotation, not for real-time conversation.)
Commit access =============
@@TODO@@ This is what I prefer. This is different from how it is now in MMBase.
There are two types of commit access: full and partial. Full means anywhere in the tree, partial means only in that committer's specific area(s) of expertise.
How full commit access is granted:
After someone has successfully contributed a few non-trivial patches, some full
committer, usually whoever has reviewed and applied the most patches from that
contributor, proposes them for commit access. This proposal is sent only to the
other full committers -- the ensuing discussion is private, so that everyone can
feel comfortable speaking their minds. Assuming there are no objections, the
contributor is granted commit access. The decision is made by consensus; there
are no formal rules governing the procedure, though generally if someone
strongly objects the access is not offered, or is offered on a provisional basis.
The criteria for commit access are that the person's patches adhere to the guidelines in this file, adhere to all the usual unquantifiable rules of coding (code should be readable, robust, maintainable, etc), and that the person respects the "Hippocratic Principle": first, do no harm. In other words, what is significant is not the size or quantity of patches submitted, but the degree of care shown in avoiding bugs and minimizing unnecessary impact on the rest of the code. Many full committers are people who have not made major code contributions, but rather lots of small, clean fixes, each of which was an unambiguous improvement to the code.
How partial commit access is granted:
A full committer sponsors the partial committer. Usually this means the full
committer has applied several patches to the same area from the proposed partial
committer, and realizes things would be easier if the person were just
committing directly. Approval is not required from the full committers; it is
assumed that sponsors know what they're doing and will watch the partial
committer's first few commits to make sure everything's going smoothly.
Release numbering, compatibility, and deprecation =================================================
MMBase uses "MAJOR.MINOR.PATCH" release numbers. The general idea is:
1) Upgrading/downgrading between different patch releases in the same
MAJOR.MINOR line never breaks code. It may cause bugfixes to
disappear/reappear, but API signatures and semantics remain the same.
(Of course, the semantics may change in the trivial ways appropriate for
bugfixes, just not in ways that would force adjustments in calling code.)
2) Upgrading to a new minor release in the same major line may cause new APIs
to appear, but not remove any APIs. Any code written to the old minor
number will work with any later minor number in that line. However,
downgrading afterwards may not work, if new code has been written that
takes advantage of the new APIs.
3) When the major number changes, all bets are off. This is the only
opportunity for a full reset of the APIs, and while we try not to
gratuitously remove interfaces, we will use it to clean house a bit.MMBase does not use the "even==stable, odd==unstable" convention; any unqualified triplet indicates a stable release:
1.0.1 --> first stable patch release of 1.0 1.1.0 --> next stable minor release of 1.x after 1.0.x 1.1.1 --> first stable patch release of 1.1.x 1.1.2 --> second stable patch release of 1.1.x 1.2.0 --> next stable minor release after that
The order of releases is semi-nonlinear -- a 1.0.3 *might* come out after a 1.1.0. But it's only "semi"-nonlinear because eventually we declare a patch line defunct and tell people to upgrade to the next minor release, so over the long run the numbering is basically linear.
@@TODO@@ Working copies? nightly builds?
Deprecation: ------------
When a new, improved version of an API is introduced, the old one remains for compatibility, at least until the next major release. However, we mark the old one as deprecated and point to the new one, so people know to write to the new API if at all possible. When deprecating, mention the release after which the deprecation was introduced, and point to the new API. If possible, replace the old API documentation with a diff to the new one. For example:
/**
* @deprecated Provided for backward compatibility with the 1.0.0 API.
*
* Similar to newMethod(), but with the last
* parameter always set to FALSE.
*/When the major release number changes, the "best" new API in a series generally replaces all the previous ones (assuming it subsumes their functionality), and it will take the name of the original API. Thus, marking 'oldMethod' as deprecated in 1.1.x doesn't mean that 2.0.0 doesn't have 'oldMehtod', it just means the function's signature will be different: it will have the signature held by newMethod (or newestMethod, or whatever) in 1.1.x.
One exception to this replacement strategy is when the old function has a
totally unsatisfying name anyway. Deprecation is a chance to fix that: we give
the new API a totally new name, mark the old API as deprecated, point to the new
API; then at the major version change, we remove the old API, but don't rename
the new one to the old name, because its new name is fine.
Stabilizing and maintaining releases ====================================
Minor and major number releases go through a stabilization period before release,
and remain in maintenance (bugfix) mode after release. To start the release
process, we create an "A.B.x" branch based on the latest trunk.
The stabilization period for a new A.B.0 release normally lasts four weeks, and allows us to make conservative bugfixes and discover showstopper issues. The stabilization period begins with a release candidate tarball with the version A.B.0-rc1. Further release candidate tarballs may be made as blocking bugs are fixed.
At the beginning of the final week of the stabilization period, a new release
candidate tarball should be made if there are any changes pending since the last
one. The final week of the stabilization period is reserved for critical
bugfixes; fixes for minor bugs should be deferred to the A.B.1 release.
A critical bug is a non-edge-case crash, a data corruption problem, a major
security hole, or something equally serious.
Under some circumstances, the stabilization period will be extended:
* If a potentially destabilizing change must be made in order to fix a bug,
the entire four-week stabilization period is restarted. A potentially
destabilizing change is one which could affect many parts of MMBase in
unpredictable ways, or which involves adding a substantial amount of new
code. Any incompatible API change (only allowable in the first place if the
new release is an A.0.0 release) should be considered a potentially
destabilizing change.
* If a critical bugfix is made during the final week of the stabilization
period, the final week is restarted. The final A.B.0 release is always
identical to the release candidate made one week before (with the
exceptions discussed below).If there are disagreements over whether a change is potentially destabilizing or over whether a bug is critical, they may be settled with a committer vote.
After the A.B.0 release is out, patch releases (A.B.1, A.B.2, etc.) follow when bugfixes warrant them. Patch releases do not require a four week soak, because only conservative changes go into the line.
Certain kinds of commits can go into A.B.0 without restarting the soak period, or into a later release without affecting the testing schedule or release date:
Without voting:
- Changes that are a normal part of release bookkeeping, for example,
the steps listed in releasesnotes.txt.
- Changes to build.xml by, or approved by, the release manager.
With voting:
- Doc fixes in source files.Core code changes, of course, require voting, and restart the soak or test period, since otherwise the change could be undertested.
The voting system works like this:
A change needs three +1 votes from full committers (or partial committers for the involved areas), and no vetoes, to go into A.B.x.
If you cast a veto (i.e. -1), please state the reason in the concerns field, and include a url / message-id for the list discussion if any.
Voting +1 on a change doesn't just mean you approve of it in principle. It means
you have thoroughly reviewed the change, and find it correct and as nondisruptive
as possible. When it is committed to the release branch, all who voted for it,
as well as the original author and the person making the commit are considered
equally answerable for bugs.
If you've reviewed a patch, and like it but have some reservations, you can write "+1 (concept)" and then ask questions on the list about your concerns. You can write "+0" if you like the general idea but haven't reviewed the patch carefully. Neither of these votes counts toward the total, but they can be useful for tracking down people who are following the change and might be willing to spend more time on it.
@@TODO@@ should the applications module have a looser voting system? They
are not core code, and because there are usually fewer experts available to
review changes there. A change in these areas can go in with a +1 from a full
committer or a partial committer for that area, at least one +0 or "concept +1"
from any other committer, and no vetoes. (If a change affects the build system,
however, it is considered a core change, and needs three +1's.)
Before proposing a change, you should try merging it onto the branch to ensure that it doesn't produce merge conflicts. If conflicts occur, please create a new temporary branch from the release branch with your changed merged and the conflicts resolved. The branch should be named A.B.x-<NAME>. Add a note in the proposal about the existence of the temporary branch. If the change involves further work, you can merge those revisions to the branch.
-- Andr� van Toly http://www.toly.nl mobile +31(0)627233562 ------------------------------------------------------------------>><<-- _______________________________________________ Developers mailing list [email protected] http://lists.mmbase.org/mailman/listinfo/developers
