I think the primary focus is grouping packages with the following rules:
1) Group packages that are strongly connected
2) Start with the largest group and try to merge other groups into it that
do not add dependencies
3) Repeat for all remaining groups
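These rules could be sketched roughly as follows, assuming the dependency graph is just a map from each package to the set of packages it imports (all function names here are hypothetical, and the merge condition is only one possible reading of "does not cause additional dependencies"):

```python
def find_sccs(deps):
    """Tarjan's algorithm: return the strongly connected components of deps."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    nodes = set(deps) | {w for ws in deps.values() for w in ws}
    for v in nodes:
        if v not in index:
            strongconnect(v)
    return sccs

def external_imports(group, deps):
    """Everything the group imports that is not itself in the group."""
    return {d for p in group for d in deps.get(p, ())} - set(group)

def merge_groups(deps):
    """Rules 1-3: start from the strongly connected groups, largest first,
    and fold a group into an earlier one when that adds no new imports."""
    groups = sorted(find_sccs(deps), key=lambda g: (-len(g), min(g)))
    merged = []
    for g in groups:
        for m in merged:
            # merging g is "free" if g's external imports are already
            # members or imports of m
            if external_imports(g, deps) <= m | external_imports(m, deps):
                m |= g
                break
        else:
            merged.append(set(g))
    return merged
```

The greedy order matters: a different tiebreak between equally sized groups can give a different decomposition, which is exactly why a real cost function is needed.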
I think you need a cost/entropy function to calculate the optimum division. If
you group packages A and B, then the cost is zero if A and B have identical
imports. Even if A and B are not cohesive, they could be placed in the same
bundle because they do not drag in additional dependencies. So the interesting
question is what the cost is if B would add one additional import. Is it worth
it? Or not?
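A minimal sketch of such a cost function, assuming each package is described only by its set of imports (the function name and the choice of symmetric difference are mine, not an established metric):

```python
def grouping_cost(imports_a, imports_b):
    """Cost of putting A and B in one bundle: the imports each side would
    drag in that the other did not already have (symmetric difference)."""
    return len(imports_a ^ imports_b)

# Identical imports: cost 0, grouping is free.
# B brings one extra import: cost 1 -- is it worth it?
```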
Part of that analysis is then to determine whether you could do more grouping if
classes were moved from their packages to other (new?) packages. I.e. sometimes
a package contains a single class that makes the whole package ungroupable. I
think there should be the concept of a "dependency cost". If X is imported
by 15 packages and 254 classes, it is likely that you get your money's
worth for that dependency. However, if you find that a single class drags in
dependencies that nobody else uses, that class is likely expensive.
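One way to make that "dependency cost" idea concrete, assuming we have a map from each class to the external dependencies it uses (the names here are hypothetical): a dependency with many users is cheap per use, while a dependency with exactly one user makes that one class expensive.

```python
from collections import Counter

def expensive_classes(uses):
    """Return {class: deps it alone drags in}, i.e. the dependencies that
    have exactly one user. Such classes are candidates for moving out."""
    user_count = Counter(dep for deps in uses.values() for dep in set(deps))
    return {
        cls: sole
        for cls, deps in uses.items()
        if (sole := {d for d in deps if user_count[d] == 1})
    }
```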
It is interesting how much automation we can do there but I expect you need
people to look at the details.
One of the biggest modularity problems is usually bridge classes.
I.e. someone has a library doing X but wants to make it available with, for
example, Spring. There are then usually a few classes bridging the library to
the Spring world, and these can be extremely expensive. For example, bnd is
coupled to ant, but I made sure that coupling was in a separate package.
This all seems closely related to the concept of entropy, and it might be
interesting to take a look at Shannon et al. You have to find a decomposition
that has minimum entropy, where entropy is somehow defined in terms of imports
versus contents. You want to group as much as possible while minimizing the
connections between the groups. Again, this normally means you need a cost
function and then optimize it.
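As a purely illustrative stab at such an entropy (this formula is my own sketch, not an established metric): for each group, look at the fraction of its references that stay inside the group, and weight the Shannon binary entropy of that fraction by the group's size. A clean decomposition, where every reference is clearly internal or clearly external, scores zero.

```python
import math

def binary_entropy(p):
    """Shannon entropy of a coin with bias p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def decomposition_entropy(groups, deps):
    """Size-weighted binary entropy of the internal-reference fraction of
    each group. Lower means the imports-versus-contents split is cleaner."""
    total = sum(len(g) for g in groups) or 1
    h = 0.0
    for g in groups:
        refs = [(d in g) for p in g for d in deps.get(p, ())]
        if refs:
            p_internal = sum(refs) / len(refs)
            h += (len(g) / total) * binary_entropy(p_internal)
    return h
```

Applied to existing bundles, this would at least rank decompositions whose boundaries cut through tightly coupled code as worse than those whose boundaries fall between groups.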
However, start with the mechanical grouping and apply that idea to open source
projects to see what this would look like. If you could calculate the "entropy"
of existing bundles, that would also be very interesting.
Kind regards,
Peter Kriens
On 8 jun 2011, at 10:35, Tiger Gui wrote:
> Hi Peter,
>
> I am working on the design and implementation of a source code
> dependency analysis algorithm, and I will finish the whole algorithm in
> the coming month. The algorithm includes two sections: package and class.
>
> 1. Package section
>
> a. It can analyse package cycles in the project source code
> b. Analyse all the necessary packages for each package
> c. For each package, tell us who uses it
>
> 2. Class section
>
> a. This algorithm will tell us all the class cycles in the project
> source code (for example A -> B -> C -> A)
> b. Analyse all the necessary classes for each class (for example, it
> can tell us that class A uses classes B, C and D)
> c. For each class, tell us who uses it (for example, it can tell us
> that class A is used by classes B and C)
>
> After we get the source code analysis report, we should split the
> project into several OSGi bundles, so the problem is how we should
> split the project according to the report.
>
> My initial opinion:
>
> A. Classes in a cycle should be in the same bundle
> B. Classes (or interfaces) that are used heavily by other classes, but
> do not require any other class, can be in the same bundle. (Usually,
> these are basic interfaces or abstract classes.) These classes are
> usually API definition classes.
>
> I am very clear about these two situations, but there should be many
> other situations. So, your advice?
>
> --
> Best Regards
> ----------------------------------------------------
> Tiger Gui [[email protected]]