On 1/6/11 2:36 AM, Alex Karasulu wrote:
Hi all,
Excuse the cross post but this also has significance to the API list.
Problem
------------
For our benefit and the benefit of our users we need to be uber careful with
changes after a major GA release. We have another thread where it seems
people agree with the Eclipse scheme of versioning and this sounds really
flexible for our needs. We can do a 2.0.0-M1 release at any time without
clamping down on APIs. Only when we do an RC do we have to freeze changes to
interfaces.
The debate still remains as to what constitutes an interface. Emmanuel seems
to disagree with configuration, schema, and partition db formats as being
interfaces of concern but for the time being we can just discuss those we do
agree on. There's no doubt about APIs and SPIs.
I don't disagree about schema, but schemas are clearly defined by RFCs;
their syntax and definition leave no room for interpretation.
However, the schema manipulation API is in the scope of this discussion.
Partition and configuration are not part of the Ldap API, thus are
irrelevant in this discussion about shared refactoring.
Solution
------------
So how do we make this as painless as possible for us and for our users?
The best way is to keep the surface area of the SPI or API small, create
solid boundaries, and avoid exposing implementation details and
implementation classes.
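A minimal sketch of that idea in Java, for illustration only. Every name in it
(SketchAddRequest, SketchRequests, DefaultSketchAddRequest) is hypothetical and
is not one of the actual shared-ldap types. The three types are collapsed into
one file here, so the visibility each would have in a real module is noted in
the comments.

```java
// Hypothetical names throughout; these are not the real shared-ldap types.

/** The contract callers compile against. In a real module: public, in the API package. */
interface SketchAddRequest {
    String getEntryDn();

    int getMessageId();
}

/** The only way to obtain an instance. In a real module: public, next to the interface. */
final class SketchRequests {
    private SketchRequests() {
        // static factory holder, never instantiated
    }

    static SketchAddRequest newAddRequest(int messageId, String entryDn) {
        return new DefaultSketchAddRequest(messageId, entryDn);
    }
}

/** Implementation detail. In a real module: package-private, or in a package that is not exposed. */
final class DefaultSketchAddRequest implements SketchAddRequest {
    private final int messageId;
    private final String entryDn;

    DefaultSketchAddRequest(int messageId, String entryDn) {
        if (entryDn == null) {
            throw new IllegalArgumentException("entryDn must not be null");
        }
        this.messageId = messageId;
        this.entryDn = entryDn;
    }

    @Override
    public String getEntryDn() {
        return entryDn;
    }

    @Override
    public int getMessageId() {
        return messageId;
    }
}
```

Callers only ever hold a SketchAddRequest, so the implementation class can be
renamed or rewritten in a point release without touching the contract.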
By reducing the surface area with implementation hiding we can effectively
limit exposure and reduce the probability of needing to make a change that
breaks with our user contract. You might be asking what's a real world
example of this for us in shared?
And incidentally this is one of the things I've been working on in my
branch.
Real World Example in Shared
--------------------------------------------
Let's take the o.a.d.s.ldap.message package as an example. This package
contains classes and interfaces modeling LDAP requests and responses: i.e.
AddRequest, DeleteResponse etc. It's in the shared-ldap module.
In this package, in addition to the request and response interfaces, we're
exposing implementation classes for them. The implementation classes, in turn,
have dependencies on o.a.d.s.ldap.codec.* packages.
Not any more, I hope. We did a big refactoring last September in order
to remove this coupling. Of course, we may have some remaining
dependencies, but they are mostly unintentional.
This is because some
implementation classes depend on codec functionality which is an
implementation detail.
Not true anymore (or is it?).
This might be due to eager reuse or the addition of
utility methods into codec classes for convenience. Some of these
dependencies can be removed by breaking out non-implementation specific
methods and constants in codec classes into utility methods outside of the
package or the module altogether. Furthermore, the codec implementation
that handles [de]marshaling has to access package-private (non-API) methods
on implementation classes while encoding.
Not sure that I get what you mean here. Can you be a bit more explicit?
In the end, dependencies on further transitive dependencies are making us
expose almost all implementation classes in shared, and most can easily be
decoupled and hidden. It's effectively making everything in shared come
together in one big heap exposing way more than we want to.
It's quite impossible in Java to 'hide' all the classes that a user
should not manipulate, unless you use package-private classes, and that
approach quickly reaches its limits. I would rather think in terms of an
'exposed' (i.e. documented) API. Whether this documented API is gathered
in one separate module for convenience is another matter, but the user
will still have to depend on all the other modules.
So all in all, should we define a module (a Maven module) containing the
public API and the associated implementation? Probably (but this is not
an absolute necessity). I guess this is what you have in mind, so let's
see what the proposal is...
LDAP Client API
------------------------
Everyone agrees that this API is very important to get right with a 1.0.
Right now this API pulls in several public interfaces directly from shared.
Those interfaces also pull in some implementation classes. The logical API
extends into shared this way. Effectively the majority of shared is exposed
by the client API. The client API does not end at its jar boundary.
All this exposure increases the chances of API change when all
implementation details are wide open and part of the client API. And this
is what I'm trying to limit. There are ways we can decouple these
dependencies very nicely with a mixed bag of refactoring techniques while
breaking up shared-ldap into smaller, more coherent modules. The idea is to
expose only the bare minimum we need to expose. Yes, the shared code
has become very stable over time but the most stability is in the interfaces
and if we only expose these instead of implementation classes then we'll
have an awesome API that may remain 1.X for a while and not require
deprecations as new functionality is introduced.
How will you limit the visibility of the modules you don't want the user
to be exposed to?
Finishing Up the Example
-------------------------------------
So what concrete things can we do?
The biggest step is to hide as many of the implementation classes as
possible. In my experimental branch I started by:
(1) Moving out the methods and constants in codec classes that cause
unnecessary dependencies from message package classes and interfaces. There
was even a situation where StringTools, for example, depended on codec
classes, and virtually everything doing string-related operations used
StringTools, thereby causing many interdependencies. It then becomes a web of
dependencies across packages.
There is *one* method in StringTools that calls a codec method:
Hex.encodeHex. It's a mistake, as we already have another StringTools
method (toHexString) doing the same thing (to be double-checked). This
is a typical case of using a class from the wrong package, and we
should get rid of such coupling.
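For illustration, a hex helper that depends on nothing outside the JDK could
look like the sketch below. The class name HexSketch is hypothetical, and the
body (including the choice of lowercase digits) is an assumption, not the
existing StringTools.toHexString code.

```java
/** Hypothetical utility; the real StringTools method may differ in name, case and signature. */
final class HexSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    private HexSketch() {
        // static utility holder, never instantiated
    }

    /** Returns two lowercase hex digits per input byte, using JDK types only. */
    static String toHexString(byte[] bytes) {
        char[] out = new char[bytes.length * 2];

        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xFF;          // treat the byte as unsigned
            out[2 * i] = HEX_DIGITS[value >>> 4];     // high nibble
            out[2 * i + 1] = HEX_DIGITS[value & 0x0F]; // low nibble
        }

        return new String(out);
    }
}
```

With a helper like this living in a util module, string utilities no longer
need to reach into a codec package for hex encoding.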
It is extremely painful to do such a cleanup without first decoupling
all the pieces by creating separate jars, before regrouping the packages
back again.
The real question here is how far we want to go, considering
that shared contains 900 classes, more than 5600 methods and around 80
packages.
(2) Breaking up shared into multiple Maven modules, so now there are the
following modules:
o shared-util
o asn1-api
o asn1-ber
o ldap-model
  - name pkg
  - message pkg (no impl classes)
  - schema pkg
  - cursor pkg
  - filter pkg
  - entry pkg
  - constants pkg
o ldap-codec (not complete)
I would not have 2 Maven modules for asn1. It's probably overkill. I
would rather name the ldap-model ldap-api, because this is exactly what
it is.
Otherwise, I like this decomposition.
There are a few more things we will have to discuss:
- ldif (part of ldap-model/ldap-api)
- aci (but it may be in a separate module, an ADS-specific one, as it's
only good for ADS)
- trigger (same as above)
- csn (maybe part of shared-util)
- dsml (a separate module ?)
- client api (connection, futures, exceptions) (part of ldap-model/ldap-api)
- i18n (separate module would be good)
- the schema loader probably deserves a separate ADS module too
- the schema converter too
The next step would be to make these artifacts into OSGi bundles. There will
be nothing special about it. I'm just going to leverage bundle packaging to
hide implementation classes which you cannot do as easily with regular jars
with explicit package exports.
That should be a no brainer.
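As a rough sketch of what explicit package exports mean in practice: an OSGi
bundle manifest lists the packages it exports in the standard Export-Package
header, and packages not listed there are not visible to other bundles. The
bundle and package names below (example.ldap.codec, example.ldap.codec.api,
example.ldap.codec.impl) are hypothetical. The Java code only parses the
header with java.util.jar.Manifest to show which packages would be visible;
its comma split is deliberately naive and would not handle quoted version
ranges.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

/** Hypothetical sketch: reads the Export-Package header of an example bundle manifest. */
final class ExportSketch {
    // Example manifest; only the api package is exported, the impl package is not listed.
    static final String MANIFEST_TEXT =
        "Manifest-Version: 1.0\n"
        + "Bundle-ManifestVersion: 2\n"
        + "Bundle-SymbolicName: example.ldap.codec\n"
        + "Bundle-Version: 1.0.0\n"
        + "Export-Package: example.ldap.codec.api\n";

    private ExportSketch() {
        // static utility holder, never instantiated
    }

    /** Returns the package names listed in Export-Package, without their attributes. */
    static List<String> exportedPackages() {
        Manifest manifest;
        try {
            manifest = new Manifest(
                new ByteArrayInputStream(MANIFEST_TEXT.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        String header = manifest.getMainAttributes().getValue("Export-Package");
        List<String> packages = new ArrayList<>();

        if (header != null) {
            // Naive split: fine for this example, wrong for quoted version ranges.
            for (String clause : header.split(",")) {
                packages.add(clause.split(";")[0].trim());
            }
        }

        return packages;
    }
}
```

Here example.ldap.codec.impl would stay internal to the bundle simply because
it never appears in the header.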
Once this is done, we can export a minimal set of classes from the codec,
hide its remainder, and have the model interfaces be the primary dependency
used by the client API without exposing implementation classes and keeping
the API weight (surface area) down.
There's a lot more to do, the job is 40% complete. The wait for the AP merge
makes this work feel moot since the merge is going to be nasty so I might
just redo this again after Emmanuel merges. That lets me be a bit more
aggressive and experimental for now.
Go for it. As soon as you have something stable, since it's all about
moving pieces, we can do that bit by bit instead of merging.
Plus, if Pierre and Seelmann decide to opt for using m2eclipse+Maven+Tycho (as
Jesse mentioned) for the Studio build, then doing these refactorings a second
time will not incur manual fixing in Studio, which depends on shared now. I can
refactor Studio at the same time.
The real issue here is m2eclipse: it's anything but usable for a
project as big as ADS. I tried it again a month ago, and it smells
like Maven 1 to me...
Conclusions
-----------------
So this example shows some things we can do to make things tighter and
easier for us to better manage our APIs. We can do anything we like to the
implementation to fix bugs and to improve performance in point releases
without impacting the minimal interfaces we expose for the API.
And it can be a good opportunity to clean up the shared module which has
become a giant plate of spaghetti (with bolognese sauce).
We can take similar steps inside the server to restrict the exposed SPI;
however, using OSGi is probably not going to be an option there right away
since it gets more complicated. Here in shared I would use bundle packaging
just to hide implementation classes, not to define services etc.
Also there are some classes that were proposed for shared, i.e. DnNode which
at this point in time are specific to the server. Sure Studio might use
these classes eventually, however these classes are not generic LDAP. These
classes can stay in shared but they should be kept in a module separate from
the ldap-model for example.
Agreed. There may be other classes too; they have to be identified.
Why, you may ask? Because these classes are not
generic LDAP classes (the way Entry, Dn, or Cursor are generic) and are not
needed by every client, nor are they viable for every server a client
connects to. They only serve a purpose when used in Studio, connecting to
ApacheDS.
They are helper classes. They certainly don't belong in
ldap-model/ldap-api, and if they have to stay in shared, I would like to
move them to utils.
DnNode might be needed by Studio in the future for making a plugin and
widget that allows users to graphically manage the boundaries of
administrative areas, however it's not something every client needs, and it
certainly is not something needed by a generic client connecting to every
server.
--> utils.
So things like this as well as the category of interfaces and classes used
for modeling ApacheDS specific features which also are used by Studio should
be in their own modules, if kept in shared, separate from the model or the
codec bundles. This way they can remain in shared, used by both Studio and
ApacheDS without polluting the client API. As an example, the ACI mechanism
we use is very ApacheDS specific and is used by Studio's ACI editor. I
wanted to say X.500 specific, but we've changed our ACIs a tiny bit. So we
might have an ldap-aci module that pulls these things out of the ldap-model
so our standard client API remains clean and light, free of our ApacheDS
specific features.
+1. See above.
The power behind this API is the number of people and projects that will use
it. We don't want the OpenDS folks for example to avoid it just because they
don't want our ApacheDS specific interfaces weighing it down and
contaminating it. I'd love to see the API used with a light footprint on
mobile devices, so footprint will matter in this oddball case as well.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com