Hi all,
I will add more detail on why the existing AtomPub client does not
use Abdera.
As Florent said, the existing client was written in a hurry for a
customer and is not yet aligned with the latest Chemistry API.
The objective was to build a very responsive rich client (based on
the Eclipse RCP framework) for people writing news to view, edit and
publish stories.
Here are three requirements of the application (to give an idea of
its constraints, e.g. performance, memory and responsiveness):
- People should be able to fetch raw news text (using the atom client),
create / edit news and publish them as fast as possible (a few
seconds to create and publish a news item)
- The application should display many views on remote feeds (loaded
using the atom client)
- Feed views must be refreshed every 2 seconds.
Initially, I started using Abdera, but for several reasons (explained
below) I decided that it was not appropriate for the type of
application I wanted to build.
So, let's see how we can use Abdera to build a Chemistry object model
implementation for an atom client. We have 2 choices:
1. Either you wrap Abdera objects in your own Chemistry objects
2. Or you use Abdera to parse the feed and build Chemistry objects
that are totally detached from Abdera objects.
Let's discuss both approaches. I will start with the worst one.
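To make the two choices concrete, here is a minimal sketch of the
difference. It is only an illustration: org.w3c.dom.Element from the JDK
stands in for an Abdera entry, and the class names (WrapVersusDetach,
WrappedEntry, DetachedEntry) are hypothetical, not Chemistry or Abdera
types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch; org.w3c.dom.Element stands in for an Abdera entry. */
final class WrapVersusDetach {

    /** Choice 1: keeps the parsed node and reads from it on every call. */
    static final class WrappedEntry {
        private final Element node;
        WrappedEntry(Element node) { this.node = node; }
        String title() { return firstText(node, "title"); }
    }

    /** Choice 2: copies what it needs once and keeps no reference to the node. */
    static final class DetachedEntry {
        private final String title;
        DetachedEntry(Element node) { this.title = firstText(node, "title"); }
        String title() { return title; }
    }

    /** Text of the first descendant with the given local name, or null if absent. */
    static String firstText(Element parent, String localName) {
        NodeList found = parent.getElementsByTagNameNS("*", localName);
        return found.getLength() == 0 ? null : found.item(0).getTextContent();
    }

    /** Parses a small XML snippet and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse snippet", e);
        }
    }

    public static void main(String[] args) {
        Element entry = parseRoot(
                "<entry xmlns='http://www.w3.org/2005/Atom'><title>Story</title></entry>");
        WrappedEntry wrapped = new WrappedEntry(entry);
        DetachedEntry detached = new DetachedEntry(entry);

        // Simulate the parsed tree changing or going away underneath the model.
        entry.removeChild(entry.getFirstChild());

        System.out.println("wrapped:  " + wrapped.title());
        System.out.println("detached: " + detached.title());
    }
}
```

The wrapped object depends on the state of the parsed tree for its whole
lifetime, while the detached one only needs the tree during construction.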
1. Wrap Abdera objects in Chemistry objects
This was my first approach. Here are the pros/cons:
Pros:
- Simplifies the implementation of the Chemistry model a bit, atom
validation included. No need to write StAX (or SAX) code to read your
objects from the remote feed.
In fact, the simplification brought by Abdera is relative. You still
need to write code to parse your CMIS objects from the Abdera DOM.
Feed and entry parsing is not complicated anyway (atom is a nice
and simple format).
So the only things Abdera really provides are atom validation and
an atom-aware XML DOM. The rest must be implemented anyway (like the
CMIS object parsing from the Abdera DOM).
If you don't need atom validation, then using another XML DOM library
is equivalent from the Chemistry code perspective: you still need to
parse the CMIS objects yourself.
Here is a link to benchmarks of several XML DOM parsers, including
AXIOM (the one used by Abdera):
http://www.xml.com/lpt/a/1703
Cons:
- Adds many extra dependencies to your application (if I remember
correctly, 3 or 4 Abdera JARs + 2 AXIOM JARs)
- Your CMIS objects will be larger (they embed Abdera objects, which
contain additional data not used by the CMIS code).
- Debugging is difficult.
When you inspect CMIS objects (that wrap Abdera objects) you need
to inspect Abdera objects that are based on the AXIOM model, which is a
lazy DOM model (it reads XML data into the DOM objects only when
required). To understand what your objects contain you need to
understand the AXIOM model.
- A technical issue I had with the AXIOM way of doing things.
I will describe it here:
As mentioned above, AXIOM loads data from XML into the DOM only
on client demand. For example, if the client doesn't need to access the
30th entry in the feed, the data of that entry will not be read from
the XML input stream.
This is a very interesting AXIOM feature that I like, but it
has a side effect in my application's case.
Because AXIOM reads the input stream only on demand, it requires
the input stream to stay open until you have read all the data you
want from the stream.
This means that if you close the stream before the UI is completely
updated, you will get an exception like this one:
Exception in thread "main" org.apache.abdera.parser.ParseException: java.lang.RuntimeException: [was class java.io.IOException] Stream closed
    at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:260)
    at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:265)
    at org.apache.axiom.om.impl.traverse.OMChildrenQNameIterator.next(OMChildrenQNameIterator.java:93)
    at org.apache.abdera.parser.stax.util.FOMElementIteratorWrapper.next(FOMElementIteratorWrapper.java:41)
    at org.apache.abdera.parser.stax.util.FOMList.buffer(FOMList.java:74)
    at org.apache.abdera.parser.stax.util.FOMList.size(FOMList.java:88)
    at org.nuxeo.chemistry.client.app.test.TestAbderaConn.parseWithAbdera(TestAbderaConn.java:61)
    at org.nuxeo.chemistry.client.app.test.TestAbderaConn.main(TestAbderaConn.java:39)
Caused by: java.lang.RuntimeException: [was class java.io.IOException] Stream closed
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:706)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3655)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:245)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:216)
    at org.apache.abdera.parser.stax.FOMBuilder.applyTextFilter(FOMBuilder.java:158)
    at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:206)
    ... 7 more
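The same failure mode can be shown without Abdera or AXIOM, since any
pull parser that reads on demand behaves this way. The sketch below uses
only the JDK's StAX reader. It is not the original TestAbderaConn code:
the class LazyReadSketch, the ClosableStream helper and the generated feed
are hypothetical, and it assumes the feed is larger than the parser's
internal buffer (5000 entries should be plenty).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: a pull parser stops working once its input is closed. */
final class LazyReadSketch {

    /** Behaves like a network stream: any read after close() fails. */
    static final class ClosableStream extends FilterInputStream {
        private boolean closed;
        ClosableStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            ensureOpen();
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            ensureOpen();
            return super.read(b, off, len);
        }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
        private void ensureOpen() throws IOException {
            if (closed) throw new IOException("Stream closed");
        }
    }

    /** Builds a feed that is assumed to exceed the parser's read buffer. */
    static byte[] feed(int entries) {
        StringBuilder xml = new StringBuilder("<feed xmlns='http://www.w3.org/2005/Atom'>");
        for (int i = 0; i < entries; i++) {
            xml.append("<entry><title>story ").append(i).append("</title></entry>");
        }
        return xml.append("</feed>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Counts entries, optionally closing the stream after the first one.
     * Returns -1 if the parser failed while pulling more input.
     */
    static int countEntries(int entries, boolean closeAfterFirst) {
        InputStream in = new ClosableStream(new ByteArrayInputStream(feed(entries)));
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())) {
                    count++;
                    if (closeAfterFirst && count == 1) {
                        in.close(); // e.g. the fetch code finished before the UI did
                    }
                }
            }
            return count;
        } catch (XMLStreamException | IOException | RuntimeException e) {
            return -1; // the rest of the feed was never read
        }
    }

    public static void main(String[] args) {
        System.out.println("stream kept open:    " + countEntries(5000, false));
        System.out.println("stream closed early: " + countEntries(5000, true));
    }
}
```

With the stream kept open all entries are counted; once it is closed
early, the next attempt to pull data fails, which is the situation a lazy
DOM puts the UI code in.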
And in order to have a responsive application you need to refresh
feeds asynchronously, outside the UI thread. You cannot control
when the stream is closed relative to when the UI is completely loaded.
So, this cool feature of AXIOM makes the AXIOM DOM unusable (or
hardly usable) in live objects displayed in rich client applications.
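One way around the ordering problem is to read everything the view needs
while the stream is still open, and only then hand the result to the UI
thread. Here is a minimal sketch of that idea, not the actual client code:
FeedRefreshSketch, EagerParser and readFully are hypothetical names, and
readFully merely stands in for a real feed parser. In an SWT/RCP
application the uiThread executor would presumably be something like
Display.getDefault()::asyncExec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical sketch: parse eagerly in the background, then update the view. */
final class FeedRefreshSketch {

    /** Parser contract: must read everything it needs before returning. */
    interface EagerParser<T> {
        T parse(InputStream in) throws IOException;
    }

    /** Fetches and parses off the UI thread; the stream is closed before the view runs. */
    static <T> CompletableFuture<Void> refresh(Callable<InputStream> source,
            EagerParser<T> parser, Executor background, Executor uiThread,
            Consumer<T> view) {
        CompletableFuture<T> parsed = CompletableFuture.supplyAsync(() -> {
            try (InputStream in = source.call()) {
                return parser.parse(in); // result no longer depends on the stream
            } catch (Exception e) {
                throw new IllegalStateException("Feed refresh failed", e);
            }
        }, background);
        return parsed.thenAcceptAsync(view, uiThread);
    }

    /** Trivial stand-in for a real feed parser: reads the whole body as text. */
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        Executor uiThread = Runnable::run; // stand-in; a UI toolkit supplies its own
        byte[] body = "<feed/>".getBytes(StandardCharsets.UTF_8);
        FeedRefreshSketch.<String>refresh(() -> new ByteArrayInputStream(body),
                FeedRefreshSketch::readFully, background, uiThread,
                text -> System.out.println("view got: " + text)).join();
        background.shutdown();
    }
}
```

The price is that nothing is loaded lazily any more, so this only pays off
when the parsed objects are small, which is the case for detached objects.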
Let's look at the second option for integrating Abdera.
2. Use Abdera to parse the feed and build Chemistry objects
that are totally detached from Abdera objects.
Pros:
- Provides atom validation and atom-oriented DOM objects.
- Easy way to parse feeds and CMIS objects using the high-level Abdera
DOM.
- Efficient way of parsing the feed input stream thanks to the "load on
demand" feature of AXIOM (sections of the feed you are not interested
in will not be loaded into memory).
Cons:
- Extra dependencies required by the application, as mentioned above
(~6 extra JARs)
This is an acceptable approach. The only issue for me is the extra
dependencies. To parse an atom feed you need 6 JARs!?
It's true that for a Chemistry client, parsing the feed is not
interesting at all. The client code must concentrate on implementing
the Chemistry model in an efficient way, so that the client can be used
in both simple administration applications and highly responsive rich
client applications.
My problem is that correctly parsing an atom feed with a focus on
CMIS objects can be done very efficiently with StAX, by writing
only a few classes (no more than 10).
So after thinking more about this I also considered using only AXIOM
(without Abdera). With only 2 extra JARs I am able to load my feeds
efficiently. But I lose the atom model.
Anyway, I finally realized that by writing only a few helper classes
over StAX I can do higher-level parsing in the style of AXIOM,
so I adopted this extreme approach. :)
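To give an idea of how little code this takes, here is a minimal sketch of
a StAX-only reader that turns atom entries into plain objects. It is not
the actual client code: StaxFeedSketch, Entry and the helper methods are
hypothetical, it only reads atom:id and atom:title, and it assumes both
are text-only (no type="xhtml" content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: reads atom entries into plain objects using only StAX. */
final class StaxFeedSketch {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Plain value object with no reference back to the parser. */
    static final class Entry {
        final String id;
        final String title;
        Entry(String id, String title) { this.id = id; this.title = title; }
    }

    /** Parses every atom:entry in the document; parser errors become unchecked. */
    static List<Entry> parse(String xml) {
        List<Entry> entries = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && isAtom(reader, "entry")) {
                        entries.add(readEntry(reader));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed feed", e);
        }
        return entries;
    }

    /** Consumes one entry; the reader must be positioned on its start tag. */
    private static Entry readEntry(XMLStreamReader reader) throws XMLStreamException {
        String id = null;
        String title = null;
        int depth = 1; // number of open elements, the entry itself included
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (depth == 1 && isAtom(reader, "id")) {
                    id = reader.getElementText(); // also consumes the end tag
                } else if (depth == 1 && isAtom(reader, "title")) {
                    title = reader.getElementText();
                } else {
                    depth++; // unknown or nested element: skip its content
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return new Entry(id, title);
    }

    private static boolean isAtom(XMLStreamReader reader, String localName) {
        return ATOM_NS.equals(reader.getNamespaceURI())
                && localName.equals(reader.getLocalName());
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='" + ATOM_NS + "'>"
                + "<entry><id>urn:1</id><title>First story</title></entry>"
                + "</feed>";
        for (Entry entry : parse(xml)) {
            System.out.println(entry.id + " -> " + entry.title);
        }
    }
}
```

Reading CMIS properties would follow the same pattern: one more branch per
element of interest, with everything else skipped by the depth counter.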
Maybe my list of pros and cons is not complete, but it may still
help you in choosing a solution.
Personally, if we absolutely want Abdera I will vote for solution 2,
since solution 1 is not acceptable for my use cases.
If Abdera is not required then we can either use AXIOM or
use the StAX API directly, as in the current client.
Regards,
Bogdan
On 4 June 09, at 16:25, Florent Guillaume wrote:
On 4 Jun 2009, at 15:34, Gabriele Columbro wrote:
G'day Chemicals,
as I finally found some time to spend on the mighty Chemistry, I
was able to go through the ongoing mail threads and look a little
bit better at the status of the Chemistry codebase (with an eye on
which parts of Alfresco may be suitable for contribution).
I would like to start working a bit on the client / TCK / build
automation part of the project, but, before discussing the details
with you guys and get into action, I saw a couple of open mail
threads (forwarded one and [4]) on a topic that can impact a lot
the way I can contribute to this project:
I'm talking about the implementation of the AtomPub Java Client.
As I understand Florent is working on the AtomPub Java Client and
IIUC it isn't going to be based on Abdera. Though I could not find
yet any code in SVN (@Florent: nor in the Nuxeo HG 'default' [1]
revision, am I pointing the right one or 'integrate-atom-pub' [2]
is the one to look at?),
Yes the code we have is in branch integrate-atompub-client in http://hg.nuxeo.org/sandbox/chemistry/
-- the old repo used before switching to Apache svn.
But as it happens I'm studying this code right now to adapt it to
the newest Chemistry API refactorings, and I'll commit code in svn
before tonight, although it may be nonfunctional and not very unit
tested at all :( This code was written in a hurry by Bogdan for a
customer (although we have the IP on it) and is not up to the
standards I expected of it, so don't hesitate to criticize it and
discuss refactorings.
so I was finally wondering:
1__ What's the state of the art of the AtomPub Java client impl?
What's the devs' opinion on the usage of Abdera? Has that already
been discussed and I missed it? :)
No real discussion in these lists.
After having worked with Abdera for the server part, I've come to
the conclusion that it's a big library, rewrapping a lot of Axiom.
Also it's still very young, and not well designed for extensibility
if you stray from the simple "one feed with entries in it" model.
Bogdan, for the client part, decided to not use Abdera because one
of his goals was to allow it to be a small embedded library, so StAX
was all that was really needed. Abdera apparently creates lots
and lots of objects and uses lots of memory, whereas a simple
StAX-based parser gave him huge performance boosts.
There are a couple of reasons why I ask you guys suggestions/
clarifications on this topic:
- Abdera is the standard Apache Atom implementation and we can rely
on a good cooperation between Apache projects
Agreed, however note that working with SNAPSHOTs of other projects
is a headache in terms of release. So if we start modifying Abdera
then we'll have to think about how to release.
- In terms of maintenance overhead, I see good improvements if
Abdera is used both in the server (IIUC) and client part
Do you see any factoring between client and server beyond the Abdera
extensions, beyond the few ElementWrapper subclasses?
Note also that I have already started using Abdera's
ExtensibleElementWrapper in chemistry-atompub, however I don't
register them as an Abdera extension (I instantiate directly)
because Abdera extensions are global and I don't want to step on the
toes of any other code that would like to work with Chemistry but
already uses its own Abdera extension (like Alfresco).
chemistry-atompub only has the methods useful for the server though,
not yet the client.
- In terms of dependencies explosion, I don't see a big deal in the
Abdera (client) chain of (runtime) dependencies, especially if you
consider that the (Java) client is going to be most likely to be
used for Java based Content Repositories (or custom applications)
integration and these are typically library-flooded applications
anyways.
I can't disagree with the fact that projects usually already use
lots of libraries, so what's one more. Note however that Abdera is
huge, abdera-core + abdera-i18n + abdera-parser are already at 900
Kb (Mostly due to Unicode data in abdera-i18n by the way).
- Choosing for Abdera, may enable me to contribute the already
functional Abdera extension of Alfresco, so to give quite of a jump
start on the TCK/Client side
That's a good point.
BTW we also have an Abdera extension in yet another (older) CMIS
sandbox (http://hg.nuxeo.org/sandbox/nuxeo-cmis/file/tip/src/main/java/org/apache/abdera/ext/cmis/
) which could be used as well. If you contribute yours, I'll look at
merging useful things we may have into it (although Abdera
extensions are in fact rather simple).
- The usage of Abdera seems to be an enabler for contributions
already built on top of it (see Sourcesense CMIS portlet [3])
2__ Do you think the Abdera extension could be a valid
contribution? And in such a case, would it belong to Chemistry or
Abdera itself?
I would leave it in Chemistry until we consider it mature enough to
be moved to Abdera -- barring any dependency problems. This way
we'll get much more rapid turnaround in its update. It could move to
Abdera once CMIS 1.0 is released, for instance.
I'm not sure what the status of Florent's implementation is, and I
particularly don't want to waste any effort already done. This is
also my first interaction with the list,
so please forgive me if I'm missing some blatantly obvious point ;)
No problem, these are all worthwhile points.
My next steps are to study Bogdan's client code, and if I (or the
list) feel it's inadequate then I'll scrap it and go back to a simple
Abdera-based implementation. I'll commit something tonight in svn so
that others can look at it.
Florent
--
Florent Guillaume, Head of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com http://www.nuxeo.org +33 1 40 33 79 87