Hi all,

I would like to expand on the reasons why the existing AtomPub client does not use Abdera.

As Florent said, the existing client was written in a hurry for a customer and is not yet aligned with the latest Chemistry API. The objective was to build a very responsive rich client (based on the Eclipse RCP framework) that people writing news use to view, edit and publish stories. Here are some requirements on the application, to give an idea of its performance, memory and responsiveness constraints:
- People should be able to fetch raw news text (using the Atom client), create or edit news items and publish them as fast as possible (a few seconds to create and publish an item).
- The application should display many views on remote feeds (loaded using the Atom client).
- Feed views must be refreshed every 2 seconds.

Initially I started with Abdera, but for several reasons (explained below) I decided it was not appropriate for the type of application I wanted to build. So let's see how we can use Abdera to build a Chemistry object model implementation for an Atom client. We have 2 choices:
1. Either wrap Abdera objects in your own Chemistry objects,
2. Or use Abdera to parse the feed and build Chemistry objects that are totally detached from the Abdera objects.
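
To make the difference between the two choices concrete, here is a minimal sketch. It does not use Abdera at all: `ParsedEntry`, `WrappingDocument` and `DetachedDocument` are hypothetical names invented for the illustration, and a plain map plays the role of the parser-owned data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser-owned node (the role an Abdera entry would play).
interface ParsedEntry {
    String text(String elementName);
}

// Choice 1: keep the parsed node and delegate every read to it.
class WrappingDocument {
    private final ParsedEntry node;
    WrappingDocument(ParsedEntry node) { this.node = node; }
    String getName() { return node.text("title"); }
}

// Choice 2: copy what is needed at construction time and keep no reference to the node.
class DetachedDocument {
    private final String name;
    DetachedDocument(ParsedEntry node) { this.name = node.text("title"); }
    String getName() { return name; }
}

class WrapVersusDetach {
    public static void main(String[] args) {
        Map<String, String> parserState = new HashMap<>();
        parserState.put("title", "press-release.txt");
        ParsedEntry node = parserState::get;

        WrappingDocument wrapped = new WrappingDocument(node);
        DetachedDocument detached = new DetachedDocument(node);

        // Simulate the parser's data going away after the objects were built.
        parserState.clear();

        System.out.println("wrapped:  " + wrapped.getName());
        System.out.println("detached: " + detached.getName());
    }
}
```

The wrapping object stays dependent on whatever backs the parsed node for its whole lifetime, while the detached one only needs it during construction.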

Let's discuss both approaches. I will start with the worse one.

1. Wrap Abdera objects in Chemistry objects
This was my first approach. Here are the pros/cons:
Pros:
- Simplifies the implementation of the Chemistry model a bit, with Atom validation included. No need to write StAX (or SAX) code to read your objects from the remote feed.

In fact, the simplification Abdera brings is relative. You still need to write code to parse your CMIS objects from the Abdera DOM. Feed and entry parsing are not complicated anyway (Atom is a nice and simple format). So the only things Abdera really provides are Atom validation and an Atom-aware XML DOM. The rest has to be implemented anyway (like parsing the CMIS objects from the Abdera DOM). If you don't need Atom validation, then using another XML DOM library is the same from the Chemistry code's perspective, where you need to parse the CMIS objects. Here is a link to benchmarks of several XML DOM parsers, including AXIOM (the one used by Abdera):
http://www.xml.com/lpt/a/1703


Cons:
- Adds many extra dependencies to your application (if I remember well, 3 or 4 Abdera JARs + 2 AXIOM JARs).
- Your CMIS objects will be larger (they embed Abdera objects, which contain additional data not used by the CMIS code).
- Debugging is difficult.
When you inspect CMIS objects (that wrap Abdera objects) you need to inspect Abdera objects based on the AXIOM model, which is a lazy DOM model (it reads XML data into the DOM objects only when required). To understand what your object contains you need to understand the AXIOM model.
- A technical issue I had with the AXIOM way of doing things.
As mentioned above, AXIOM loads data from XML into the DOM only on client demand. For example, if the client doesn't need to access the 30th entry in the feed, the data of that entry will not be read from the XML input stream. This is a very interesting AXIOM feature that I like, but it has a side effect in my application's case. Because AXIOM reads the input stream only on demand, the input stream must stay open until you have read all the data you want from it. This means that if you close the stream before the UI is completely updated, you will get an exception like this one:


Exception in thread "main" org.apache.abdera.parser.ParseException: java.lang.RuntimeException: [was class java.io.IOException] Stream closed
        at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:260)
        at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:265)
        at org.apache.axiom.om.impl.traverse.OMChildrenQNameIterator.next(OMChildrenQNameIterator.java:93)
        at org.apache.abdera.parser.stax.util.FOMElementIteratorWrapper.next(FOMElementIteratorWrapper.java:41)
        at org.apache.abdera.parser.stax.util.FOMList.buffer(FOMList.java:74)
        at org.apache.abdera.parser.stax.util.FOMList.size(FOMList.java:88)
        at org.nuxeo.chemistry.client.app.test.TestAbderaConn.parseWithAbdera(TestAbderaConn.java:61)
        at org.nuxeo.chemistry.client.app.test.TestAbderaConn.main(TestAbderaConn.java:39)
Caused by: java.lang.RuntimeException: [was class java.io.IOException] Stream closed
        at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
        at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:706)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3655)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:245)
        at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:216)
        at org.apache.abdera.parser.stax.FOMBuilder.applyTextFilter(FOMBuilder.java:158)
        at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:206)
        ... 7 more

And in order to have a responsive application, feed refreshes must run asynchronously, off the UI thread. So you cannot control when the stream is closed relative to when the UI is completely loaded.

So this cool feature of AXIOM makes the AXIOM DOM unusable (or hardly usable) for live objects displayed in rich client applications.
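
The failure mode can be reproduced without AXIOM. The sketch below is only a simulation of the load-on-demand pattern: `LazyFeed` is a hypothetical class where one line of text stands for one feed entry, and it is not how AXIOM is implemented. It shows a lazy read failing once the stream is closed, and an eager read into a detached list surviving the close.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lazy feed: one line stands for one entry, read only when first requested.
class LazyFeed {
    private final BufferedReader in;
    private final List<String> loaded = new ArrayList<>();

    LazyFeed(String rawFeed) {
        this.in = new BufferedReader(new StringReader(rawFeed));
    }

    // Pulls from the stream until entry `index` has been loaded.
    String entry(int index) {
        try {
            while (loaded.size() <= index) {
                String line = in.readLine(); // throws IOException once the reader is closed
                if (line == null) throw new IndexOutOfBoundsException("no entry " + index);
                loaded.add(line);
            }
            return loaded.get(index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Eager alternative: drain the stream and hand back a list that no longer needs it.
    List<String> loadAll() {
        try {
            for (String line; (line = in.readLine()) != null; ) loaded.add(line);
            return new ArrayList<>(loaded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try { in.close(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}

class LazyFeedDemo {
    public static void main(String[] args) {
        LazyFeed lazy = new LazyFeed("entry-1\nentry-2\nentry-3\n");
        System.out.println(lazy.entry(0));
        lazy.close(); // e.g. the refresh job finishes before the UI has asked for everything
        try {
            lazy.entry(2);
        } catch (UncheckedIOException e) {
            System.out.println("lazy read after close failed: " + e.getCause().getMessage());
        }

        LazyFeed eager = new LazyFeed("entry-1\nentry-2\nentry-3\n");
        List<String> detached = eager.loadAll();
        eager.close();
        System.out.println(detached.get(2)); // the detached list does not depend on the stream
    }
}
```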

Let's look at the second option for integrating Abdera.


2. Use Abdera to parse the feed and build Chemistry objects that are totally detached from the Abdera objects

Pros:
- Provides Atom validation and Atom-oriented DOM objects.
- Easy way to parse feeds and CMIS objects using the high-level Abdera DOM.
- Efficient way of parsing the feed input stream thanks to the "load on demand" feature of AXIOM (sections of the feed you are not interested in will not be loaded in memory).

Cons:
- Extra dependencies required by the application, as mentioned above (~6 extra JARs)

This is an acceptable approach. The only issue for me is the extra dependencies. To parse an Atom feed you need 6 JARs!? It's true that for a Chemistry client, parsing the feed is not the interesting part. The client code must concentrate on implementing the Chemistry model efficiently, so that the client can be used in both simple administration applications and highly responsive rich client applications.

My point is that correctly parsing an Atom feed with a focus on CMIS objects can be done very efficiently with StAX, by writing only a few classes (no more than 10).
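
As a rough illustration of how little code the StAX route needs, here is a minimal sketch that reads the `id` and `title` of each Atom entry into detached objects using only the JDK's `javax.xml.stream` API. It is not the actual client code: `StaxFeedSketch` and `Entry` are hypothetical names, the sample feed is made up, CMIS extension elements are ignored, and it assumes `title` holds plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxFeedSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Hypothetical detached value object: holds no reference to the parser or the stream.
    static final class Entry {
        final String id;
        final String title;
        Entry(String id, String title) { this.id = id; this.title = title; }
    }

    // Reads the whole stream eagerly, so the caller may close it right after this returns.
    static List<Entry> parse(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // feeds are untrusted input
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<Entry> entries = new ArrayList<>();
        boolean inEntry = false;
        String id = null;
        String title = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && ATOM_NS.equals(reader.getNamespaceURI())) {
                    switch (reader.getLocalName()) {
                        case "entry": inEntry = true; id = null; title = null; break;
                        // getElementText() assumes text-only content (no nested XHTML).
                        case "id": if (inEntry) id = reader.getElementText(); break;
                        case "title": if (inEntry) title = reader.getElementText(); break;
                        default: break; // other elements are skipped in this sketch
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && ATOM_NS.equals(reader.getNamespaceURI())
                        && "entry".equals(reader.getLocalName())) {
                    entries.add(new Entry(id, title));
                    inEntry = false;
                }
            }
        } finally {
            reader.close();
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed xmlns=\"" + ATOM_NS + "\"><title>News</title><id>urn:feed</id>"
                + "<entry><id>urn:1</id><title>First story</title></entry>"
                + "<entry><id>urn:2</id><title>Second story</title></entry></feed>";
        List<Entry> entries = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (Entry e : entries) {
            System.out.println(e.id + " -> " + e.title);
        }
    }
}
```

The feed-level `title` and `id` are skipped because they occur outside any `entry` element.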

So after thinking more about this I also considered using only AXIOM (without Abdera). With only 2 extra JARs I am able to load my feeds efficiently. But I lose the Atom model.

Anyway, I finally realized that by writing only a few helper classes over StAX I can do higher-level parsing in the style of AXIOM, so I adopted this extreme approach. :)


Maybe my list of pros and cons is not complete, but it may help you in choosing a solution. Personally, if we absolutely want Abdera I will vote for solution 2, since solution 1 is not acceptable for my use cases. If Abdera is not required then we can either use AXIOM or use the StAX API directly, as in the current client.


Regards,
Bogdan


On 4 June 09, at 16:25, Florent Guillaume wrote:

On 4 Jun 2009, at 15:34, Gabriele Columbro wrote:
G'day Chemicals,
as I finally found some time to spend on the mighty Chemistry, I was able to go through the ongoing mail threads and look a little bit better at the status of the Chemistry codebase (with an eye on which parts of Alfresco may be suitable for contribution).

I would like to start working a bit on the client / TCK / build automation part of the project, but, before discussing the details with you guys and getting into action, I saw a couple of open mail threads (forwarded one and [4]) on a topic that can impact a lot the way I can contribute to this project:
I'm talking about the implementation of the AtomPub Java Client.

As I understand, Florent is working on the AtomPub Java Client and IIUC it isn't going to be based on Abdera. Though I could not find any code yet in SVN (@Florent: nor in the Nuxeo HG 'default' [1] revision, am I pointing at the right one, or is 'integrate-atom-pub' [2] the one to look at?),

Yes the code we have is in branch integrate-atompub-client in http://hg.nuxeo.org/sandbox/chemistry/ -- the old repo used before switching to Apache svn.

But as it happens I'm studying this code right now to adapt it to the newest Chemistry API refactorings, and I'll commit code in svn before tonight, although it may be nonfunctional and not very unit tested at all :( This code was written in a hurry by Bogdan for a customer (although we have the IP on it) and is not up to the standards I expected of it, so don't hesitate to criticize it and discuss refactorings.

so I was finally wondering:
1__ What's the state of the art of the AtomPub Java client impl? What's the devs' opinion on the usage of Abdera? Has that already been discussed and I missed it? :)

No real discussion in these lists.
After having worked with Abdera for the server part, I've come to the conclusion that it's a big library, rewrapping a lot of Axiom. Also it's still very young, and not well designed for extensibility if you stray from the simple "one feed with entries in it" model. Bogdan, for the client part, decided not to use Abdera because one of his goals was to allow it to be a small embedded library, so StAX was all that was really needed. Abdera apparently creates lots and lots of objects and uses lots of memory, whereas a simple StAX-based parser gave him huge performance boosts.

There are a couple of reasons why I ask you guys for suggestions/clarifications on this topic:
- Abdera is the standard Apache Atom implementation and we can rely on good cooperation between Apache projects

Agreed, however note that working with SNAPSHOTs of other projects is a headache in terms of release. So if we start modifying Abdera then we'll have to think about how to release.

- In terms of maintenance overhead, I see good improvements if Abdera is used both in the server (IIUC) and client part

Do you see any factoring between client and server beyond the Abdera extensions, beyond the few ElementWrapper subclasses?

Note also that I have already started using Abdera's ExtensibleElementWrapper in chemistry-atompub, however I don't register them as an Abdera extension (I instantiate directly) because Abdera extensions are global and I don't want to step on the toes of any other code that would like to work with Chemistry but already uses its own Abdera extension (like Alfresco). chemistry-atompub only has the methods useful for the server though, not yet the client.

- In terms of dependency explosion, I don't see a big deal in the Abdera (client) chain of (runtime) dependencies, especially if you consider that the (Java) client is most likely going to be used for integration with Java-based Content Repositories (or custom applications), and these are typically library-flooded applications anyway.

I can't disagree with the fact that projects usually already use lots of libraries, so what's one more. Note however that Abdera is huge, abdera-core + abdera-i18n + abdera-parser are already at 900 Kb (Mostly due to Unicode data in abdera-i18n by the way).

- Choosing Abdera may enable me to contribute the already functional Abdera extension of Alfresco, giving quite a jump start on the TCK/Client side

That's a good point.
BTW we also have an Abdera extension in yet another (older) CMIS sandbox (http://hg.nuxeo.org/sandbox/nuxeo-cmis/file/tip/src/main/java/org/apache/abdera/ext/cmis/ ) which could be used as well. If you contribute yours, I'll look at merging useful things we may have into it (although Abdera extensions are in fact rather simple).

- The usage of Abdera seems to be an enabler for contributions already built on top of it (see Sourcesense CMIS portlet [3])

2__ Do you think the Abdera extension could be a valid contribution? And in such a case, would it belong to Chemistry or Abdera itself?

I would leave it in Chemistry until we consider it mature enough to be moved to Abdera -- barring any dependency problems. This way we'll get much more rapid turnaround in its update. It could move to Abdera once CMIS 1.0 is released, for instance.

I'm not sure what the status of Florent's implementation is, and in particular I don't want to waste any effort already done. This is also my first interaction with the list,
so please forgive me if I'm missing some blatantly obvious point ;)

No problem, these are all worthwhile points.

My next steps are to study Bogdan's client code, and if I (or the list) feel it's inadequate then I'll scrap it and go back to a simple Abdera-based implementation. I'll commit something tonight in svn so that others can look at it.

Florent

--
Florent Guillaume, Head of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87

