Author: veithen
Date: Tue Jul 21 00:37:08 2009
New Revision: 796089
URL: http://svn.apache.org/viewvc?rev=796089&view=rev
Log:
Added some material to the tutorial/documentation.
Modified:
webservices/commons/trunk/modules/axiom/src/docbkx/tutorial.xml
Modified: webservices/commons/trunk/modules/axiom/src/docbkx/tutorial.xml
URL:
http://svn.apache.org/viewvc/webservices/commons/trunk/modules/axiom/src/docbkx/tutorial.xml?rev=796089&r1=796088&r2=796089&view=diff
==============================================================================
--- webservices/commons/trunk/modules/axiom/src/docbkx/tutorial.xml (original)
+++ webservices/commons/trunk/modules/axiom/src/docbkx/tutorial.xml Tue Jul 21 00:37:08 2009
@@ -1043,6 +1043,304 @@
</section>
</chapter>
+ <chapter>
+ <title>Common mistakes, problems and anti-patterns</title>
+ <para>
+ This chapter presents some of the common mistakes and problems
people face when writing code
+ using Axiom, as well as anti-patterns that should be avoided.
+ </para>
+ <section>
+ <title>Violating the
<classname>javax.activation.DataSource</classname> contract</title>
+ <para>
+ When working with binary (base64) content, it is sometimes
necessary to write a
+ custom <classname>DataSource</classname> implementation to
wrap binary data that is
+ available in a different form (and for which Axiom or the Java
Activation Framework
+ has no out-of-the-box data source implementation). Data
sources are also sometimes
+ (but less frequently) used in conjunction with
<classname>OMSourcedElement</classname>
+ and <classname>OMDataSource</classname>.
+ </para>
+ <para>
+ The documentation of the <classname>DataSource</classname> interface is very clear on the expected
+ behavior of the <methodname>getInputStream</methodname> method:
+ </para>
+<programlisting>/**
+ * This method returns an InputStream representing
+ * the data and throws the appropriate exception if it can
+ * not do so. Note that a new InputStream object must be
+ * returned each time this method is called, and the stream must be
+ * positioned at the beginning of the data.
+ *
+ * @return an InputStream
+ */
+public InputStream getInputStream() throws IOException;</programlisting>
+ <para>
+ A common mistake is to implement the data source in a way that
makes
+ <methodname>getInputStream</methodname>
<quote>destructive</quote>. Consider
+ the implementation shown in <xref
linkend="InputStreamDataSource"/><footnote><para>The example
+ shown is actually a simplified version of code that is
+ <ulink
url="http://svn.apache.org/repos/asf/webservices/axis2/tags/java/v1.5/modules/kernel/src/org/apache/axis2/builder/unknowncontent/InputStreamDataSource.java">part
of Axis2 1.5</ulink>.</para></footnote>.
+ It is clear that this data source can only be read once: any subsequent call to
+ <methodname>getInputStream</methodname> returns the same stream instance, which by then
+ has already been consumed or closed.
+ </para>
+ <example id="InputStreamDataSource">
+ <title><classname>DataSource</classname> implementation that
violates the interface contract</title>
+<programlisting>public class InputStreamDataSource implements DataSource {
+ private final InputStream is;
+
+ public InputStreamDataSource(InputStream is) {
+ this.is = is;
+ }
+
+ public String getContentType() {
+ return "application/octet-stream";
+ }
+
+ public InputStream getInputStream() throws IOException {
+ return is;
+ }
+
+ public String getName() {
+ return null;
+ }
+
+ public OutputStream getOutputStream() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+}</programlisting>
+ </example>
+ <para>
+ What makes this mistake so insidious is that it will very likely not cause
+ problems immediately. The reason is that Axiom is optimized to read the data
+ only when necessary, which in most cases means only once!
However, in some cases
+ it is unavoidable to read the data several times. When that
happens, the broken
+ <classname>DataSource</classname> implementation will cause
problems that may
+ be extremely hard to debug.
+ </para>
+ <para>
+ Imagine for example<footnote><para>For another example, see
+ <ulink
url="http://markmail.org/thread/omx7umk5fnpb6dnc"/>.</para></footnote>
+ that the implementation shown above is used to produce an
+ MTOM message. At first this will work without any problems
because the data
+ source is read only once when serializing the message. If
later on the MTOM
+ threshold feature is enabled, the broken implementation will
(in the worst case)
+ cause the corresponding MIME parts to be empty or (in the best
case) trigger an
+ I/O error because Axiom attempts to read from an already
closed stream.
+ The reason for this is that when an MTOM threshold is set,
Axiom reads the data
+ source twice: once to determine if its size exceeds the
+ threshold<footnote><para>To do this, Axiom doesn't read the
entire data source,
+ but only reads up to the threshold.</para></footnote> and once
during
+ serialization of the message.
+ </para>
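+ <para>
+ For comparison, the following is a minimal sketch of a data source that honors the
+ contract, assuming the binary content is small enough to be held in memory as a byte array.
+ The class name is hypothetical. In a real project the class would declare
+ <literal>implements javax.activation.DataSource</literal>; the clause is omitted here only
+ so that the sketch compiles without the activation library.
+ </para>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name. A real implementation would add
// "implements javax.activation.DataSource" to the declaration.
class ByteArrayBackedDataSource {
    private final byte[] data;

    ByteArrayBackedDataSource(byte[] data) {
        // Defensive copy so that later changes to the caller's array are not visible.
        this.data = data.clone();
    }

    public String getContentType() {
        return "application/octet-stream";
    }

    public InputStream getInputStream() throws IOException {
        // A new stream, positioned at the beginning of the data, on every call.
        return new ByteArrayInputStream(data);
    }

    public String getName() {
        return null;
    }

    public OutputStream getOutputStream() throws IOException {
        throw new UnsupportedOperationException();
    }
}
```

+ <para>
+ If the content is only available as an <classname>InputStream</classname>, it has to be
+ buffered once (in memory or in a temporary file) before it can be exposed in this way.
+ </para>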
+ </section>
+ <section>
+ <title>Issues that <quote>magically</quote> disappear</title>
+ <para>
+ Quite frequently, users post messages on the Axiom-related mailing lists about
+ issues that seem to disappear by <quote>magic</quote> when
they try to debug
+ them. The reason why this can happen is simple. As explained
earlier, Axiom uses
+ deferred building, but at the same time does its best to hide
that from the user,
+ so that he doesn't need to worry about whether the object
model has already been
+ built or not. On the other hand, when serializing the object
model to XML or when
+ requesting a pull parser
(<classname>XMLStreamReader</classname>) from a node,
+ the code paths taken may be radically different depending on
whether or not
+ the corresponding part of the tree has already been built.
This is especially
+ true when caching is disabled.
+ </para>
+ <para>
+ While the end result should be the same in all cases, it is
also clear that
+ in some circumstances an issue that occurs with an
incompletely built tree may
+ disappear if there is something that causes Axiom to build the
rest of the object
+ model. What is important to understand is that the
<quote>something</quote> may
+ be as trivial as a call to the
<methodname>toString</methodname> method of an
+ <classname>OMNode</classname>! The fact that adding
+ <methodname>System.out.println</methodname> statements or
logging instructions
+ is a common debugging technique then explains why issues
sometimes seem to
+ <quote>magically</quote> disappear during debugging.
+ </para>
+ <para>
+ Finally, it should be noted that inspecting an
<classname>OMNode</classname>
+ in a debugger also causes a call to the
<methodname>toString</methodname>
+ method on that object. This means that by just clicking on
something in the
+ <quote>Variables</quote> window of your debugger, you may
completely change the
+ state of the process that is being debugged!
+ </para>
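+ <para>
+ The effect can be illustrated with a toy class that has nothing to do with Axiom's actual
+ implementation. <classname>LazyNode</classname> is a hypothetical stand-in for a node with
+ deferred building: its <methodname>serialize</methodname> method takes a different code path
+ depending on whether the node has been built, and <methodname>toString</methodname> builds
+ the node as a side effect.
+ </para>

```java
// Toy stand-in for a node with deferred building; NOT an Axiom class.
class LazyNode {
    private final String source; // unparsed input
    private String built;        // stays null until the node is built

    LazyNode(String source) {
        this.source = source;
    }

    boolean isBuilt() {
        return built != null;
    }

    // Takes a different code path depending on whether the node is built.
    String serialize() {
        return isBuilt() ? "from-tree:" + built : "from-stream:" + source;
    }

    @Override
    public String toString() {
        // Side effect: rendering the node forces it to be built.
        if (built == null) {
            built = source;
        }
        return built;
    }
}
```

+ <para>
+ Calling <methodname>serialize</methodname> on a fresh instance exercises the unbuilt code
+ path. Printing the node first, or merely inspecting it in a debugger, switches all later
+ calls to the other path, so a bug that only exists in the first path is no longer observed.
+ </para>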
+ </section>
+ <section>
+ <title>The OM-inside-OMDataSource anti-pattern</title>
+ <section>
+ <title>Weak version</title>
+ <para>
+ <classname>OMDataSource</classname> objects are used in
conjunction with
+ <classname>OMSourcedElement</classname> to build Axiom
object model instances
+ that contain information items that are represented using
a framework or API
+ other than Axiom. Wrapping this <quote>foreign</quote>
data in an
+ <classname>OMDataSource</classname> and adding it to the
Axiom object model
+ using an <classname>OMSourcedElement</classname> in most
cases avoids the
+ conversion of the data to the <quote>native</quote> Axiom
object
+ model<footnote><para>An exception is when code tries to
access the children
+ of the <classname>OMSourcedElement</classname>. In this
case, the
+ <classname>OMSourcedElement</classname> will be
<firstterm>expanded</firstterm>,
+ i.e. the data will be converted to the native Axiom object
model.</para></footnote>.
+ The <classname>OMDataSource</classname> contract requires
the implementation
+ to support two different ways of providing the data, both
relying on StAX:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ The implementation must be able to provide a pull
parser
+ (<classname>XMLStreamReader</classname>) from
which the infoset can be
+ read.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The data source must be able to serialize the
infoset to an
+ <classname>XMLStreamWriter</classname> (push).
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ For the consumer of an event-based representation of an XML infoset, it is in
+ general easier to work in pull mode. That is the reason why StAX has gained
+ popularity over push-based approaches such as SAX. On the other hand, for a producer
+ such as an <classname>OMDataSource</classname>
implementation, it's exactly the
+ other way round: it is far easier to serialize an infoset
to an
+ <classname>XMLStreamWriter</classname> (push) than to
build an
+ <classname>XMLStreamReader</classname> from which a
consumer can read (pull) events.
+ </para>
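+ <para>
+ To make the asymmetry concrete, here is a minimal sketch of the push side using only the
+ StAX API shipped with the JDK. <classname>PersonBean</classname> and its fields are
+ hypothetical and merely stand in for whatever foreign data a data source wraps; the
+ <methodname>toXml</methodname> helper exists only to try the sketch out.
+ </para>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical bean; stands in for any "foreign" data held by a data source.
class PersonBean {
    final String name;
    final int age;

    PersonBean(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Push mode: the producer drives the writer with a straight-line sequence of calls.
    void serialize(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("person");
        writer.writeStartElement("name");
        writer.writeCharacters(name);
        writer.writeEndElement();
        writer.writeStartElement("age");
        writer.writeCharacters(Integer.toString(age));
        writer.writeEndElement();
        writer.writeEndElement();
    }

    // Convenience helper: serializes the bean to a string.
    String toXml() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        serialize(writer);
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

+ <para>
+ An equivalent <classname>XMLStreamReader</classname> would have to remember, between
+ successive calls to <methodname>next</methodname>, which of these events was delivered last,
+ and answer every accessor method consistently with that position. That state machine is
+ what makes the pull side so much harder to write for a producer.
+ </para>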
+ <para>
+ Experience indeed shows that the most challenging part in
creating an
+ <classname>OMDataSource</classname> implementation is to
write the
+ <methodname>getReader</methodname> method. To avoid that
difficulty some
+ implementations simply build an Axiom tree and return the
+ <classname>XMLStreamReader</classname> provided by
+ <methodname>OMElement#getXMLStreamReader()</methodname>.
For example, some ADB
+ (Axis2 Data Binding) versions use the following
code<footnote><para>For the complete
+ code, see <ulink
url="http://svn.apache.org/repos/asf/webservices/axis2/tags/java/v1.5/modules/adb/src/org/apache/axis2/databinding/ADBDataSource.java"/>.</para></footnote>:
+ </para>
+ <example id="adb-getReader">
+ <title><methodname>OMDataSource#getReader()</methodname>
implementation used by ADB</title>
+<programlisting>public XMLStreamReader getReader() throws XMLStreamException {
+ MTOMAwareOMBuilder mtomAwareOMBuilder = new MTOMAwareOMBuilder();
+ serialize(mtomAwareOMBuilder);
+ return mtomAwareOMBuilder.getOMElement().getXMLStreamReader();
+}</programlisting>
+ </example>
+ <para>
+ The <classname>MTOMAwareOMBuilder</classname> class
referenced by this code is a special
+ implementation of <classname>XMLStreamWriter</classname>
that builds an Axiom tree from the
+ sequence of events sent to it. The code then uses this Axiom tree to get the
+ <classname>XMLStreamReader</classname> implementation.
While this is a functionally correct
+ implementation of the <methodname>getReader</methodname>
method, it is not a good
+ solution from a performance perspective and also
contradicts some of the ideas on
+ which Axiom is based, namely that the object model should
only be built when necessary.
+ </para>
+ <para>
+ Indeed, it should not be necessary to build an
intermediary tree when requesting a pull
+ parser from the <classname>OMDataSource</classname>
because all the required information
+ is already present in the ADB beans. Worse, if the
<classname>OMSourcedElement</classname>
+ is expanded, the object model instance will be built
twice: once by the
+ <methodname>getReader</methodname> method and once by Axiom itself!
+ </para>
+ <para>
+ While constructing an Axiom tree inside the
<methodname>getReader</methodname> method is clearly
+ an anti-pattern, at least in the case of ADB it is not as
bad as it seems at first glance.
+ The reason is that in the case which is the most relevant
for performance
+ (which is sending a Web Service response prepared using
ADB), Axiom will only invoke
+ the <methodname>serialize</methodname> method and not make
use of
+ <methodname>getReader</methodname>.
+ </para>
+ <note>
+ <para>
+ At the time of writing there is no general solution
available to avoid the
+ weak version of the OM-inside-OMDataSource
anti-pattern in cases where it would be far
+ too difficult to build a proper
<classname>XMLStreamReader</classname>
+ implementation. Future versions of Axiom may implement
a solution that
+ avoids the complexity of implementing
<classname>XMLStreamReader</classname>
+ without too many performance trade-offs.
+ </para>
+ </note>
+ </section>
+ <section>
+ <title>Strong version</title>
+ <para>
+ There is also a stronger version of the anti-pattern which
consists in
+ implementing the <methodname>serialize</methodname> method
by building an Axiom tree
+ and then serializing the tree to the
<classname>XMLStreamWriter</classname>.
+ Except for very special cases, there is <emphasis
role="strong">no valid reason
+ whatsoever</emphasis> to do this! To see why this is so,
consider the two
+ possible cases:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ The <classname>OMDataSource</classname> already
implements the
+ <methodname>getReader</methodname> method in a
proper way, i.e. without
+ building an intermediary Axiom tree. To properly
implement
+ <methodname>serialize</methodname>, it is then
sufficient
+ to pull the events from the reader returned by a
call to
+ <methodname>getReader</methodname> and copy them to the
+ <classname>XMLStreamWriter</classname>. The easiest and most efficient
+ way to do this is using
<classname>StreamingOMSerializer</classname>:
+ </para>
+ <example id="OMDataSource-serialize">
+ <title>Proper implementation of the
<methodname>OMDataSource#serialize</methodname> method</title>
+<programlisting>public void serialize(XMLStreamWriter xmlWriter) throws XMLStreamException {
+ StreamingOMSerializer serializer = new StreamingOMSerializer();
+ serializer.serialize(getReader(), xmlWriter);
+}</programlisting>
+ </example>
+ <para>
+ There is thus no need to build an intermediary
object model in this case.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <methodname>getReader</methodname> method also
uses an intermediary
+ Axiom tree<footnote><para>See e.g.
+ <ulink
url="http://svn.apache.org/repos/asf/webservices/axis2/tags/java/v1.5/modules/kernel/src/org/apache/axis2/builder/unknowncontent/UnknownContentOMDataSource.java"/>.</para></footnote>.
+ In that case it doesn't make sense to use an
<classname>OMSourcedElement</classname>
+ in the first place! At least it doesn't make sense
if one assumes that
+ in general the
<classname>OMSourcedElement</classname> will either be
+ serialized or its content accessed after being
added to the tree. Indeed,
+ in this case the Axiom tree will be built at least
once (if not multiple times),
+ so that the code might as well use a normal
<classname>OMElement</classname>.
+ </para>
+ <para>
+ This only leaves the very special case where the
<classname>OMSourcedElement</classname>
+ is in general neither accessed nor serialized,
either because it will usually be somehow
+ discarded or because the code uses
<methodname>OMDataSourceExt#getObject()</methodname>
+ to retrieve the raw data. Even in that case one
can argue that in general
+ it should not be too hard to implement at least
the <methodname>serialize</methodname>
+ method properly by transforming the raw or foreign
data directly to StAX events written to the
+ <classname>XMLStreamWriter</classname>.
+ </para>
+ <note>
+ <para>
+ Implementing the
<methodname>serialize</methodname> method to serialize
+ directly to an
<classname>XMLStreamWriter</classname>
+ instead of using an intermediary Axiom tree of
course still leaves the question about
+ the <methodname>getReader</methodname> method
open.
+ Since we are assuming that implementing
<methodname>getReader</methodname>
+ properly would be too complex (otherwise one
could use the code shown in
+ <xref linkend="OMDataSource-serialize"/> to
avoid the
+ OM-inside-OMDataSource anti-pattern entirely),
one is forced to
+ use the code shown in <xref
linkend="adb-getReader"/> (and thus the weaker version of
+ the anti-pattern). However, this code depends on the <classname>MTOMAwareOMBuilder</classname>
+ class which is part of
<literal>axis2-adb</literal>. In some cases, depending on
+ that library may not be an option. Therefore
this class should probably
+ be moved to Axiom.
+ </para>
+ </note>
+ </listitem>
+ </orderedlist>
+ <para>
+ QED
+ </para>
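+ <para>
+ To illustrate what copying events from a reader to a writer means in the first case above,
+ the following sketch does it with plain StAX from the JDK. It is not how
+ <classname>StreamingOMSerializer</classname> is implemented;
+ <classname>StaxEventCopier</classname> is a hypothetical helper, and it deliberately ignores
+ namespaces, comments and processing instructions to stay short.
+ </para>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Axiom.
class StaxEventCopier {
    // Pulls events from the reader and pushes each one to the writer.
    // Simplification: namespaces, comments and processing instructions are ignored.
    static void copy(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    writer.writeStartElement(reader.getLocalName());
                    int count = reader.getAttributeCount();
                    for (int i = 0; i != count; i++) {
                        writer.writeAttribute(reader.getAttributeLocalName(i),
                                reader.getAttributeValue(i));
                    }
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    writer.writeCharacters(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    break;
                default:
                    // Other event types are skipped in this sketch.
                    break;
            }
        }
    }

    // Convenience helper: parses a string and copies its events into a new string.
    static String copy(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        copy(reader, writer);
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }
}
```

+ <para>
+ No tree is built at any point: each event is forwarded as soon as it has been read, so
+ memory use does not grow with the size of the document.
+ </para>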
+ </section>
+ </section>
+ </chapter>
+
<chapter id="appendix">
<title>Appendix</title>
<section>