XML is so widely used that its memory binding (programming model) is hard
to ignore. Current memory bindings such as JavaBean, Service Data Objects
and Eclipse Modeling Framework have room to improve efficiency by streaming
data.


*Many people know DOM is much less efficient than SAX*

A Document Object Model (org.w3c.dom) tree is fully populated before it
becomes available, which costs both time and space (memory). From that
perspective it is much less efficient than the Simple API for XML
(org.xml.sax).

*Current memory bindings are as inefficient as DOM even if SAXed*

Current memory bindings such as JavaBean, SDO and EMF are also fully
populated before they become available, even if SAX or even StAX is used to
populate data from XML into the memory data structure. From that perspective
they are as inefficient as DOM, because they pay the same cost in both time
and space (memory).

*While SAX/DOM PUSHes, StAX PULLs, which offers an opportunity to load on demand*

The Streaming API for XML (javax.xml.stream) works in the opposite direction
from SAX/DOM in terms of who drives the processing. A SAX/DOM parser drives
the processing and *PUSH*es data from XML to handlers or directly into the
memory data structure, whereas StAX processing is driven by demand, and
demand *PULL*s data out of XML. This offers an opportunity to load on demand
if the memory bindings themselves drive the StAX processing.
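
To make the PULL direction concrete, here is a minimal sketch using the standard javax.xml.stream API. The class name PullOnDemand, the helper pull() and the sample instance (including the value 2.0) are assumptions made for illustration only. The caller asks for one value and the reader advances only as far as that value:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pulls only as far into the stream as the caller demands (illustrative sketch). */
final class PullOnDemand {
    /**
     * Returns the text of the index-th element named localName,
     * or null if the stream ends first. Reading stops at that element.
     */
    static String pull(String xml, String localName, int index) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                int seen = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())
                            && seen++ == index) {
                        return reader.getElementText(); // demand satisfied, stop pulling
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical instance; the elements after the demanded one are never pulled.
        String xml = "<Product><Property1>1</Property1>"
                   + "<Property2>2.0</Property2><Property2>2.1</Property2>"
                   + "<Property100>2006-06-24</Property100></Product>";
        System.out.println(pull(xml, "Property2", 1));
    }
}
```

With SAX the parser would have called back for every element of the document; here the loop is owned by the caller, which is what lets a memory binding decide when to stop.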

*Loading on demand improves efficiency, a lot*

  - *ZERO* cost scenario
     - Execution Path

           execute  (Order order,Product fromUpStream)
             if( order.paid() )
               toDownStream( fromUpStream);
               /*  *"fromUpStream" does NOT need to be read and parsed at all,*
                   *the data can be DIRECTLY PIPED to down stream,*
                   *NEITHER time NOR space(memory) cost at all*  */

     - Update only

           <complexType name="Product">
             <sequence>
               <element name="Property1" type="int"/>
               <element name="Property2" type="float" maxOccurs="unbounded"/>
               <element name="Property100" type="date"/>
             </sequence>
           </complexType>

     Given that definition and this instance:


     and this code:

           execute  (Product fromUpStream)
             fromUpStream.setProperty100( "2006-06-25");
             toDownStream( fromUpStream);

     *"fromUpStream" does NOT need to be read and parsed at all; the
     data can be PIPED downstream with
     "<Property100>2006-06-25</Property100>" inserted, at NEITHER time NOR
     space (memory) cost.*

     A more interesting scenario is, given the same instance above and
     this code:

           execute  (Product fromUpStream)
             fromUpStream.setProperty1( "3");
             toDownStream( fromUpStream);

     *"fromUpStream" does NOT need to be read and parsed at all; the
     data can be PIPED downstream with "1" ignored and replaced by "3",
     at NEITHER time NOR space (memory) cost.*

     - (Collection) Append only

     Given the above definition and this instance:


     and this code:

           execute  (Product fromUpStream)
             fromUpStream.getProperty2().add( 2.2000001);
             toDownStream( fromUpStream);

     *"fromUpStream" does NOT need to be read and parsed at all; the
     data can be PIPED downstream with "<Property2>2.2000001</Property2>"
     inserted, at NEITHER time NOR space (memory) cost.*
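
The "update only" case can be approximated with the StAX event API. This is only a sketch under assumptions: the class PipeWithUpdate and its method replaceText() are made up for illustration, the target element is assumed to have simple (text only) content, and the copy happens at the XML event level rather than as the raw piping described above. What it does show is that no value is ever converted to its binary form; the old literal is simply dropped and the new one written in its place:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Copies a document downstream while replacing one literal (illustrative sketch). */
final class PipeWithUpdate {
    /** Replaces the text of the first element named localName; everything else is copied. */
    static String replaceText(String xml, String localName, String newLiteral) {
        try {
            XMLEventReader in =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();
            boolean inside = false; // currently inside the target element
            boolean done = false;   // the replacement has already been written
            while (in.hasNext()) {
                XMLEvent event = in.nextEvent();
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue; // keep the output free of an XML declaration
                }
                if (inside) {
                    if (event.isEndElement()) {
                        writer.add(event);
                        inside = false;
                        done = true;
                    }
                    // the old literal is dropped without ever being parsed
                } else if (!done && event.isStartElement()
                        && localName.equals(event.asStartElement().getName().getLocalPart())) {
                    writer.add(event);
                    writer.add(events.createCharacters(newLiteral));
                    inside = true;
                } else {
                    writer.add(event); // untouched data goes straight downstream
                }
            }
            writer.flush();
            writer.close();
            in.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical instance, mirroring setProperty1("3") above.
        String xml = "<Product><Property1>1</Property1><Property2>2.1</Property2></Product>";
        System.out.println(replaceText(xml, "Property1", "3"));
    }
}
```

The append case would follow the same pattern, writing one extra element when the end of the Property2 run is reached instead of replacing text.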

  - Lower cost scenario

  XML is string based (human readable), while a memory binding is binary.
  Binding therefore has TWO stages:
     1. READ the literal string out of the XML
     2. PARSE the literal string into its binary form
  Parsing always costs some time, and sometimes space (memory) as well,
  depending on the complexity of the type and the algorithm.

  Given the above definition and this instance:


  and this code:

      execute  (Product fromUpStream)
        fromUpStream.getProperty2().get( 1);
        toDownStream( fromUpStream);

  Since Property2[1] is demanded, the XML instance is read through
  "<Property2>2.1</Property2>", and the literal string ("2.1") is parsed
  into memory before the binary (float) is returned. The literal string
  ("2.1") itself can also be weakly cached to speed up XML exporting as long
  as Property2[1] is not changed again.

   - *The rest of "fromUpStream" does NOT need to be read and parsed at
     all; it can be PIPED downstream, so both time and space (memory) are
     spared, sometimes a lot.*

   - Since XML processing is streaming rather than random access, the data
     ahead of Property2[1] is read and its literal strings can be stored.
     However, *parsing is NOT required right away; parsing time, and space
     (memory) if any, are spared if the value is NEVER demanded.* Later on,
     whenever Property1 or Property2[0] is demanded, the stored literal
     string can be parsed into memory before the binary is returned. The
     literal string storage can then become a weak cache to speed up XML
     exporting as long as the property is not changed again. Any further
     change to the property invalidates the weak cache and releases the
     space (memory).

  *The cached literal strings can save some XML exporting time without
  sacrificing space (memory), since the references are weak (Java). The
  stored literal strings (of properties whose values are never demanded) can
  also save some XML exporting time.* As for the space (memory) gain or loss,
  it varies case by case, since some binaries are smaller than their literal
  representation while others are larger.

  Property accesses include "isSet" and "unset" besides "get" and "set".
  While "get" demands both reading and parsing, "isSet" only needs reading
  and can defer parsing, which may never be demanded.
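
The two stages and the weak cache can be sketched for a single float property as follows. LazyFloat and all of its members are hypothetical names, not part of any existing binding, and parseCount exists only to make the deferred parsing visible:

```java
import java.lang.ref.WeakReference;

/** A float property that keeps its literal and parses only on demand (illustrative sketch). */
final class LazyFloat {
    private String literal;               // held strongly while the value is unparsed
    private WeakReference<String> cached; // held weakly once the binary exists
    private Float value;                  // parsed binary, null until demanded
    int parseCount;                       // for illustration only

    /** Stage 1 (READ): the reader hands over the literal, nothing is parsed. */
    void setLiteral(String text) {
        literal = text;
        cached = null;
        value = null;
    }

    /** Needs reading only, never parsing. */
    boolean isSet() {
        return literal != null || value != null;
    }

    /** Stage 2 (PARSE): runs at most once per literal, on first demand. */
    float get() {
        if (value == null) {
            if (literal == null) throw new IllegalStateException("not set");
            value = Float.parseFloat(literal);
            parseCount++;
            cached = new WeakReference<>(literal); // the literal may now be collected
            literal = null;
        }
        return value;
    }

    /** A change invalidates the weak cache. */
    void set(float newValue) {
        value = newValue;
        literal = null;
        cached = null;
    }

    /** Export reuses the literal when one is still available, else formats the binary. */
    String export() {
        if (literal != null) return literal;      // never parsed, reuse as is
        String text = cached == null ? null : cached.get();
        if (text != null) return text;            // weak cache still alive
        if (value == null) throw new IllegalStateException("not set");
        return Float.toString(value);
    }
}
```

A property that is read and exported but never demanded goes through setLiteral() and export() only, so its parse count stays at zero.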

*Streaming Object*

Loading on demand is driven by the memory binding. However, streaming reading
may reach other data before the demanded one, so the stream reader
(StreamReader) needs to notify the binding of the literal strings it has
reached which are not demanded yet.
Here's a protocol which can be used for that communication:

 interface StreamObject<Type,Property,C> {
   Object  get (int propertyID);  //  StreamList
   Type  getType();
   List<Property>  getInstanceProperties();
   C getContainer();  //  StreamObject<Type,Property,?>

   void  set (StreamReader<Type,Property> reader);
   StreamObject<Type,Property,?> createUnlessRead  (int propertyID,QName typeXSI,Type type);
   void  setUnlessRead (int propertyID,String stringPropertyValue);
   void  setLiteralValue  (int propertyID,QName typeXSI,String value);
   Object  parseLiteralValue  (int propertyID,QName typeXSI,String value,Type type);
 }

*Streaming List*

Loading on demand is driven by the memory binding. However, the StreamReader
may reach value(s) of a maxOccurs>1 property before the demanded one, so the
StreamReader needs to notify the list of the literal strings it has reached
which are not demanded yet.
Here's a protocol which can be used for that communication:

 interface StreamList<Type> {
   void  addStreamValue  (Object value);
   void  addLiteralValue (QName typeXSI,String value);
   Object  parseLiteralValue  (QName typeXSI,String value,Type type);
 }
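
As a rough illustration of that notification, the sketch below lets a list drive a standard XMLStreamReader itself. StreamedFloats is a hypothetical, much simplified stand-in for a StreamList implementation (no QName or Type arguments), and the sample values are made up. Values reached on the way to the demanded index are handed over as literals and stay unparsed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** A maxOccurs>1 float property loaded from a stream on demand (illustrative sketch). */
final class StreamedFloats {
    private final XMLStreamReader reader;
    private final String localName;
    private final List<String> literals = new ArrayList<>(); // as reached in the stream
    private final List<Float> values = new ArrayList<>();    // null until parsed
    int parseCount;                                          // for illustration only

    StreamedFloats(String xml, String localName) {
        this.reader = open(xml);
        this.localName = localName;
    }

    private static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Notification: the reader reached a value nobody has demanded yet. */
    void addLiteralValue(String literal) {
        literals.add(literal);
        values.add(null);
    }

    /** Number of values reached so far. */
    int reached() {
        return literals.size();
    }

    /** Pulls the stream up to the demanded index, then parses that one value only. */
    float get(int index) {
        try {
            while (literals.size() <= index && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    addLiteralValue(reader.getElementText());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        Float value = values.get(index); // IndexOutOfBoundsException if the stream ended first
        if (value == null) {
            value = Float.parseFloat(literals.get(index));
            values.set(index, value);
            parseCount++;
        }
        return value;
    }
}
```

After get(1) on a stream holding three values, two literals have been reached, one has been parsed, and the third has not been read at all.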

*Modeling Frameworks*

  - Modeling-neutral StreamReader

  There are many Modeling Frameworks. In order for StreamReader to
  support as many of them as possible, here's a Modeling Framework adapter:

      interface ModelingFramework<Type,Property> {
        Type  type  (Property property);
        boolean many  (Property property);
        Collection  getAliasNames (Property property);
        Class getInstanceClass  (Type type);
        List<Property>  properties  (Type type);

        Property  element (String space,String name);
        Object  getNameSpace  (Property property);
        Object  getLocalName  (Property property);
        enum  PropertyKind { ELEMENT, ATTRIBUTE, OTHER }
        PropertyKind  kind  (Property property);

        int property  (Type type,List<Property> properties,String space,String name,boolean element);

        StreamObject<Type,Property,?> create  (String space,String name);
        StreamObject<Type,Property,?> create  (Type type);
      }

  - JavaBean

      class JavaBeans implements  ModelingFramework<Class,PropertyDescriptor> {
        public  final /*many*/ Class type  (PropertyDescriptor property) {
          return property.getPropertyType();
        }
        public  boolean many  (PropertyDescriptor property) {
          return List.class.isAssignableFrom( type( property));
        }
        public  Collection  getAliasNames (PropertyDescriptor property) { //TODO cache
          return Collections.singleton( property.getName());
        }
        public  Class getInstanceClass  (Class type) {
          return type;
        }
        public  List<PropertyDescriptor>  properties  (Class type) { //TODO cache
          try {
            return Arrays.asList( Introspector.getBeanInfo( type).getPropertyDescriptors());
          }
          catch(IntrospectionException e) {
          }
          return Collections.EMPTY_LIST;
        }
        public  StreamObject<Class,PropertyDescriptor,?> create  (Class type) {
          try {
            ...
          }
          catch(Exception e) {
          }
          return null;
        }
      }
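
For readers unfamiliar with java.beans, here is a small self-contained sketch of the introspection the JavaBean adapter relies on. BeanProperties, its describe() helper and the nested Product bean are hypothetical and exist only for this illustration; the Introspector and PropertyDescriptor calls are the standard ones:

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

/** Lists bean properties and whether each one is "many" (illustrative sketch). */
final class BeanProperties {
    /** Hypothetical bean used only for this illustration. */
    public static class Product {
        private int property1;
        private List<Float> property2 = new ArrayList<>();
        public int getProperty1() { return property1; }
        public void setProperty1(int value) { property1 = value; }
        public List<Float> getProperty2() { return property2; }
        public void setProperty2(List<Float> value) { property2 = value; }
    }

    /** Returns one line per property, flagged the same way as many() above. */
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        try {
            // Stopping at Object.class leaves out the implicit "class" property.
            for (PropertyDescriptor property
                    : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                boolean many = List.class.isAssignableFrom(property.getPropertyType());
                lines.add(property.getName() + (many ? " (many)" : " (single)"));
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Product.class).forEach(System.out::println);
    }
}
```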

  - Service Data Objects

      class SDO implements  ModelingFramework<Type,Property> {
        public  Type  type  (Property property) {
          return property.getType();
        }
        public  boolean many  (Property property) {
          return property.isMany();
        }
        public  Collection  getAliasNames (Property property) {
          return property.getAliasNames();
        }
        public  Class getInstanceClass  (Type type) {
          return type.getInstanceClass();
        }
        public  List<Property>  properties  (Type type) {
          return type.getProperties();
        }
        public  Property  element (String space,String name) {
          return XSDHelper.INSTANCE.getGlobalProperty( space, name, true);
        }
        public  final Object  getNameSpace  (Property property) {
          return XSDHelper.INSTANCE.getNamespaceURI( property);
        }
        public  final Object  getLocalName  (Property property) {
          return XSDHelper.INSTANCE.getLocalName( property);
        }
        public  PropertyKind  kind  (Property property) {
          return XSDHelper.INSTANCE.isElement( property)
               ? PropertyKind.ELEMENT
               : XSDHelper.INSTANCE.isAttribute( property)
               ? PropertyKind.ATTRIBUTE
               : PropertyKind.OTHER;
        }
        public  StreamObject<Type,Property,?>  create  (String space,String name) {
          return (StreamObject<Type,Property,?>)DataFactory.INSTANCE.create( space, name);
        }
        public  StreamObject<Type,Property,?>  create  (Type type) {
          return (StreamObject<Type,Property,?>)DataFactory.INSTANCE.create( type);
        }
      }

  - Eclipse Modeling Framework

      class EMF implements  ModelingFramework<EClassifier,EStructuralFeature> {
        public  EClassifier type  (EStructuralFeature property) {
          return property.getEType();
        }
        public  boolean many  (EStructuralFeature property) {
          return property.isMany();
        }
        public  Collection  getAliasNames (EStructuralFeature property) { //TODO cache
          return Collections.singleton( property.getName());
        }
        public  Class getInstanceClass  (EClassifier type) {
          return type.getInstanceClass();
        }
        public  List<EStructuralFeature>  properties  (EClassifier type) {
          return ((EClass)type).getEAllStructuralFeatures();
        }
        public  EStructuralFeature  element (String space,String name) {
          return ExtendedMetaData.INSTANCE.getElement( space, name);
        }
        public  final Object  getNameSpace  (EStructuralFeature property) {
          return ExtendedMetaData.INSTANCE.getNamespace( property);
        }
        public  final Object  getLocalName  (EStructuralFeature property) {
          return ExtendedMetaData.INSTANCE.getName( property);
        }
        public  PropertyKind  kind  (EStructuralFeature property) {
          switch( ExtendedMetaData.INSTANCE.getFeatureKind( property) ) {
            case ExtendedMetaData.ELEMENT_FEATURE:
              return PropertyKind.ELEMENT;
            case ExtendedMetaData.ATTRIBUTE_FEATURE:
              return PropertyKind.ATTRIBUTE;
          }
          return PropertyKind.OTHER;
        }
        public  int property  (EClassifier type,List<EStructuralFeature> properties,String space,String name,boolean element) {
          final EStructuralFeature  property = element
               ? ExtendedMetaData.INSTANCE.getElement( (EClass)type, space, name)
               : ExtendedMetaData.INSTANCE.getAttribute( (EClass)type, space, name);
          return null == property
               ? -1
               : property.getFeatureID();
        }
        public  StreamObject<EClassifier,EStructuralFeature,?> create  (String space,String name) {
          ... space, name);
        }
        public  StreamObject<EClassifier,EStructuralFeature,?> create  (EClassifier type) {
          ...
        }
      }



  For existing code, if a change to support loading on demand is
  desired, injection may be utilized.
  - Code Generation (static object)

  Code can be regenerated, or new code can be generated, to support
  loading on demand.
  - Dynamic object

  Memory bindings such as Service Data Objects and Eclipse Modeling
  Framework enable dynamic objects besides the static ones (CodeGen). Their
  implementations can be extended to support loading on demand.
  - Concurrent access

  With loading on demand, synchronization may be necessary for concurrent
  accesses. There may also be multiple objects loading from one stream, so
  the synchronization may need to take the shared stream into account.
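
One possible shape for that synchronization, as a sketch only: every object that loads from the same stream goes through a single guarded entry point. SharedStream is a hypothetical class, and a plain Iterator of literal strings stands in for the real StreamReader:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Several lazily loaded values draw from one forward-only stream (illustrative sketch). */
final class SharedStream {
    private final Iterator<String> literals;                     // stands in for a StreamReader
    private final Map<Integer, String> reached = new HashMap<>(); // literals passed so far
    private int position;                                         // next index to be read

    SharedStream(List<String> source) {
        this.literals = source.iterator();
    }

    /**
     * Reads forward until the demanded index has been reached. One lock guards
     * the stream position and the stored literals, so concurrent demands can
     * neither skip nor double-read an item.
     */
    synchronized String demand(int index) {
        while (position <= index && literals.hasNext()) {
            reached.put(position++, literals.next()); // store what is passed on the way
        }
        return reached.get(index); // null if the stream ended before this index
    }

    public static void main(String[] args) {
        SharedStream stream = new SharedStream(Arrays.asList("1", "2.0", "2.1"));
        System.out.println(stream.demand(2));
        System.out.println(stream.demand(0));
    }
}
```

A single coarse lock is the simplest choice here; whether finer-grained locking pays off would depend on how many objects share one stream.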

You're more than welcome to comment.
And if you happen to find it interesting, I can also post/wiki the
Help will be very much appreciated, especially in areas such as code injection,
a JavaBean ModelingFramework implementation conforming to JAXB, and test cases
demonstrating the performance gain from loading on demand.



