Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xerces Wiki" for change 
notification.

The "ishanjayawardena/scd_proposal" page has been changed by ishanjayawardena.
http://wiki.apache.org/xerces/ishanjayawardena/scd_proposal?action=diff&rev1=3&rev2=4

--------------------------------------------------

  || '''Time zone''' || '''UTC+5:30 (Sri Lanka)''' ||
  
  == Abstract ==
- Apache Xerces2 is a high-performance, standards-compliant processor written in 
Java for parsing, validating, serializing and manipulating XML documents. At 
present, it implements a collection of standard APIs for XML processing and its 
development is in progress to achieve the latest W3C XML Schema 1.1 
specification support. The objective of this project is to implement a parser 
and an evaluator for schema component designators (SCD) that can be used to 
identify and retrieve XML schema component(s) from the XML schema data model 
used by Xerces. Schema components that are defined in two W3C recommendations; 
''XML Schema Part 1: structures''[[#1 | [1]]] and ''XML Schema part 2: Data 
types''[[#2|[2]]] act as the building blocks of an XML schema document.
+ Apache Xerces2 is a high-performance, standards-compliant processor written in 
Java for parsing, validating, serializing and manipulating XML documents. At 
present, it implements a collection of standard APIs for XML processing and its 
development is in progress to achieve the latest W3C XML Schema 1.1 
specification support. The objective of this project is to implement a parser 
and an evaluator for schema component designators (SCD) that can be used to 
identify and retrieve XML schema component(s) from the XML schema data model 
used by Xerces. Schema components that are defined in two W3C recommendations; 
''XML Schema Part 1: structures''[[#1 | [1] ]] and ''XML Schema part 2: Data 
types''[[#2|[2] ]] act as the building blocks of an XML schema document.
  == Description ==
- ''W3C XML Schema Definition Language (XSD): Component Designators'' is a 
specification that reached W3C candidate recommendation in January 
2010[[#3|[3]]] with W3C inviting the community to start implementing 
it[[#4|[4]]]. The main advantage SCD provides for the programmers is making it 
easier to navigate an XML schema object model more efficiently by reducing the 
amount of code that they have to write to retrieve a set of specific schema 
components. This is achieved by using a path expression similar to an XPath 
expression. The W3C SCD specification defines two basic types of 
SCDs[[#5|[5]]],<<BR>>
+ ''W3C XML Schema Definition Language (XSD): Component Designators'' is a 
specification that reached W3C candidate recommendation in January 2010[[#3|[3] 
]] with W3C inviting the community to start implementing it[[#4|[4] ]]. The 
main advantage SCD provides for the programmers is making it easier to navigate 
an XML schema object model more efficiently by reducing the amount of code that 
they have to write to retrieve a set of specific schema components. This is 
achieved by using a path expression similar to an XPath expression. The W3C SCD 
specification defines two basic types of SCDs[[#5|[5] ]],<<BR>>
-  1. Absolute SCDs (ASCD): An ASCD identifies a particular schema component; 
it consists of two parts: a designator for the assembled schema (a schema 
designator), and a designator for a particular schema component or schema 
components relative to that assembled schema (a relative schema component 
designator). Syntactically, an ASCD consists of a URI without a fragment 
identifier part which identifies the schema and an XPointer fragment identifier 
which encapsulates a schema component path (SCP)[[#6|[6]]] to designate a set 
of components in the context of that schema.<<BR>>
+  1. Absolute SCDs (ASCD): An ASCD identifies a particular schema component; 
it consists of two parts: a designator for the assembled schema (a schema 
designator), and a designator for a particular schema component or schema 
components relative to that assembled schema (a relative schema component 
designator). Syntactically, an ASCD consists of a URI without a fragment 
identifier part which identifies the schema and an XPointer fragment identifier 
which encapsulates a schema component path (SCP)[[#6|[6] ]] to designate a set 
of components in the context of that schema.<<BR>>
  
   2. Relative SCDs (RSCD): An RSCD identifies a particular schema component 
relative to some current assembled schema; it is expressed as an XPointer 
scheme xscd() that uses a schema component path as the scheme data. This 
XPointer scheme may be used in combination with the XPointer xmlns() scheme.
  For instance, consider the following ASCD,<<BR>>
  {{{http://example.org/schemas/po.xsd#xscd(/type::purchaseOrderType)}}}<<BR>>
  Here, the URI {{{http://example.org/schemas/po.xsd}}} refers to an 
assembled schema and the XPointer fragment (which is an RSCD) 
{{{xscd(/type::purchaseOrderType)}}} refers to a particular schema component by 
using the SCP {{{/type::purchaseOrderType}}}. The following is an ASCD with a 
namespace binding,<<BR>>
  
{{{http://example.org/schemas/po.xsd#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}}<<BR>>
- Here, {{{xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}} 
represents an RSCD. The W3C SCD specification contains a more comprehensive 
set of examples[[#7|[7]]][[#8|[8]]][[#9|[9]]] that illustrate a number of 
possible usages and types of SCDs/SCPs.<<BR>>
+ Here, {{{xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}} 
represents an RSCD. The W3C SCD specification contains a more comprehensive 
set of examples[[#7|[7] ]][[#8|[8] ]][[#9|[9] ]] that illustrate a number of 
possible usages and types of SCDs/SCPs.<<BR>>
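
The decomposition of an ASCD described above can be sketched in Java. This is only an illustration under stated assumptions, not Xerces code: the class {{{AscdSplitter}}} and its fields are hypothetical names, and the regular expression assumes that the scheme data contains no nested or escaped parentheses, which general XPointer scheme data may contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Xerces): splits an absolute SCD into its parts. */
final class AscdSplitter {
    // One XPointer pointer part: a scheme name followed by parenthesised data.
    // Simplification: the data must not contain parentheses of its own.
    private static final Pattern POINTER_PART = Pattern.compile("(\\w+)\\(([^()]*)\\)");

    final String schemaUri;                                      // schema designator
    final Map<String, String> bindings = new LinkedHashMap<>();  // prefix -> namespace name
    final String path;                                           // schema component path (SCP)

    AscdSplitter(String ascd) {
        int hash = ascd.indexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("no fragment identifier: " + ascd);
        }
        schemaUri = ascd.substring(0, hash);

        String found = null;
        Matcher m = POINTER_PART.matcher(ascd.substring(hash + 1));
        while (m.find()) {
            String scheme = m.group(1);
            String data = m.group(2);
            if (scheme.equals("xmlns")) {
                // xmlns(prefix=namespace) binds a prefix for the xscd() part that follows.
                int eq = data.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("bad xmlns() data: " + data);
                }
                bindings.put(data.substring(0, eq).trim(), data.substring(eq + 1).trim());
            } else if (scheme.equals("xscd")) {
                found = data;
            }
        }
        if (found == null) {
            throw new IllegalArgumentException("no xscd() pointer part: " + ascd);
        }
        path = found;
    }

    public static void main(String[] args) {
        AscdSplitter s = new AscdSplitter(
            "http://example.org/schemas/po.xsd"
            + "#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)");
        System.out.println(s.schemaUri);
        System.out.println(s.bindings);
        System.out.println(s.path);
    }
}
```

Everything after the {{{#}}} is the RSCD, so the same loop over pointer parts would apply to an RSCD given on its own.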
  
- Please note that the term ''assembled schema'' (or ''schema'' or ''the schema 
description schema component'') refers to a resulting logical namespace that is 
generated by the combination of one or more such schemas and these schemas may 
be physically represented as schema documents. In Xerces, the schema 
description schema component(i.e. the XML schema object model) is represented 
by the XSModel[[#10|[10]]] interface and the schema components are represented 
by the org.apache.xerces.xs interfaces.<<BR>>
+ Please note that the term ''assembled schema'' (or ''schema'' or ''the schema 
description schema component'') refers to a resulting logical namespace that is 
generated by the combination of one or more such schemas and these schemas may 
be physically represented as schema documents. In Xerces, the schema 
description schema component(i.e. the XML schema object model) is represented 
by the XSModel[[#10|[10] ]] interface and the schema components are represented 
by the org.apache.xerces.xs interfaces.<<BR>>
  
  In this project, I am focusing only on implementing the RSCD support for 
Xerces because according to the feedback I received from the Xerces community, 
it will often be more difficult and less useful to work with ASCDs since there 
is no defined way to resolve a URI to a schema. For example, consider the first 
ASCD that I have mentioned. It is obvious that if we try to evaluate that ASCD 
against the schema in that URI, first we have to resolve that URI and build a 
schema from it, which is not possible with Xerces. Even the W3C specification 
does not specify any convention for building a schema by referring to such a URI. 
Therefore it will be much more appropriate to implement only the RSCD support 
for Xerces.<<BR>>
  
@@ -32, +32 @@

  
   1. SCP is the main component in any ASCD or RSCD (but we are only interested 
in RSCDs)<<BR>>
  
-  2. SCPs have many usages; according to the W3C specification, SCPs can be 
used in contexts other than SCDs as long as proper namespace bindings are 
provided [[#11|[11]]]. For instance, we could use an SCP inside an XML element 
by properly binding namespaces<<BR>>
+  2. SCPs have many usages; according to the W3C specification, SCPs can be 
used in contexts other than SCDs as long as proper namespace bindings are 
provided [[#11|[11] ]]. For instance, we could use an SCP inside an XML element 
by properly binding namespaces<<BR>>
  
-  3. Another useful type of SCPs is the incomplete SCPs[[#12|[12]]]. An 
incomplete SCP can be evaluated against a given schema component to retrieve a 
set of schema components within it(i.e. similar to the way an RSCD is evaluated 
relative to a given schema, an incomplete SCP can be evaluated relative to a 
given schema component)<<BR>>
+  3. Another useful type of SCPs is the incomplete SCPs[[#12|[12] ]]. An 
incomplete SCP can be evaluated against a given schema component to retrieve a 
set of schema components within it(i.e. similar to the way an RSCD is evaluated 
relative to a given schema, an incomplete SCP can be evaluated relative to a 
given schema component)<<BR>>
  
- Therefore, it is highly desirable to come up with a more loosely coupled 
design in which SCP resolving capability is provided in a separate interface to 
serve potential requirements as well as to improve overall extendability and 
modularity. Following are the two primary operations that would reflect the 
RSCD implementation, and that would yield a number of SCD use 
cases[[#13|[13]]],<<BR>>
+ Therefore, it is highly desirable to come up with a more loosely coupled 
design in which SCP resolving capability is provided in a separate interface to 
serve potential requirements as well as to improve overall extendability and 
modularity. Following are the two primary operations that would reflect the 
RSCD implementation, and that would yield a number of SCD use cases[[#13|[13] 
]],<<BR>>
  
   1. to resolve a relative SCD. i.e. given a schema and an RSCD as the inputs, 
return a list of schema components.<<BR>>
  
-  2. to obtain the canonical SCP[[#14|[14]]] of a schema component (if 
available). i.e. given a schema component and the schema that contains the 
component along with the necessary namespace bindings as the inputs, return the 
canonical SCP<<BR>>
+  2. to obtain the canonical SCP[[#14|[14] ]] of a schema component (if 
available). i.e. given a schema component and the schema that contains the 
component along with the necessary namespace bindings as the inputs, return the 
canonical SCP<<BR>>
  
  Based on these two operations and the incomplete SCP resolving capability, we 
can suggest the following essential operations for the SCP interface.<<BR>>
  
@@ -56, +56 @@

  
  At the initial stage, the parser and the evaluator are implemented to support 
only the XML Schema 1.0 object model; because the design is loosely coupled, 
the system should be easy to extend to support the XML Schema 1.1 object model 
as well. I believe speed and efficiency are the two most critical factors and 
must be met to the highest possible degree, because the introduction of this 
new feature must not degrade the existing performance of Xerces under any 
circumstances. Initially, however, more attention is given to designing a solid 
API and a modular, extensible and loosely coupled design, as I mentioned 
earlier.<<BR>>
  
- The parser can be generated with an automatic code generation tool similar to 
JavaCC and, to write the evaluator, a good understanding of the XML Schema 
API[[#15|[15]]] and an understanding about how to navigate an XSModel is 
required. The SCD W3C specification defines the EBNF syntax for both 
SCD[[#16|[16]]] and SCP[[#17|[17]]] which can be used in the generation of the 
parser. However, it does not suggest any semantics for evaluating such 
expressions.<<BR>>
+ The parser can be generated with an automatic code generation tool similar to 
JavaCC and, to write the evaluator, a good understanding of the XML Schema 
API[[#15|[15] ]] and an understanding about how to navigate an XSModel is 
required. The SCD W3C specification defines the EBNF syntax for both 
SCD[[#16|[16] ]] and SCP[[#17|[17] ]] which can be used in the generation of 
the parser. However, it does not suggest any semantics for evaluating such 
expressions.<<BR>>
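
To make the parsing task concrete, here is a small hand-written sketch in Java that breaks an SCP into steps. It is not the grammar from the specification and not generated parser code: the names {{{ScpSteps}}} and {{{Step}}} are hypothetical, only the unabbreviated {{{axis::name}}} form of a step is handled, axis names are not validated, and incomplete SCPs (paths that do not start with {{{/}}}) are rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Xerces code): splits a simple SCP into its steps. */
final class ScpSteps {

    /** One step of a path: an axis name plus an optionally prefixed name test. */
    static final class Step {
        final String axis;
        final String prefix;     // null when the name test has no prefix
        final String localName;

        Step(String axis, String prefix, String localName) {
            this.axis = axis;
            this.prefix = prefix;
            this.localName = localName;
        }
    }

    static List<Step> parse(String scp) {
        if (!scp.startsWith("/")) {
            throw new IllegalArgumentException("only paths starting with '/' are handled: " + scp);
        }
        List<Step> steps = new ArrayList<>();
        for (String raw : scp.substring(1).split("/")) {
            // Each step must look like axis::name or axis::prefix:name.
            int sep = raw.indexOf("::");
            if (sep <= 0) {
                throw new IllegalArgumentException("expected axis::name, got: '" + raw + "'");
            }
            String axis = raw.substring(0, sep);
            String name = raw.substring(sep + 2);
            int colon = name.indexOf(':');
            String prefix = colon < 0 ? null : name.substring(0, colon);
            String local = colon < 0 ? name : name.substring(colon + 1);
            if (local.isEmpty()) {
                throw new IllegalArgumentException("missing name in step: '" + raw + "'");
            }
            steps.add(new Step(axis, prefix, local));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Step s : parse("/type::p:USAddress")) {
            System.out.println(s.axis + " " + s.prefix + " " + s.localName);
        }
    }
}
```

An evaluator would then walk the resulting list of steps against an XSModel, resolving each prefix through the namespace bindings supplied with the path.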
  == Deliverables ==
   1. Source code and necessary build files for the SCD parser and 
evaluator<<BR>>
   2. Required patches if any<<BR>>
   3. A collection of tests that can be used to verify the functionality of the 
SCD parser and evaluator<<BR>>
   4. SCD API Documentation<<BR>>
  == Things I have done so far ==
- I checked out and built the Xerces trunk, tried out some samples and tests, 
and started to study the code, especially the coding standards and styles that 
have been used and the package structure. Because I am not very familiar with 
Java tools such as annotations, packaging, unit tests and documentation 
generation, I also started to learn them. I looked at existing Xerces issues 
related to the XML Schema API and searched JIRA for issues related to SCD. I 
also found a couple of existing implementations of 
SCD[[#18|[18]]][[#19|[19]]], which will be considered in designing the API 
where necessary. I spent most of my time researching SCD, especially trying to 
understand the W3C SCD specification and setting measurable goals for my 
project, and studying various technologies related to SCD.<<BR>>
+ I checked out and built the Xerces trunk, tried out some samples and tests, 
and started to study the code, especially the coding standards and styles that 
have been used and the package structure. Because I am not very familiar with 
Java tools such as annotations, packaging, unit tests and documentation 
generation, I also started to learn them. I looked at existing Xerces issues 
related to the XML Schema API and searched JIRA for issues related to SCD. I 
also found a couple of existing implementations of SCD[[#18|[18] ]][[#19|[19] 
]], which will be considered in designing the API where necessary. I spent most 
of my time researching SCD, especially trying to understand the W3C SCD 
specification and setting measurable goals for my project, and studying various 
technologies related to SCD.<<BR>>
  
  == Development Schedule ==
  I am planning to learn most of the required programming skills while doing 
the development. But initially (i.e. during the community bonding period) I 
will learn advanced Java skills and the required knowledge of XML schema and 
XML schema API since they are essential to start designing the components and 
to begin coding. I will dedicate the complete four-month period starting from 
April and lasting until the end of August to this project, and I can work 
between forty and fifty hours per week.
