Dear Wiki user, You have subscribed to a wiki page or wiki category on "Xerces Wiki" for change notification.
The "ishanjayawardena/scd_proposal" page has been changed by ishanjayawardena.
http://wiki.apache.org/xerces/ishanjayawardena/scd_proposal?action=diff&rev1=3&rev2=4

--------------------------------------------------

  || '''Time zone''' || '''UTC+5:30 (Sri Lanka)''' ||
  == Abstract ==
- Apache Xerces2 is a high-performance, standard complaint processor written in Java for parsing, validating, serializing and manipulating XML documents. At present, it implements a collection of standard APIs for XML processing and its development is in progress to achieve the latest W3C XML Schema 1.1 specification support. The objective of this project is to implement a parser and an evaluator for schema component designators (SCD) that can be used to identify and retrieve XML schema component(s) from the XML schema data model used by Xerces. Schema components that are defined in two W3C recommendations; ''XML Schema Part 1: structures''[[#1 | [1]]] and ''XML Schema part 2: Data types''[[#2|[2]]] act as the building blocks of an XML schema document.
+ Apache Xerces2 is a high-performance, standard complaint processor written in Java for parsing, validating, serializing and manipulating XML documents. At present, it implements a collection of standard APIs for XML processing and its development is in progress to achieve the latest W3C XML Schema 1.1 specification support. The objective of this project is to implement a parser and an evaluator for schema component designators (SCD) that can be used to identify and retrieve XML schema component(s) from the XML schema data model used by Xerces. Schema components that are defined in two W3C recommendations; ''XML Schema Part 1: structures''[[#1 | [1] ]] and ''XML Schema part 2: Data types''[[#2|[2] ]] act as the building blocks of an XML schema document.
  == Description ==
- ''W3C XML Schema Definition Language (XSD): Component Designators'' is a specification that reached W3C candidate recommendation in January 2010[[#3|[3]]] with W3C inviting the community to start implementing it[[#4|[4]]]. The main advantage SCD provides for the programmers is making it easier to navigate an XML schema object model more efficiently by reducing the amount of code that they have to write to retrieve a set of specific schema components. This is achieved by using a path expression similar to an XPath expression. The W3C SCD specification defines two basic types of SCDs[[#5|[5]]],<<BR>>
+ ''W3C XML Schema Definition Language (XSD): Component Designators'' is a specification that reached W3C candidate recommendation in January 2010[[#3|[3] ]] with W3C inviting the community to start implementing it[[#4|[4] ]]. The main advantage SCD provides for the programmers is making it easier to navigate an XML schema object model more efficiently by reducing the amount of code that they have to write to retrieve a set of specific schema components. This is achieved by using a path expression similar to an XPath expression. The W3C SCD specification defines two basic types of SCDs[[#5|[5] ]],<<BR>>
- 1. Absolute SCDs (ASCD): An ASCD identifies a particular schema component; it consists of two parts: a designator for the assembled schema (a schema designator), and a designator for a particular schema component or schema components relative to that assembled schema (a relative schema component designator). Syntactically, an ASCD consists of a URI without a fragment identifier part which identifies the schema and an XPointer fragment identifier which encapsulates a schema component path (SCP)[[#6|[6]]] to designate a set of components in the context of that schema.<<BR>>
+ 1. Absolute SCDs (ASCD): An ASCD identifies a particular schema component; it consists of two parts: a designator for the assembled schema (a schema designator), and a designator for a particular schema component or schema components relative to that assembled schema (a relative schema component designator). Syntactically, an ASCD consists of a URI without a fragment identifier part which identifies the schema and an XPointer fragment identifier which encapsulates a schema component path (SCP)[[#6|[6] ]] to designate a set of components in the context of that schema.<<BR>>
  2. Relative SCDs (RSCD): An RSCD identifies a particular schema component relative to some current assembled schema; it is expressed as an XPointer scheme xscd() that uses a schema component path as the scheme data. This XPointer scheme may be used in combination with the XPointer xmlns() scheme. For instance, consider the following ASCD,<<BR>>
  {{{http://example.org/schemas/po.xsd#xscd(/type::purchaseOrderType)}}}<<BR>>
  In here, the URI {{{http://example.org/schemas/po.xsd}}} refers to an assembled schema and the XPointer fragment (which is an RSCD) {{{xscd(/type::purchaseOrderType)}}} refers to a particular schema component by using the SCP {{{/type::purchaseOrderType}}}. Following is an ASCD with a namespace binding,<<BR>>
  {{{http://example.org/schemas/po.xsd#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}}<<BR>>
- In here, {{{xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}} represents an RSCD. The W3C SCD specification consists of a more comprehensive set of examples[[#7|[7]]][[#8|[8]]][[#9|[9]]] that illustrates a number of possible usages and types of SCDs/SCPs.<<BR>>
+ In here, {{{xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}} represents an RSCD. The W3C SCD specification consists of a more comprehensive set of examples[[#7|[7] ]][[#8|[8] ]][[#9|[9] ]] that illustrates a number of possible usages and types of SCDs/SCPs.<<BR>>
- Please note that the term ''assembled schema'' (or ''schema'' or ''the schema description schema component'') refers to a resulting logical namespace that is generated by the combination of one or more such schemas and these schemas may be physically represented as schema documents. In Xerces, the schema description schema component(i.e. the XML schema object model) is represented by the XSModel[[#10|[10]]] interface and the schema components are represented by the org.apache.xerces.xs interfaces.<<BR>>
+ Please note that the term ''assembled schema'' (or ''schema'' or ''the schema description schema component'') refers to a resulting logical namespace that is generated by the combination of one or more such schemas and these schemas may be physically represented as schema documents. In Xerces, the schema description schema component(i.e. the XML schema object model) is represented by the XSModel[[#10|[10] ]] interface and the schema components are represented by the org.apache.xerces.xs interfaces.<<BR>>
  In this project, I am focusing only on implementing the RSCD support for Xerces because according to the feedback I received from the Xerces community, it will often be more difficult and less useful to work with ASCDs since there is no defined way to resolve a URI to a schema. For example, consider the first ASCD that I have mentioned. It is obvious that if we try to evaluate that ASCD against the schema in that URI, first we have to resolve that URI and build a schema from it, which is not possible with Xerces. Even the W3C specification does not specify any convention to build a schema by referring to such URI. Therefore it will be much more appropriate to implement only the RSCD support for Xerces.<<BR>>
@@ -32, +32 @@
  1. SCP is the main component in any ASCD or RSCD(but we are only interested in RSCDs)<<BR>>
- 2. SCPs have many usages; according to the W3C specification, SCPs can be used in contexts other than SCDs as long as proper namespace bindings are provided [[#11|[11]]]. For instance, we could use an SCP inside an XML element by properly binding namespaces<<BR>>
+ 2. SCPs have many usages; according to the W3C specification, SCPs can be used in contexts other than SCDs as long as proper namespace bindings are provided [[#11|[11] ]]. For instance, we could use an SCP inside an XML element by properly binding namespaces<<BR>>
- 3. Another useful type of SCPs is the incomplete SCPs[[#12|[12]]]. An incomplete SCP can be evaluated against a given schema component to retrieve a set of schema components within it(i.e. similar to the way an RSCD is evaluated relative to a given schema, an incomplete SCP can be evaluated relative to a given schema component)<<BR>>
+ 3. Another useful type of SCPs is the incomplete SCPs[[#12|[12] ]]. An incomplete SCP can be evaluated against a given schema component to retrieve a set of schema components within it(i.e. similar to the way an RSCD is evaluated relative to a given schema, an incomplete SCP can be evaluated relative to a given schema component)<<BR>>
- Therefore, it is highly desirable to come up with a more loosely coupled design in which SCP resolving capability is provided in a separate interface to serve potential requirements as well as to improve overall extendability and modularity. Following are the two primary operations that would reflect the RSCD implementation, and that would yield a number of SCD use cases[[#13|[13]]],<<BR>>
+ Therefore, it is highly desirable to come up with a more loosely coupled design in which SCP resolving capability is provided in a separate interface to serve potential requirements as well as to improve overall extendability and modularity. Following are the two primary operations that would reflect the RSCD implementation, and that would yield a number of SCD use cases[[#13|[13] ]],<<BR>>
  1. to resolve a relative SCD. i.e. given a schema and an RSCD as the inputs, return a list of schema components.<<BR>>
- 2. to obtain the canonical SCP[[#14|[14]]] of a schema component (if available). i.e. given a schema component and the schema that contains the component along with the necessary namespace bindings as the inputs, return the canonical SCP<<BR>>
+ 2. to obtain the canonical SCP[[#14|[14] ]] of a schema component (if available). i.e. given a schema component and the schema that contains the component along with the necessary namespace bindings as the inputs, return the canonical SCP<<BR>>
  Based on these two operations and the incomplete SCP resolving capability, we can suggest following essential operations for the SCP interface.<<BR>>
@@ -56, +56 @@
  At the initial stage, the parser and the evaluator is implemented to support only XML schema 1.0 object model and the system would be easy to extend due to the loosely coupled nature of its design, to support XML schema 1.1 object model as well. As I believe, speed and efficiency are the two most critical factors that must be met to a higher possible degree because the introduction of this new feature must not degrade the existing performance of Xerces under any circumstances. However, initially more attention is given to design a solid API and to come up with a more modular, extendable and a loosely coupled design as I mentioned earlier.<<BR>>
- The parser can be generated with an automatic code generation tool similar to JavaCC and, to write the evaluator, a good understanding of the XML Schema API[[#15|[15]]] and an understanding about how to navigate and XSModel is required. The SCD W3C specification defines the EBNF syntax for both SCD[[#16|[16]]] and SCP[[#17|[17]]] which can be used in the generation of the parser. However, it does not suggest any semantics for evaluating such expressions.<<BR>>
+ The parser can be generated with an automatic code generation tool similar to JavaCC and, to write the evaluator, a good understanding of the XML Schema API[[#15|[15] ]] and an understanding about how to navigate and XSModel is required. The SCD W3C specification defines the EBNF syntax for both SCD[[#16|[16] ]] and SCP[[#17|[17] ]] which can be used in the generation of the parser. However, it does not suggest any semantics for evaluating such expressions.<<BR>>
  == Deliverables ==
  1. Source code and necessary build files for the SCD parser and evaluator<<BR>>
  2. Required patches if any<<BR>>
  3. A collection of tests that can be used to verify the functionality of the SCD parser and evaluator<<BR>>
  4. SCD API Documentation<<BR>>
  == Things I have done so far ==
- I checked out and built the Xerces trunk and then I tried out some samples and tests and started to study the code, specially, the coding standards and styles that have been used and the package structure etc. Because I am not very much familiar with Java tools like annotations, packaging, unit tests and documentation generation, etc., I also started to learn them and I looked at existing issues of Xerces related to XML Schema API and searched if there are issues related to SCD in JIRA. I could also find couple of existing implementations of SCD[[#18|[18]]][[#19|[19]]] and they will also be considered in designing the API where necessary. I spent most of my time to research on SCD, specially to trying to understand the W3C SCD specification and setting up measurable goals for my project, and to study various related technologies to SCD.<<BR>>
+ I checked out and built the Xerces trunk and then I tried out some samples and tests and started to study the code, specially, the coding standards and styles that have been used and the package structure etc. Because I am not very much familiar with Java tools like annotations, packaging, unit tests and documentation generation, etc., I also started to learn them and I looked at existing issues of Xerces related to XML Schema API and searched if there are issues related to SCD in JIRA. I could also find couple of existing implementations of SCD[[#18|[18] ]][[#19|[19] ]] and they will also be considered in designing the API where necessary. I spent most of my time to research on SCD, specially to trying to understand the W3C SCD specification and setting up measurable goals for my project, and to study various related technologies to SCD.<<BR>>
  == Development Schedule ==
  I am planning to learn most of the required programming skills while doing the development. But initially (i.e. during the community bonding period) I will learn advanced Java skills and the required knowledge of XML schema and XML schema API since they are essential to start designing the components and to begin coding. I will dedicate the complete four-month period starting from April and lasting until the end of August for this project and I could work between forty to fifty hours per week.
