Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xerces Wiki" for change 
notification.

The "ishanjayawardena/scd_proposal" page has been changed by ishanjayawardena.
http://wiki.apache.org/xerces/ishanjayawardena/scd_proposal?action=diff&rev1=1&rev2=2

--------------------------------------------------

  #format wiki
  #language en
  = Google Summer of Code 2010 - Project Proposal =
+ 
+ || '''Project''' || '''Implementing a parser and an evaluator for Schema 
Component Designators''' ||
+ || '''Student Name''' || '''Ishan Jayawardena''' ||
+ || '''Email''' || '''[email protected]''' ||
+ || '''Time zone''' || '''UTC+5:30 (Sri Lanka)''' ||
+ 
  == Abstract ==
- Apache Xerces2 is a high-performance, standard complaint processor written in 
Java for parsing, validating, serializing and manipulating XML documents. At 
present, it implements a collection of standard APIs for XML processing and its 
development is in progress to achieve the latest W3C XML Schema 1.1 
specification support. The objective of this project is to implement a parser 
and an evaluator for schema component designators (SCD) that can be used to 
identify and retrieve XML schema component(s) from the XML schema data model 
used by Xerces. Schema components that are defined in two W3C recommendations; 
XML Schema Part 1: structures[[1|[1]]] and XML Schema part 2: Data 
types[[2|[2]]] act as the building blocks of an XML schema document.
+ Apache Xerces2 is a high-performance, standards-compliant processor written in 
Java for parsing, validating, serializing and manipulating XML documents. At 
present, it implements a collection of standard APIs for XML processing, and 
development is in progress to support the latest W3C XML Schema 1.1 
specification. The objective of this project is to implement a parser 
and an evaluator for schema component designators (SCDs) that can be used to 
identify and retrieve XML schema components from the XML schema data model 
used by Xerces. Schema components, which are defined in two W3C recommendations, 
''XML Schema Part 1: Structures''[[1|[1]]] and ''XML Schema Part 2: 
Datatypes''[[2|[2]]], act as the building blocks of an XML schema.
  == Description ==
- ...
+ ''W3C XML Schema Definition Language (XSD): Component Designators'' is a 
specification that became a W3C Candidate Recommendation in January 2010[3], 
and W3C has invited the community to start implementing it[4]. The main 
advantage of SCDs for programmers is that they reduce the amount of code needed 
to navigate an XML schema object model and retrieve a specific set of schema 
components. This is achieved by using a path expression similar to an XPath 
expression. The W3C SCD specification defines two basic types of SCDs[5]:<<BR>>
+  1. Absolute SCDs (ASCD): An ASCD identifies a particular schema component; 
it consists of two parts: a designator for the assembled schema (a schema 
designator), and a designator for a particular schema component or schema 
components relative to that assembled schema (a relative schema component 
designator). Syntactically, an ASCD consists of a URI without a fragment 
identifier part which identifies the schema and an XPointer fragment identifier 
which encapsulates a schema component path (SCP)[6] to designate a set of 
components in the context of that schema.<<BR>>
+ 
+  2. Relative SCDs (RSCD): An RSCD identifies a particular schema component 
relative to some current assembled schema; it is expressed as an XPointer 
scheme xscd() that uses a schema component path as the scheme data. This 
XPointer scheme may be used in combination with the XPointer xmlns() scheme.
+ For instance, consider the following ASCD,
+ 
+ {{{http://example.org/schemas/po.xsd#xscd(/type::purchaseOrderType)}}}<<BR>>
+ 
+ Here, the URI {{{http://example.org/schemas/po.xsd}}} refers to an 
assembled schema, and the XPointer fragment (which is an RSCD) 
{{{xscd(/type::purchaseOrderType)}}} refers to a particular schema component by 
using the SCP {{{/type::purchaseOrderType}}}. The following is an ASCD with a 
namespace binding:<<BR>>
+ 
+ 
{{{http://example.org/schemas/po.xsd#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}}<<BR>>
+ 
+ Here, {{{xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)}}} 
represents an RSCD. The W3C SCD specification contains a more comprehensive 
set of examples[7][8][9] that illustrate a number of possible usages and types 
of SCDs/SCPs.<<BR>>
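+ 
+ The structure described above (schema URI, optional xmlns() bindings, and the 
xscd() part carrying the SCP) can be illustrated with a minimal Java sketch. 
The class name {{{ScdParts}}} and its fields are hypothetical and are not part 
of Xerces or of the specification. The sketch also assumes that the pointer 
contains no XPointer escape sequences, which a real implementation would have 
to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Xerces class) that splits an SCD string into its parts. */
final class ScdParts {
    // One xmlns(prefix=namespaceURI) pointer part.
    private static final Pattern XMLNS = Pattern.compile("xmlns\\(([^=()]+)=([^()]*)\\)");
    // The xscd(...) pointer part; the SCP is the text between the outer parentheses.
    private static final Pattern XSCD = Pattern.compile("xscd\\((.*)\\)\\s*$");

    final String schemaUri;              // null when the input is a relative SCD
    final Map<String, String> bindings;  // prefix -> namespace URI, in declaration order
    final String scp;                    // the schema component path

    private ScdParts(String schemaUri, Map<String, String> bindings, String scp) {
        this.schemaUri = schemaUri;
        this.bindings = bindings;
        this.scp = scp;
    }

    static ScdParts parse(String scd) {
        // Everything before the first '#' is the schema designator (absolute SCDs only).
        int hash = scd.indexOf('#');
        String schemaUri = hash >= 0 ? scd.substring(0, hash) : null;
        String pointer = hash >= 0 ? scd.substring(hash + 1) : scd;

        Matcher xscd = XSCD.matcher(pointer);
        if (!xscd.find()) {
            throw new IllegalArgumentException("No xscd() pointer part in: " + scd);
        }

        // Namespace bindings are the xmlns() parts that precede xscd().
        Map<String, String> bindings = new LinkedHashMap<>();
        Matcher xmlns = XMLNS.matcher(pointer.substring(0, xscd.start()));
        while (xmlns.find()) {
            bindings.put(xmlns.group(1), xmlns.group(2));
        }
        return new ScdParts(schemaUri, bindings, xscd.group(1));
    }
}
```

+ For the second ASCD shown above, {{{ScdParts.parse(...)}}} would yield the 
schema URI, one binding for the prefix {{{p}}}, and the SCP 
{{{/type::p:USAddress}}}.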
+ 
+ Please note that the term ''assembled schema'' (or ''schema'' or ''the schema 
description schema component'') refers to the logical schema that results from 
combining one or more schemas, each of which may be physically represented as a 
schema document. In Xerces, the schema description schema component (i.e. the 
XML schema object model) is represented by the XSModel[10] interface, and the 
schema components are represented by the org.apache.xerces.xs 
interfaces.<<BR>>
+ 
+ In this project, I am focusing only on implementing RSCD support for Xerces 
because, according to the feedback I received from the Xerces community, ASCDs 
will often be more difficult and less useful to work with, since there is no 
defined way to resolve a URI to a schema. For example, consider the first ASCD 
mentioned above. To evaluate that ASCD against the schema at that URI, we would 
first have to resolve the URI and build a schema from it, which is not possible 
with Xerces. Even the W3C specification does not specify any convention for 
building a schema from such a URI. Therefore, it is more appropriate to 
implement only RSCD support for Xerces.<<BR>>
+ 
+ The ability to resolve (i.e. to parse and evaluate) an RSCD comes from the 
ability to resolve a given SCP relative to a given context (i.e. relative 
either to a schema or to a schema component). Therefore, giving Xerces the 
ability to resolve SCPs (more specifically, non-canonical SCPs) is the main 
objective of this project, and RSCD support will be implemented as a feature 
that uses it. There are several compelling reasons behind this.<<BR>>
+ 
+  1. The SCP is the main component of any ASCD or RSCD (although we are only 
interested in RSCDs).<<BR>>
+ 
+  2. SCPs have many uses; according to the W3C specification, SCPs can be 
used in contexts other than SCDs as long as proper namespace bindings are 
provided[11]. For instance, we could use an SCP inside an XML element by 
properly binding namespaces.<<BR>>
+ 
+  3. Another useful type of SCP is the incomplete SCP[12]. An incomplete SCP 
can be evaluated against a given schema component to retrieve a set of schema 
components within it (i.e. just as an RSCD is evaluated relative to a given 
schema, an incomplete SCP can be evaluated relative to a given schema 
component).<<BR>>
+ 
+ Therefore, it is highly desirable to come up with a loosely coupled design 
in which the SCP resolving capability is provided in a separate interface, both 
to serve potential future requirements and to improve overall extensibility and 
modularity. The following are the two primary operations that the RSCD 
implementation should offer, and that cover a number of SCD use 
cases[13]:<<BR>>
+ 
+  1. To resolve a relative SCD: given a schema and an RSCD as the inputs, 
return a list of schema components.<<BR>>
+ 
+  2. To obtain the canonical SCP[14] of a schema component (if available): 
given a schema component and the schema that contains the component, along with 
the necessary namespace bindings, as the inputs, return the canonical 
SCP.<<BR>>
+ 
+ Based on these two operations and the incomplete SCP resolving capability, I 
suggest the following essential operations for the SCP interface.<<BR>>
+ 
+ {{{XSObjectList resolveSCP(String scp, XSModel schema, NamespaceContext 
nc)}}}<<BR>>
+ {{{XSObjectList resolveSCP(String scp, XSModel schema)}}}<<BR>>
+ {{{XSObjectList resolveIncompleteSCP(String scp, XSObject component, 
NamespaceContext nc)}}}<<BR>>
+ {{{XSObjectList resolveIncompleteSCP(String scp, XSObject component)}}}<<BR>>
+ {{{String getCanonicalSCP(XSObject component, XSModel schema, 
NamespaceContext nc)}}}<<BR>>
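+ 
+ The signatures above take a {{{NamespaceContext}}} so that prefixed names in 
an SCP (such as {{{p:USAddress}}}) can be expanded. The proposal does not say 
which NamespaceContext type is meant, so the following minimal sketch assumes 
the JDK's {{{javax.xml.namespace.NamespaceContext}}}. The class 
{{{MapNamespaceContext}}} and the helper {{{expand}}} are hypothetical names, 
and treating unprefixed names as belonging to no namespace is an assumption of 
this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;

/** Hypothetical map-backed NamespaceContext for supplying prefix bindings to an SCP. */
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    /** Registers a prefix binding and returns this object so calls can be chained. */
    MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty "no namespace" URI.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                return e.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        // Simplification: reports at most one prefix per namespace URI.
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }

    /** Expands a possibly prefixed name from an SCP step into a QName. */
    static QName expand(String name, NamespaceContext nc) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new QName(name); // assumption: unprefixed means no namespace
        }
        String prefix = name.substring(0, colon);
        String uri = nc.getNamespaceURI(prefix);
        if (XMLConstants.NULL_NS_URI.equals(uri)) {
            throw new IllegalArgumentException("Unbound prefix: " + prefix);
        }
        return new QName(uri, name.substring(colon + 1), prefix);
    }
}
```

+ A caller would build the context once, for example with 
{{{new MapNamespaceContext().bind("p", "http://example.com/schema/po")}}}, and 
pass it as the {{{nc}}} argument.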
+ 
+ Considering the time constraints on the project and the need to set 
realistic and measurable objectives, I will implement only the first four 
methods. If time permits, I will also consider implementing the fifth method, 
but I have not included any specific details about it in my project 
schedule.<<BR>>
+ 
+ The main components of the implementation are the SCP parser and the SCP 
evaluator which are going to be used extensively by the above methods. For 
example, in the first four methods, the parser parses either an SCP or an 
incomplete SCP and then this expression is processed by the evaluator to return 
a list of schema components in an XSObjectList.<<BR>>
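+ 
+ The parse-then-evaluate flow described above can be sketched with a toy 
model. Everything in the following example is hypothetical: {{{Component}}} 
stands in for {{{XSObject}}}, the returned {{{List}}} stands in for 
{{{XSObjectList}}}, and the matching rule (a step selects children whose kind 
equals the axis and whose name equals the name test) is a deliberate 
simplification, not the evaluation rules of the W3C specification.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy evaluator illustrating step-by-step path evaluation; not the proposed Xerces code. */
final class ToyScpEvaluator {

    /** Hypothetical stand-in for a schema component (real code would use XSObject). */
    static final class Component {
        final String kind;   // e.g. "type", "element", "attribute"
        final String name;
        final List<Component> children = new ArrayList<>();

        Component(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        Component add(Component child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Evaluates a path of axis::name steps against a context component.
     * A leading '/' is accepted and ignored, so the same method serves both
     * the schema root and an arbitrary component (an assumption of this sketch).
     */
    static List<Component> evaluate(Component context, String scp) {
        List<Component> current = new ArrayList<>();
        current.add(context);
        String path = scp.startsWith("/") ? scp.substring(1) : scp;

        for (String step : path.split("/")) {
            int sep = step.indexOf("::");
            if (sep <= 0) {
                throw new IllegalArgumentException("Expected axis::name but got: " + step);
            }
            String axis = step.substring(0, sep);
            String name = step.substring(sep + 2);

            // Each step maps the current component set to the matching children.
            List<Component> next = new ArrayList<>();
            for (Component c : current) {
                for (Component child : c.children) {
                    if (child.kind.equals(axis) && child.name.equals(name)) {
                        next.add(child);
                    }
                }
            }
            current = next;
        }
        return current;
    }
}
```

+ The point of the sketch is the shape of the loop: each step narrows the 
current set of components, and the final set is what the resolve methods would 
return.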
+ 
+ In the initial stage, the parser and the evaluator will be implemented to 
support only the XML Schema 1.0 object model. Thanks to the loosely coupled 
nature of the design, the system should be easy to extend later to support the 
XML Schema 1.1 object model as well. I believe speed and efficiency are the two 
most critical factors and must be met to the highest possible degree, because 
the introduction of this new feature must not degrade the existing performance 
of Xerces under any circumstances. Initially, however, more attention will be 
given to designing a solid API and a modular, extensible and loosely coupled 
design, as mentioned earlier.<<BR>>
+ 
+ The parser can be generated with an automatic code generation tool such as 
JavaCC. Writing the evaluator requires a good understanding of the XML Schema 
API[15] and of how to navigate an XSModel. The W3C SCD specification defines 
the EBNF syntax for both SCDs[16] and SCPs[17], which can be used to generate 
the parser. However, it does not suggest any semantics for evaluating such 
expressions.<<BR>>
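+ 
+ To give an idea of what the parser produces, here is a minimal hand-written 
sketch for a small subset of the SCP syntax: a leading {{{/}}} followed by 
steps of the form {{{axis::name}}}. It is not generated from the EBNF in the 
specification and does not cover the rest of the grammar. The names 
{{{SimpleScpParser}}} and {{{Step}}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a simplified SCP subset: "/" axis "::" name ("/" axis "::" name)*. */
final class SimpleScpParser {

    /** One parsed step: the axis and the (possibly prefixed) name test. */
    static final class Step {
        final String axis;
        final String nameTest;

        Step(String axis, String nameTest) {
            this.axis = axis;
            this.nameTest = nameTest;
        }

        @Override
        public String toString() {
            return axis + "::" + nameTest;
        }
    }

    static List<Step> parse(String scp) {
        if (scp.isEmpty() || scp.charAt(0) != '/') {
            throw new IllegalArgumentException("Path must start with '/': " + scp);
        }
        List<Step> steps = new ArrayList<>();
        // Limit -1 keeps empty trailing segments so that "/a::b/" is rejected below.
        for (String raw : scp.substring(1).split("/", -1)) {
            // The first "::" separates the axis from the name test, so a
            // prefixed name such as p:USAddress stays intact.
            int sep = raw.indexOf("::");
            if (sep <= 0 || sep + 2 >= raw.length()) {
                throw new IllegalArgumentException("Malformed step: '" + raw + "'");
            }
            steps.add(new Step(raw.substring(0, sep), raw.substring(sep + 2)));
        }
        return steps;
    }
}
```

+ A generated parser would build a richer structure, but the resulting list of 
steps is the kind of input the evaluator needs.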
+ == Deliverables ==
+  1. Source code and necessary build files for the SCD parser and 
evaluator<<BR>>
+  2. Required patches if any<<BR>>
+  3. A collection of tests that can be used to verify the functionality of the 
SCD parser and evaluator<<BR>>
+  4. SCD API Documentation<<BR>>
  == Things I have done so far ==
+ I checked out and built the Xerces trunk, tried out some samples and tests, 
and started to study the code, especially the coding standards and styles that 
have been used and the package structure. Because I am not very familiar with 
Java tools such as annotations, packaging, unit tests and documentation 
generation, I also started to learn them. I looked at existing Xerces issues 
related to the XML Schema API and searched JIRA for issues related to SCD. I 
also found a couple of existing implementations of SCD[18][19], and they will 
be considered in designing the API where necessary. I spent most of my time 
researching SCD, especially trying to understand the W3C SCD specification, 
setting measurable goals for my project, and studying various technologies 
related to SCD.<<BR>>
+ 
  == Development Schedule ==
+ I am planning to learn most of the required programming skills while doing 
the development. Initially, however (i.e. during the community bonding period), 
I will learn advanced Java skills and acquire the required knowledge of XML 
Schema and the XML Schema API, since they are essential for designing the 
components and beginning to code. I will dedicate the complete four-month 
period from April until the end of August to this project, and I can work 
between forty and fifty hours per week.
+ || Community Bonding Period: April 26 - May 24 || Getting to know the mentor 
and the community<<BR>>Learning more about the required API and 
features<<BR>>Preparing the development environment<<BR>>Familiarizing myself 
with Xerces, the XML Schema API and Java technologies<<BR>>Reading 
documentation about JavaCC<<BR>>Starting to design the system: this includes 
designing the required data structures and algorithms for the SCP parser and 
the evaluator, designing the overall class hierarchy, and deciding where and 
how to implement the methods of the API ||
+ || Interim Period: May 24 - July 12            || Finalizing the API 
design<<BR>>Dividing the development process into stages with the help of the 
mentor<<BR>>Completing the SCP parser together with its unit 
tests<<BR>>Beginning to code the evaluator (I believe the development of the 
evaluator will take more time than that of the parser, and therefore I have 
allocated more time for it) ||
+ || July 12 - July 16                           || Submitting mid-term 
evaluations and continuing with the development of the evaluator ||
+ || Interim Period: July 16 - August 9          || Completing the evaluator 
and its unit tests<<BR>>Completing the first four methods of the API by 
combining the completed parser and evaluator into the final 
system<<BR>>Testing the evaluator with the parser<<BR>>Starting work on unit 
tests and documentation for the overall functionality of the system ||
+ || August 9 - August 16                        || Refining code and unit 
tests, running complete tests, and improving documentation ||
+ || August 20                                   || Final evaluation deadline ||
+ || August 30                                   || Submitting required code to 
Google ||
  == Community Interaction ==
+ I have subscribed to both the Xerces users list and the development list, and 
I posted a couple of times when I came across difficulties in installing and 
using Xerces. I also used the development list to introduce my interest in 
doing SCD as a project. Even before that, I tried to communicate with last 
year's GSoC mentors of Xerces in order to introduce myself to them and to ask 
about possible projects for this year. Apart from that, I used the mailing list 
whenever possible to clarify my doubts by asking the experts questions, 
especially about various aspects of the W3C SCD specification, the expected 
results and possible design details of this project, and internals of Xerces 
such as XSModel and XSSerializer. This knowledge, together with the feedback 
that I received on my draft project proposal, was very useful in creating this 
final project proposal. In the future, I also expect to use the mailing lists 
to clarify issues I find, to receive suggestions and feedback on my work from 
experienced developers, and to involve them in the design process of the 
project. I also expect to maintain excellent communication with my mentor via 
email and IM.<<BR>>
+ 
  == About me ==
+ Hi, I'm Ishan. I'm an undergraduate in the Department of Computer Science and 
Engineering, University of Moratuwa, Sri Lanka, and my interests are XML and 
web services. What I expect most from participating in a GSoC project is to be 
introduced to a large, well-known community like Apache and ultimately to 
become a committer on that project. I have a great passion for contributing to 
free software, and I believe this would be a great opportunity and an excellent 
starting point for that. With this project, I'm hoping to obtain a better 
understanding of the Xerces architecture by experimenting with its code base 
and, above all, to implement a brand new feature for it that has just reached 
W3C Candidate Recommendation status. At the same time, I'm hoping to improve my 
programming and communication skills and to learn more about XML, XML Schema, 
Java and similar technologies.<<BR>>
+ 
+ My experience in open source development: My first experience in open source 
development was writing a plugin for the Mozilla Firefox web browser, a 
visualization tool for navigating and managing tabs. Then I attempted to 
contribute to the KDevelop IDE by fixing a small bug in it, but I didn't 
receive good feedback because the KDevelop community considered the fix 
unwanted. Nevertheless, I learned many skills related to open source 
development by getting involved in that project, even if it was for a short 
time. I can code in C, C++, and Java. In addition, I'm familiar with Linux and 
various command line tools. I always use free and open source software in my 
academic and development work, and I encourage my colleagues to use free 
software alternatives whenever they can.<<BR>>
+ 
  == References and Resources ==
+ [1] ''XML Schema Part 1: Structures'' Second Edition: 
http://www.w3.org/TR/xmlschema-1/
+ <<BR>>[2] ''XML Schema Part 2: Datatypes'' Second Edition: 
http://www.w3.org/TR/xmlschema-2/
+ <<BR>>[3] ''W3C XML Schema Definition Language (XSD): Component 
Designators'': http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/
+ <<BR>>[4] W3C News Archive: http://www.w3.org/News/2010#entry-8703
+ <<BR>>[5] ''Schema Component Designators'': 
http://www.w3.org/TR/xmlschema-ref/#section-scds
+ <<BR>>[6] ''Schema Component Paths'': 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-path
+ <<BR>>[7] ''Extended Primer Example'': 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-primer-example
+ <<BR>>[8] ''Additional Examples'': 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-example-more
+ <<BR>>[9] Examples with component and elided-component Axes (Non-Normative): 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-examples-abbreviations
+ <<BR>>[10] XSModel(XML Schema API): 
http://xerces.apache.org/xerces2-j/javadocs/xs/org/apache/xerces/xs/XSModel.html
+ <<BR>>[11] See Section 4.3.2 ''Namespaces'': 
http://www.w3.org/TR/xmlschema-ref/#section-path-interpret
+ <<BR>>[12] See Section 4.3.1 ''Incomplete Schema Component Paths'': 
http://www.w3.org/TR/xmlschema-ref/#section-path-interpret
+ <<BR>>[13] ''Use Cases'': 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-usecases
+ <<BR>>[14] Canonical Schema Component Paths: 
http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-canonical-path
+ <<BR>>[15] XML Schema API: 
http://xerces.apache.org/xerces2-j/javadocs/xs/index.html
+ <<BR>>[16] ''Schema Component Designator Syntax'': 
http://www.w3.org/TR/xmlschema-ref/#section-scd-syntax
+ <<BR>>[17] ''Schema Component Path Syntax'': 
http://www.w3.org/TR/xmlschema-ref/#section-path-syntax
+ <<BR>>[18] 
http://fisheye5.cenqua.com/browse/~raw,r=1.1/xsom/www/javadoc/index.html?com/sun/xml/xsom/SCD.html
+ <<BR>>[19] 
http://www.mathling.com/xsd/docs/api/index.html?com/mathling/scd/XSSerializer.html
  
