Hi, I have attached my draft project proposal on SCD to this message and I'm looking forward to your feedback on it. In particular, I need your help to improve its "Description", "Deliverables" and "Development Schedule" sections. If you have any suggestions or comments, please let me know. Thanks in advance.
On 3/9/10, Michael Glavassevich <[email protected]> wrote:
> Hi Ishan,
>
> Feel free to share your proposal with the list when you're ready. This will
> give folks an idea of what you're thinking of and serve as a start to
> concretely define a GSoC project which would be manageable in the time you
> would have. I think SCD is an important feature and would help developers
> who use XSModel to navigate it more easily with fewer lines of application
> code, just like how XPath can reduce the complexity of navigating a DOM
> tree.
>
> Thanks again for your interest in getting involved in the development.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
> E-mail: [email protected]
>
> Ishan Jayawardena <[email protected]> wrote on 03/03/2010 07:11:07 AM:
>
>> Hi,
>> I've been asking some questions about the SCD spec and the way that
>> it has to be implemented for Xerces, for some time because I would
>> like to do it as a project for this year's GSoC. I saw this idea[1]
>> (i.e implementing a parser and an evaluator for SCD) in the ideas
>> list that Apache had published for last year's GSoC and I saw that
>> it had not been done. I also contacted Michael to learn more about
>> that and he gave me enough guidelines to start with it. Therefore, I
>> thought of doing it this time and started to work on it. So now, I'm
>> writting my proposal and I want to know if it's ok to post my draft
>> project proposal here? I would also like to know your ideas about
>> this project and expecting your feedback to prepare a successful
>> project proposal as well. Do you belive that this project is
>> appropriate for a GSoC project (i.e. in terms of available time,
>> workload, or even the expected result etc.)? Do you have any
>> suggestions, improvements or modifications about this idea? To what
>> extent does Xerces need this new feature and what effect will it
>> have on Xerces?
>> Thanks in advance.
>>
>> [1] http://wiki.apache.org/general/SummerOfCode2009#xerces-project
Google Summer of Code 2010 - Project Proposal
Abstract
Apache Xerces2 is a high-performance, standards-compliant processor written in Java for parsing, validating, serializing and manipulating XML documents. At present, it implements a collection of standard APIs for XML processing, and development is in progress to support the latest W3C XML Schema 1.1 specification. The objective of this project is to implement a parser and an evaluator for schema component designators (SCD) for Xerces that can be used to identify and retrieve XML schema component(s) from an XSModel[5], the XML Schema data model used by Xerces. Schema components, which are defined in two W3C recommendations, XML Schema Part 1: Structures[1] and XML Schema Part 2: Datatypes[2], act as the building blocks of an XML schema document.
Description
W3C XML Schema Definition Language (XSD): Component Designators is a specification that reached W3C Candidate Recommendation status in January 2010[3], with W3C inviting the community to start implementing it[4]. The main advantage SCD offers programmers is that it makes navigating an XSModel[5] (more generally, an XML Schema object model) easier by reducing the amount of code they have to write to retrieve a specific set of schema components. This is achieved by using a path expression similar to an XPath expression. One of the basic requirements of SCD is that it should be usable with either the XML Schema 1.0 or the XML Schema 1.1 component model. The W3C SCD specification defines two basic types of SCDs[6]: absolute SCDs (ASCDs) and relative SCDs (RSCDs).
For instance, consider the following ASCD:

http://example.org/schemas/po.xsd#xscd(/type::purchaseOrderType)

Here, the URI http://example.org/schemas/po.xsd refers to an assembled schema, and the XPointer fragment xscd(/type::purchaseOrderType) refers to a particular component by using the SCP /type::purchaseOrderType. The following is an ASCD with a namespace binding:

http://example.org/schemas/po.xsd#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)

Please note that the term "assembled schema" (or "schema") refers to the logical namespace that results from combining one or more such schemas, and that these schemas are physically represented as schema documents. The W3C SCD specification contains a more comprehensive set of examples[13][14][15] that illustrate a number of possible usages and types of SCDs/SCPs.
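The structure of the two ASCDs above can be made concrete with a small sketch. It only splits the string into the schema URI, any xmlns() bindings, and the SCP inside xscd(); it does not resolve the URI or evaluate the path. The class name AscdParts and its fields are hypothetical and are not part of Xerces or of the W3C specification, and XPointer escaping rules are deliberately ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits an absolute SCD into its three visible parts. */
final class AscdParts {
    /** Matches one xmlns(prefix=namespace) pointer part. */
    private static final Pattern XMLNS = Pattern.compile("xmlns\\(([^=()]+)=([^()]+)\\)");
    /** Matches the trailing xscd(path) pointer part. */
    private static final Pattern XSCD = Pattern.compile("xscd\\((.*)\\)$");

    final String schemaUri;                                  // text before '#'
    final Map<String, String> bindings = new LinkedHashMap<>(); // prefix -> namespace
    final String path;                                       // the SCP inside xscd(...)

    AscdParts(String ascd) {
        int hash = ascd.indexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("No fragment in: " + ascd);
        }
        schemaUri = ascd.substring(0, hash);
        String fragment = ascd.substring(hash + 1);

        // Collect every namespace binding that precedes the xscd() part.
        Matcher ns = XMLNS.matcher(fragment);
        while (ns.find()) {
            bindings.put(ns.group(1), ns.group(2));
        }

        Matcher scd = XSCD.matcher(fragment);
        if (!scd.find()) {
            throw new IllegalArgumentException("No xscd() part in: " + ascd);
        }
        path = scd.group(1);
    }

    public static void main(String[] args) {
        AscdParts parts = new AscdParts(
            "http://example.org/schemas/po.xsd"
            + "#xmlns(p=http://example.com/schema/po)xscd(/type::p:USAddress)");
        System.out.println("schema URI: " + parts.schemaUri);
        System.out.println("bindings:   " + parts.bindings);
        System.out.println("SCP:        " + parts.path);
    }
}
```

The sketch stops at the split because, for an RSCD, only the path part matters: the schema is supplied by the caller rather than located through the URI.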
In this project, I am focusing only on implementing RSCD support for Xerces because, according to the feedback I received from the Xerces community, ASCDs will often be more difficult and less useful to work with, since there is no defined way to resolve a URI to a schema. For example, consider the first ASCD mentioned above. To evaluate that ASCD against the schema at that URI, we would first have to resolve the URI and build an XSModel object from it, which is not possible with Xerces. Even the W3C specification does not specify any convention for building a schema from such a URI. In addition, a URI may refer either to a remote schema document or to a schema document stored locally. A design that supports only locally referenced URIs would not be a good one and, after all, there would be almost no difference between that approach and RSCD, because we could easily create an XSModel object from that local schema document. Therefore, it is much more appropriate to implement only RSCD support for Xerces.
The two primary operations that would reflect the RSCD implementation, and that would yield a number of SCD use cases[16] are,
To achieve the first operation, we first have to create an assembled schema from an existing schema document by processing it into schema components with XSModel and XSLoader[12]. After that, the required SCDs/SCPs can be evaluated relative to this schema with the help of the XSSerializer[7] utility. Schema components are represented in Xerces by the org.apache.xerces.xs interfaces, and the schema description schema component (the "assembled schema", or simply the "schema") described in the W3C SCD specification is represented by the XSModel interface.
Giving Xerces the ability to parse and evaluate an SCP is the main objective of this project, and the RSCD support will be implemented on top of it. There are a couple of compelling reasons for this.
Therefore, in this project I am planning to come up with a design where the SCP support is provided through a separate interface, so that it serves both the RSCD implementation and potential future requirements.
The steps in the design and implementation stage that most affect the final outcome of this project are writing the parser that parses an SCD/SCP expression and writing the evaluator that evaluates such an expression and returns a list of selected components. The parser can be generated with an automatic code generation tool such as JavaCC. Writing the evaluator requires a good understanding of the XML Schema API[8] and of how to navigate an XSModel. I believe speed and efficiency are the two most critical factors and must be met to the highest possible degree, because the introduction of this new feature must not degrade the existing performance of Xerces under any circumstances. The W3C SCD specification defines the EBNF syntax for both SCD[9] and SCP[10], which can be used to generate the parser. However, it does not suggest any semantics for evaluating such expressions.
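To give a rough idea of what the parser has to produce before the evaluator can run, here is a minimal hand-written sketch, assuming a heavily simplified subset of SCP: an absolute path made only of axis::name steps separated by "/". It is not the grammar from the specification[10] (no abbreviations, predicates or wildcards, and axis names are not checked), and it is not a substitute for a JavaCC-generated parser. The names ScpStep and SimpleScpParser are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class: one "axis::name" step of a simplified SCP. */
final class ScpStep {
    final String axis; // e.g. "type"
    final String name; // e.g. "p:USAddress" (prefix left unresolved)

    ScpStep(String axis, String name) {
        this.axis = axis;
        this.name = name;
    }

    @Override
    public String toString() {
        return axis + "::" + name;
    }
}

/** Hypothetical parser for the simplified subset described above. */
final class SimpleScpParser {
    private SimpleScpParser() {
    }

    /** Splits an absolute path such as "/type::purchaseOrderType" into steps. */
    static List<ScpStep> parse(String scp) {
        if (!scp.startsWith("/")) {
            throw new IllegalArgumentException("Only absolute paths are handled: " + scp);
        }
        List<ScpStep> steps = new ArrayList<>();
        for (String raw : scp.substring(1).split("/")) {
            int sep = raw.indexOf("::");
            // Both the axis and the name must be non-empty.
            if (sep <= 0 || sep + 2 >= raw.length()) {
                throw new IllegalArgumentException("Malformed step: '" + raw + "'");
            }
            steps.add(new ScpStep(raw.substring(0, sep), raw.substring(sep + 2)));
        }
        return steps;
    }

    public static void main(String[] args) {
        // The second step is only illustrative of a multi-step path.
        for (ScpStep step : parse("/type::p:USAddress/schemaAttribute::country")) {
            System.out.println(step.axis + " -> " + step.name);
        }
    }
}
```

An evaluator would then walk this list of steps against an XSModel, one step at a time, narrowing the set of selected components at each step; how each axis maps onto the XML Schema API is exactly the part the specification leaves open.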
Things I have done so far
I checked out and built the Xerces trunk, tried out some of the samples and tests, and started to study the code, especially the coding standards and styles that are used and the package structure. Because I am not very familiar with Java tools for annotations, packaging and documentation generation, I have also started to read about them. I looked at the existing Xerces issues and searched JIRA for issues related to SCD. I also found a couple of existing implementations of SCD[11], and I am studying their APIs so that I can incorporate the necessary details into my project. I have spent most of my time researching SCD, especially trying to understand the W3C SCD specification and studying the technologies related to SCD. Because SCD is a completely new technology, it was quite difficult to find resources to learn more about it. In addition, I have started to read the literature that would be helpful for this project, especially other related W3C standards, various tutorials, and the Xerces XML Schema API.
Development Schedule
Deliverables
Community Interaction
I have subscribed to both the Xerces users list and the development list, and I posted a couple of times when I came across difficulties in installing and using Xerces. I also used the development list to introduce my interest in doing SCD as a project. Even before that, I tried to contact last year's GSoC mentors for Xerces in order to introduce myself and to ask about possible projects for this year. Apart from that, I used the mailing list whenever possible to clarify my doubts by asking the experts questions, especially about various aspects of the W3C SCD specification, the expected results and possible design details of this project, and Xerces internals such as XSModel and XSSerializer. This knowledge, together with the feedback I received on my draft project proposal, was very useful to me in creating this final project proposal. In the future, I also expect to use the mailing lists to clarify issues I find, to receive suggestions and feedback on my work from the expert developers, and to involve them in the design decisions of the project. I also expect to maintain excellent communication with my mentor via email and IM.
About me
Hi, I'm Ishan. I'm an undergraduate in the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka, and my interests are XML and web services. What I expect from participating in a GSoC project is, most importantly, to be introduced to a large, well-known community like Apache and ultimately to become a long-term committer to the project. I have a great passion for contributing to free software, and I believe this would be a great opportunity and an excellent starting point for that. With this project, I'm hoping to gain a better understanding of the Xerces architecture, to improve my knowledge of it by experimenting with its code base and, above all, to implement a brand new feature for it that has just reached W3C Candidate Recommendation status. At the same time, I'm hoping to improve my programming and communication skills and to learn more about XML and related technologies.
My experience in open source development: My first experience in open source development was writing a plugin for the Mozilla Firefox web browser, a visualization tool for navigating and managing tabs. Then I attempted to contribute to the KDevelop IDE by fixing a small bug in it, but I did not receive positive feedback because the KDevelop community considered the fix unwanted. Nevertheless, I learned many skills related to open source development by getting involved in that project, even if only for a short time. I learned how to build applications after checking them out from repositories, how to use source code management systems such as SVN, CVS and Git, how to use build tools, how to create and submit patches, and how to connect with the community through IRC channels and mailing lists. I can code in C, C++, and Java. In addition, I'm familiar with Linux and various command line tools.
I always use free and open source software in my academic and development work and I encourage my colleagues to use free software alternatives whenever they can.
References and Resources
[1] XML Schema Part 1: Structures Second Edition: http://www.w3.org/TR/xmlschema-1/
[2] XML Schema Part 2: Datatypes Second Edition: http://www.w3.org/TR/xmlschema-2/
[3] W3C XML Schema Definition Language (XSD): Component Designators: http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/
[4] W3C News Archive: http://www.w3.org/News/2010#entry-8703
[5] XSModel (XML Schema API): http://xerces.apache.org/xerces2-j/javadocs/xs/org/apache/xerces/xs/XSModel.html
[6] http://www.w3.org/TR/xmlschema-ref/#section-scds
[7] A utility written by Mukul Gandhi that serializes the Xerces XSModel into lexical XSD syntax: http://svn.apache.org/viewvc/xerces/java/branches/xml-schema-1.1-dev/samples/xs/XSSerializer.java
[8] XML Schema API: http://xerces.apache.org/xerces2-j/javadocs/xs/index.html
[9] http://www.w3.org/TR/xmlschema-ref/#section-scd-syntax
[10] http://www.w3.org/TR/xmlschema-ref/#section-path-syntax
[11] http://fisheye5.cenqua.com/browse/~raw,r=1.1/xsom/www/javadoc/index.html?com/sun/xml/xsom/SCD.html
[12] http://xerces.apache.org/xerces2-j/javadocs/xs/org/apache/xerces/xs/XSLoader.html
SCD examples:
[13] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-primer-example
[14] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-example-more
[15] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-examples-abbreviations
[16] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-usecases
[17] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-path
[18] http://www.w3.org/TR/2010/CR-xmlschema-ref-20100119/#section-canonical-path
