Hi Michael,

Today I edited my proposal again and now have a new version, which I will submit on the GSoC web site soon. I am sending it to you first; if you have any advice or suggestions, just let me know. Thank you. I hope I can finish this project well and do something, or even more, for Xerces :-)
2010-03-29
xiaohei.leiyin

From: Michael Glavassevich
Sent: 2010-03-29 11:51:01
To: [email protected]
Cc:
Subject: Re: GSoC proposal about "Asynchronous LSParser and parseWithContext "

Hi Yin,

Yin Lei <[email protected]> wrote on 03/25/2010 04:53:16 AM:

> Hi Michael,
>
> About the function parseWithContext(LSInput input, Node
> contextArg, short action), there is a point I am not so clear on.
>
> If the LSInput contains the following content:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <element id="1">element_one</element>
> <element id="2">element_two</element>
>
> For an LSInput, is it well-formed or legal? Or could we just ignore
> the XML declaration?

It matches the production [1] for well-formed external parsed entities so I would say yes it's allowed. That's a text declaration [2] by the way, not an XML declaration.

> If this input is legal, the action is ACTION_INSERT_AFTER and contextArg
> is a DOM element with the following content:
>
> <contextnode>content here</contextnode>
>
> Should we return this DOM Node?
>
> <contextnode>content here</contextnode>
> <element id="1">element_one</element>
> <element id="2">element_two</element>

As long as the parent of "contextnode" is an Element or a DocumentFragment that is the correct result.

> Thank you, and expecting your reply

Thanks.

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#NT-extParsedEnt
[2] http://www.w3.org/TR/2006/REC-xml-20060816/#NT-TextDecl

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]
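To make the ACTION_INSERT_AFTER result discussed in the thread above concrete, here is a minimal sketch that emulates it with the standard JAXP DOM API only. It does not call parseWithContext. The class name InsertAfterSketch, the helper insertAfter, the temporary <wrapper> element and the <root> parent are all assumptions made for illustration, and the fragment is passed without its text declaration because a declaration cannot appear inside the wrapper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InsertAfterSketch {

    // Hypothetical helper: parses the fragment inside a temporary wrapper
    // element and inserts each resulting node right after the context node.
    static void insertAfter(Node context, String fragment) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document temp = builder.parse(
                new InputSource(new StringReader("<wrapper>" + fragment + "</wrapper>")));
        Node parent = context.getParentNode();
        Node next = context.getNextSibling(); // null means "append at the end"
        Document owner = context.getOwnerDocument();
        for (Node n = temp.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            // importNode copies the node into the target document; temp is left unchanged
            parent.insertBefore(owner.importNode(n, true), next);
        }
    }

    // Builds the example from the thread and lists the children of the parent.
    static String demo() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(
                    "<root><contextnode>content here</contextnode></root>")));
            Node context = doc.getDocumentElement().getFirstChild();
            insertAfter(context,
                    "<element id=\"1\">element_one</element><element id=\"2\">element_two</element>");
            StringBuilder names = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (names.length() > 0) {
                    names.append(' ');
                }
                names.append(n.getNodeName()).append('=').append(n.getTextContent());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The context node here has an Element parent, which is the condition Michael gives for the result to be correct.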
Google Summer of Code 2010 - Project Proposal
Abstract
Apache Xerces2 is a powerful XML parser that implements a collection of standard APIs for XML processing. Although Xerces has a functional DOM Level 3 LSParser, a couple of parts of the spec still need to be implemented. This project will provide an asynchronous version of LSParser, which returns from the parse method immediately and builds the DOM tree on another thread, and will implement the function parseWithContext, which allows a document fragment to be parsed and attached to an existing DOM.
Description
Apache Xerces-J is a high-performance, standards-compliant processor written in Java for parsing, validating, serializing and manipulating XML documents. It provides a complete implementation of the Document Object Model Level 3 Core and Document Object Model Level 3 Load and Save Recommendations, but Xerces' implementation of LSParser has two limitations (http://xerces.apache.org/xerces2-j/dom3.html):

1. It does not support an asynchronous LSParser, which returns from the parse method immediately and builds the DOM tree on another thread.
2. It does not support the function parseWithContext of the interface LSParser, which parses an XML fragment from a resource identified by an LSInput and inserts the content into an existing document at the position specified by the context and action arguments.

In order to remove these two limitations, I have been studying the W3C Recommendation for LSParser. In the meantime, I downloaded the Xerces2-J source code, imported it into my Eclipse workspace, read it over and over, and considered how to implement these two parts of the specification. At the same time, I discussed the subject with the Xerces developers (you helped me a lot, thank you, especially Michael Glavassevich). I now have some ideas for a solution and have done some experiments to check it. This is only a high-level solution, and I leave out some details.

1. interface DOMImplementationLS
Class org.apache.xerces.dom.CoreDOMImplementationImpl implements the interface. As described in the W3C Recommendation, an implementation of DOMImplementationLS should supply a function createLSParser which can create a synchronous LSParser as well as an asynchronous LSParser, but at present we can only get the former from CoreDOMImplementationImpl's createLSParser. So I should fix this problem.

2. interface LSParser
Class org.apache.xerces.parsers.DOMParserImpl implements the interface, but it supports the synchronous mode only; even its getAsync function simply returns false.
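The current behaviour described above can be checked with a small probe. This is only a sketch: the class name AsyncProbe and its helper methods are hypothetical, and it runs against whatever DOM Load and Save implementation the running JVM provides through DOMImplementationRegistry, which is not necessarily the Xerces trunk. An implementation without asynchronous support is expected to report a DOMException, but the sketch prints whichever outcome it observes rather than assuming one.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

public class AsyncProbe {

    // Looks up a DOM implementation that supports the "LS" feature.
    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (Exception e) {
            throw new IllegalStateException("no DOM Load and Save implementation found", e);
        }
    }

    // A synchronous parser should report getAsync() == false.
    static boolean syncParserIsAsync() {
        LSParser parser = loadSave().createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.getAsync();
    }

    // Asks for an asynchronous parser and reports what the implementation does.
    static String probeAsync() {
        try {
            LSParser parser = loadSave().createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
            return "created, getAsync=" + parser.getAsync();
        } catch (DOMException e) {
            return "DOMException code " + e.code;
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronous parser getAsync: " + syncParserIsAsync());
        System.out.println("asynchronous request: " + probeAsync());
    }
}
```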
Here is my solution for providing an asynchronous version of LSParser. It uses a Vector object (we name it the repository) to store all the event listeners registered on the current LSParser object. Each listener is made up of three parts: the type, the useCapture flag and the event handler function. There are only two types of event, load and progress. My next task is to implement the functions addEventListener, dispatchEvent and removeEventListener.

addEventListener: add a listener object to the repository. Note that a listener with the same parameters can only be added once.
dispatchEvent: traverse each item of the repository; if an item has the same type value as the event and its useCapture value is true, call its handleEvent function.
removeEventListener: traverse each item of the repository; if an item is the same as the object given in the parameters, remove it from the repository.

In the asynchronous LSParser, an LSLoadEvent is used to signal that the parse function has finished its parse job. We can achieve this by calling LSParser's dispatchEvent function with the LSLoadEvent as a parameter.

In the asynchronous LSParser, the parse thread triggers an LSProgressEvent when it finishes parsing an entity node. The triggered LSProgressEvent reports the current parse position to the LSParser. If the thread sees more external resource references, it may also change the totalSize value.

DOMParserImpl has an attribute which marks its mode, synchronous or asynchronous. We can get its parse mode from the function getAsync. If the parser is in asynchronous mode, when the LSParser instance's parse() function is called, it sets the busy value to true, starts a Thread to parse the XML document in the LSInput, and then returns null.
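The listener repository and the asynchronous parse() behaviour described so far can be sketched as follows. This is a simplified model under stated assumptions, not Xerces code: the class ListenerRepositorySketch and its Entry type are hypothetical, events are reduced to a type string plus a String payload instead of real LSLoadEvent objects, and the "parsing" is a stand-in string operation. The dispatch rule (same type and useCapture set to true) follows the description above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ListenerRepositorySketch {

    // Hypothetical repository entry: event type, useCapture flag, handler.
    static final class Entry {
        final String type;
        final boolean useCapture;
        final Consumer<String> handler;

        Entry(String type, Consumer<String> handler, boolean useCapture) {
            this.type = type;
            this.handler = handler;
            this.useCapture = useCapture;
        }

        boolean matches(String t, Consumer<String> h, boolean c) {
            return type.equals(t) && handler == h && useCapture == c;
        }
    }

    private final List<Entry> repository = new CopyOnWriteArrayList<>();
    private volatile boolean busy;

    // A listener with the same parameters is stored only once.
    synchronized void addEventListener(String type, Consumer<String> handler, boolean useCapture) {
        for (Entry e : repository) {
            if (e.matches(type, handler, useCapture)) {
                return;
            }
        }
        repository.add(new Entry(type, handler, useCapture));
    }

    synchronized void removeEventListener(String type, Consumer<String> handler, boolean useCapture) {
        repository.removeIf(e -> e.matches(type, handler, useCapture));
    }

    // Calls every handler whose type matches and whose useCapture flag is true.
    void dispatchEvent(String type, String payload) {
        for (Entry e : repository) {
            if (e.type.equals(type) && e.useCapture) {
                e.handler.accept(payload);
            }
        }
    }

    boolean getBusy() {
        return busy;
    }

    // Asynchronous mode: mark busy, start a worker thread, return null at once.
    Object parse(String input) {
        busy = true;
        Thread worker = new Thread(() -> {
            String result = "parsed:" + input; // stand-in for building the DOM tree
            busy = false;
            dispatchEvent("load", result);
        });
        worker.start();
        return null;
    }

    // Registers one load listener twice, parses, and waits for the load event.
    static String demo() {
        ListenerRepositorySketch parser = new ListenerRepositorySketch();
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger calls = new AtomicInteger();
        AtomicReference<String> seen = new AtomicReference<>();
        Consumer<String> onLoad = payload -> {
            calls.incrementAndGet();
            seen.set(payload);
            done.countDown();
        };
        parser.addEventListener("load", onLoad, true);
        parser.addEventListener("load", onLoad, true); // duplicate, ignored
        Object immediate = parser.parse("<a/>");
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("load event not received");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return immediate + ":" + calls.get() + ":" + seen.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A CopyOnWriteArrayList is used instead of the Vector mentioned in the text so that dispatching from the parse thread can iterate safely while another thread adds or removes listeners; a Vector would need external synchronization around the iteration.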
When the XML parse thread finishes its parse job, it sets the busy value to false, creates an LSLoadEvent instance with the type value load, and calls the function dispatchEvent(Event evt). If the user has registered any listener for the load event, dispatchEvent will run the work defined in that listener's handleEvent function.

3. function parseWithContext(LSInput input, Node contextArg, short action)
This function parses an XML fragment from a resource identified by an LSInput and inserts the content into an existing document at the position specified by the context and action arguments. The XML fragment is a special data structure, so I need to construct a new class named XMLFragment to store it. Then I should do the following jobs:

1). Parse the XML fragment into an XMLFragment object and mark whether it is a complete XML document. If any error happens, throw an exception. These classes can help me:
2). Add the XMLFragment at the place indicated by the parameter action. In this phase there is a lot of validation work to do, covering four aspects: Basic Validation, Namespace Validation, DTD Validation and Schema Validation.
I. Basic Validation:
II. Namespace Validation
III. DTD Validation
IV. Schema Validation

If everything is OK, return the result Node. Otherwise, if an error occurs, the caller is notified through the ErrorHandler instance associated with the "error-handler" parameter of the DOMConfiguration. As the new data is inserted into the document, at least one mutation event is fired per new immediate child or sibling of the context node.

Things I have done so far
I have been studying the W3C Recommendation for LSParser and parseWithContext and the related specifications, ran into some problems, and discussed them with the Xerces developers on the Xerces development mailing list. In order to finish the project, I have to know every detail of LSParser.
In addition to that, I started to read the literature, especially other related W3C standards and various tutorials, that would be helpful for this project. At the same time, I checked out and built the Xerces trunk, tried out some samples and tests, and started to study the code. In the future, if I want my code to become part of Xerces-J, I must follow the same coding standards, style and package structure that are already in use. I want to look over every package in the Xerces source tree and get to know what each class does; this is a long-term study process and I have not finished it yet. I also went through existing Xerces issues on the development and user mailing lists and searched JIRA for issues related to LSParser.
Development Schedule
Deliverables
Community Interaction
I have subscribed to both the Xerces users list and the development list, and I posted a couple of times when I came across difficulties in installing and using Xerces and in reading the Xerces source code. I also used the development list to introduce my interest in implementing the asynchronous LSParser and parseWithContext as a project, and to discuss some confusing points of the W3C Recommendation. Even before that, I communicated with last year's GSoC students and mentors, who gave me some good advice about how to prepare for a GSoC open source project. Apart from that, I used the mailing list whenever possible to clear up my doubts by asking the experts. In particular, some open source pioneers and mentors helped me a great deal (sincerely, thank you so much). The feedback that I received from the mailing list on my draft proposal was very useful in creating this final project proposal. In the future, I also expect to use the mailing lists to clarify issues I find, to receive suggestions and feedback on my work from the expert developers, and to get them involved in the design decisions of the project. I also expect to maintain excellent communication with my mentor via email and IM.
About me
Hi, my name is Yin Lei. I am a postgraduate student at the University of Science and Technology Beijing, China. My major is computer science and technology. During my six years of Java development experience, Apache has helped me a great deal; many projects such as Struts, Tomcat, Xerces, Xalan, HttpClient, Commons FileUpload, JavaMail and POI have played an important part in my research projects. So I am eager to participate in the open source community and become a long-term committer on the project, and I hope GSoC can help introduce me to open source projects. With this project, I hope to obtain a better understanding of the Xerces architecture and to improve my knowledge of it by experimenting with its code base and, above everything, to implement its missing features from the DOM Level 3 Load and Save Recommendation. At the same time, I hope to improve my programming and communication skills and to learn more about XML and related technologies.

My work experience and related awards:

My experience in open source development:
References and Resources
[1] W3C Recommendation, DOM Level 3 Load and Save, LSParser: http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html#LS-LSParser
[2] W3C Recommendation, LSParser function parseWithContext: http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html#LS-LSParser-parseWithContext
[3] Apache Xerces2-J home page: http://xerces.apache.org/xerces2-j
[4] DOM Level 3 Load and Save limitations concerning LSParser and parseWithContext: http://xerces.apache.org/xerces2-j/dom3.html
[5] Document Object Model Level 3 Core Recommendation: http://www.w3.org/TR/DOM-Level-3-Core/
[6] Document Object Model Level 3 Load and Save Recommendation: http://www.w3.org/TR/DOM-Level-3-LS/
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
