Hi Yin,

I noticed you completely removed the section on community involvement in
your most recent draft. In my previous note I wasn't suggesting that you
delete it altogether, just that you describe your unique experience in your
own words.

Engagement with the project community is important and it's a good idea for
everyone to mention it in their proposal.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]

"xiaohei.leiyin" <[email protected]> wrote on 03/30/2010 10:19:16
AM:

> Hi Michael,
>
> Thank you so much. I think I have found a way to implement the
> asynchronous LSParser and parseWithContext for Xerces, and I am sure
> I can finish it well. Most importantly, I am interested in XML
> parsing work, and that motivates me. I am really looking forward
> to becoming one of the Xerces committers :-)
>
> In addition, here is my submitted proposal. If you have time, I would
> welcome any suggestions from you.
>
-------------------------------------------------------------------------------------------------------------------------------------
> Project Title: Implement Xerces' Asynchronous LSParser and
> parseWithContext
> Student Name: Yin Lei
> Student Email: [email protected]
> Organization/Project: Apache Foundation/Xerces
>
> Assigned Mentor: Michael Glavassevich
> Proposal Abstract:
> Apache Xerces2 is a powerful XML parser that implements a
> collection of standard APIs for XML processing. Although Xerces has a
> functional DOM Level 3 LSParser, a couple of parts of the
> spec still need to be implemented. This project will provide an
> asynchronous version of LSParser, which returns from the parse method
> immediately and builds the DOM tree on another thread, and will also
> implement the parseWithContext method, which allows a document
> fragment to be parsed and attached to an existing DOM.
> Detailed Description:
> Apache Xerces-J is a high-performance, standards-compliant processor
> written in Java for parsing, validating, serializing and
> manipulating XML documents. It provides a complete implementation of
> the Document Object Model Level 3 Core and Document Object Model
> Level 3 Load and Save Recommendations, but Xerces' implementation of
> LSParser has two limitations (see
> http://xerces.apache.org/xerces2-j/dom3.html):
> 1. It does not support an asynchronous LSParser, which returns from
> the parse method immediately and builds the DOM tree on another thread.
> 2. It does not support the parseWithContext method of the LSParser
> interface, which parses an XML fragment from a resource identified by
> an LSInput and inserts the content into an existing document at the
> position specified by the context and action arguments.
> To remove these two limitations, I have been studying the W3C
> Recommendation for LSParser. In the meantime, I downloaded the
> Xerces2-J source code, imported it into my Eclipse workspace, read
> through it repeatedly and considered how to implement these two
> features. I also discussed the subject with the Xerces developers
> (you helped me a lot, thank you, especially dear Michael
> Glavassevich). I now have some ideas for a solution and have done
> some experiments to check it. What follows is only a high-level
> solution, and I leave out some details.
> 1. Interface DOMImplementationLS: the class
> org.apache.xerces.dom.CoreDOMImplementationImpl implements this
> interface. As described in the W3C Recommendation, the
> createLSParser method takes a mode argument that requests either a
> synchronous or an asynchronous LSParser, but currently we can only
> get the former from CoreDOMImplementationImpl's createLSParser
> method. I should fix this problem.
> 2. Interface LSParser: the class org.apache.xerces.parsers.DOMParserImpl
> implements this interface, but it supports only the synchronous
> mode; its getAsync method simply returns false.
> Here is my plan for providing an asynchronous version of LSParser.
> Step one: DOMParserImpl implements the EventTarget interface as well
> as the LSParser interface.
> It uses a Vector object (we call it the repository) to store all the
> event listeners registered on the current LSParser object. Each
> listener is made up of three parts: type, useCapture and the event
> handler. There are only two event types, load and progress. My next
> task is to implement the methods addEventListener, dispatchEvent and
> removeEventListener.
> addEventListener: add an event listener object to the repository.
> Note that a listener with the same parameters can be added only once.
> dispatchEvent: traverse each item of the repository; if an item has
> the same type as the event and its useCapture value is true, call its
> handleEvent method.
> removeEventListener: traverse each item of the repository; if an item
> matches the parameters, remove it from the repository.
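A minimal Java sketch of the Step one repository, under stated assumptions: the class ListenerRepository and its Registration record are hypothetical names, an event is reduced to a type string plus a payload instead of an org.w3c.dom.events.Event, and every listener registered for the event's type is invoked without consulting useCapture. Because the LSParser is itself the event target, the exact useCapture rule should be checked against the DOM Events specification before adopting either behavior.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical stand-in for the listener "repository" from Step one.
// It does not implement org.w3c.dom.events.EventTarget; an event is reduced
// to a type string ("load" or "progress") plus an arbitrary payload.
class ListenerRepository {

    // The three parts of a registration named in the proposal.
    record Registration(String type, boolean useCapture, Consumer<Object> handler) { }

    // Copy-on-write so a handler may remove itself while dispatch iterates.
    private final CopyOnWriteArrayList<Registration> repository = new CopyOnWriteArrayList<>();

    void addEventListener(String type, Consumer<Object> handler, boolean useCapture) {
        // Records compare component-wise, so identical parameters are stored only once.
        repository.addIfAbsent(new Registration(type, useCapture, handler));
    }

    void removeEventListener(String type, Consumer<Object> handler, boolean useCapture) {
        repository.remove(new Registration(type, useCapture, handler));
    }

    // Calls every handler registered for the type (useCapture is not consulted
    // in this sketch) and returns how many handlers ran.
    int dispatchEvent(String type, Object payload) {
        int invoked = 0;
        for (Registration r : repository) {
            if (r.type().equals(type)) {
                r.handler().accept(payload);
                invoked++;
            }
        }
        return invoked;
    }
}
```

Handlers are compared by reference, so removal only works when the same handler object that was registered is passed back in.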
> Step two: implement the LSLoadEvent interface. In an asynchronous
> LSParser, an LSLoadEvent signals that the parse method has finished
> its job and the document is loaded. We can achieve this by calling the
> LSParser's dispatchEvent method with the LSLoadEvent as its parameter.
> Step three: implement the LSProgressEvent interface. In an
> asynchronous LSParser, the parse thread will fire an LSProgressEvent
> when it finishes parsing an entity, and the event will tell listeners
> the current parse position. If the parser encounters more external
> resource references, it may also change the totalSize value.
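Step three needs a current position for each LSProgressEvent. One possible way to obtain it, assumed here rather than taken from Xerces, is to count the bytes read from the input stream. The class CountingInputStream below is hypothetical: it reports the running byte count to a callback and refuses mark/reset so the count never runs backwards.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

// Hypothetical helper: wraps the byte stream behind an LSInput and reports the
// running number of bytes consumed, which a parse thread could use as the
// position of an LSProgressEvent.
class CountingInputStream extends FilterInputStream {
    private long position;
    private final LongConsumer onProgress;

    CountingInputStream(InputStream in, LongConsumer onProgress) {
        super(in);
        this.onProgress = onProgress;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            advance(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so bytes are counted once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            advance(n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            advance(skipped);
        }
        return skipped;
    }

    // Rewinding would make the position run backwards, so mark/reset is refused.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    long getPosition() {
        return position;
    }

    private void advance(long n) {
        position += n;
        onProgress.accept(position);
    }
}
```

Byte counts only approximate "position" for character-based inputs such as stringData; totalSize would still have to come from elsewhere.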
> Step four: implement the asynchronous mechanism. DOMParserImpl has an
> attribute that records its mode, synchronous or asynchronous, and the
> getAsync method returns it. If the parser is asynchronous, when the
> LSParser instance's parse() method is called, it sets busy to true,
> starts a thread to parse the XML document in the LSInput, and then
> returns null. When the parse thread finishes its job, it sets busy to
> false, creates an LSLoadEvent instance with the type "load", and calls
> dispatchEvent(Event evt). If the user registered any event listener
> for the load event, dispatchEvent runs the code defined in that
> listener's handleEvent method.
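Step four can be prototyped on top of the existing synchronous parser. The sketch below is a hypothetical wrapper, not DOMParserImpl itself: parse() returns null at once, a single daemon worker thread runs a synchronous LSParser obtained from the JDK's built-in DOMImplementationLS, busy is cleared when the worker finishes, and a plain Consumer callback stands in for dispatching an LSLoadEvent. The error callback and the class name AsyncParseSketch are assumptions added for illustration; abort() is not covered.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical wrapper illustrating Step four; not the DOMParserImpl API.
class AsyncParseSketch {
    private final DOMImplementationLS impl;
    private final AtomicBoolean busy = new AtomicBoolean(false);
    // Daemon thread so a forgotten shutdown() cannot keep the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-lsparser");
        t.setDaemon(true);
        return t;
    });

    AsyncParseSketch() throws ParserConfigurationException {
        // The JDK's DocumentBuilder exposes a DOMImplementation that also implements DOMImplementationLS.
        impl = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
    }

    boolean getBusy() {
        return busy.get();
    }

    // Asynchronous contract: hand the work to another thread and return null at once.
    Document parse(String xml, Consumer<Document> onLoad, Consumer<RuntimeException> onError) {
        if (!busy.compareAndSet(false, true)) {
            throw new IllegalStateException("parser is busy");
        }
        worker.execute(() -> {
            Document doc = null;
            RuntimeException failure = null;
            try {
                LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
                LSInput input = impl.createLSInput();
                input.setStringData(xml);
                doc = parser.parse(input);
            } catch (RuntimeException e) { // LSException and DOMException are unchecked
                failure = e;
            }
            busy.set(false);            // clear busy first, as the proposal describes
            if (failure == null) {
                onLoad.accept(doc);     // stands in for dispatching a "load" LSLoadEvent
            } else {
                onError.accept(failure);
            }
        });
        return null;
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

Clearing busy before notifying means a load handler may immediately start another parse on the same object, which matches the order described in Step four.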
> 3. Method parseWithContext(LSInput input, Node contextArg, short action)
> This method parses an XML fragment from a resource identified by an
> LSInput and inserts the content into an existing document at the
> position specified by the context and action arguments. Since this
> XML fragment is a special data structure, I need to construct a new
> class named XMLFragment to store it. Then I should do the following:
> Parse the XML fragment into an XMLFragment object, recording whether
> it is a complete XML document; if any error occurs, throw an
> exception. These classes can help me:
> a. org.apache.xerces.impl.XMLDocumentFragmentScannerImpl
> b. org.apache.xerces.impl.XMLDocumentScannerImpl.FragmentContentDispatcher
> c. org.apache.xerces.impl.XMLEntityScanner
> I can reuse some methods in these classes and start the parsing job
> by consulting the method startEntity in the class
> org.apache.xerces.impl.XMLDocumentScannerImpl and the method
> scanDocument in the class
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl. Here is the
> basic idea of the implementation (in fact, this is a recursive process):
> 1. Create an XMLFragment instance. It has a very important
> attribute, which we name fCurrentNode.
> 2. Start reading characters from the LSInput stream. On a node start
> character such as "<", go to step 3; on a node end character such as
> "/", go to step 4; on end of input, go to step 5.
> 3. A start event occurs: usually, instantiate a Node object, append
> it to fCurrentNode's child node list, set fCurrentNode to this Node,
> then go to step 2.
> 4. An end event occurs: usually, set fCurrentNode to its parent node,
> then go to step 2.
> 5. The parsing job ends.
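The five steps above amount to maintaining a current-node pointer while walking start and end events. As an assumption for illustration, the sketch below swaps hand-written character scanning for SAX callbacks from the JDK's SAXParser, and builds into a DocumentFragment owned by the target document instead of the proposed XMLFragment class. FragmentBuilder and the synthetic <wrapper> element are hypothetical: the wrapper lets a fragment with several top-level nodes parse as one document, which assumes the fragment has no XML declaration or DOCTYPE. Namespaces, comments and processing instructions are ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of steps 1-5: start events descend, end events climb back.
class FragmentBuilder extends DefaultHandler {
    private static final String WRAPPER = "wrapper";
    private final Document owner;
    private final DocumentFragment fragment;
    private Node currentNode;   // plays the role of fCurrentNode
    private int depth;          // 0 means outside the synthetic wrapper

    private FragmentBuilder(Document owner) {
        this.owner = owner;
        this.fragment = owner.createDocumentFragment();
        this.currentNode = fragment;   // step 1
    }

    static DocumentFragment parse(String xmlFragment, Document owner) throws Exception {
        FragmentBuilder handler = new FragmentBuilder(owner);
        String wrapped = "<" + WRAPPER + ">" + xmlFragment + "</" + WRAPPER + ">";
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(wrapped)), handler);
        handler.fragment.normalize();  // merge adjacent text chunks from characters()
        return handler.fragment;       // step 5
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth++ == 0) {
            return; // the synthetic wrapper element is not part of the fragment
        }
        Element e = owner.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        currentNode.appendChild(e);
        currentNode = e;               // step 3: descend into the new node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (--depth == 0) {
            return;
        }
        currentNode = currentNode.getParentNode();  // step 4: back to the parent
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            currentNode.appendChild(owner.createTextNode(new String(ch, start, length)));
        }
    }
}
```

Building the nodes with the target document as owner avoids a separate import step when the fragment is later inserted at the context node.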
> Next, insert the XMLFragment at the place indicated by the action
> parameter. In this phase there is a lot of validation to do, covering
> four aspects: basic validation, namespace validation, DTD validation
> and schema validation.
> 1. Basic Validation:
> I should check whether the merge is legal. For example, if the
> context node is the document root element and the action is not
> ACTION_REPLACE_CHILDREN, an error should be thrown.
> I should confirm that the merged XML document is well-formed. For
> example, the DOM should have only one root element, and entity
> declarations must appear at the beginning of the document.
> 2. Namespace Validation: I should validate both element namespaces
> and attribute namespaces in the merged XML document.
> 3. DTD Validation:
> Validate whether the merged XML document conforms to the element
> type declarations.
> Validate whether the merged XML document conforms to the entity
> declarations.
> Validate whether the merged XML document conforms to the
> attribute-list declarations. For example, if the DTD declares default
> attributes, I should add them to the elements that come from the
> LSInput fragment.
> 4. Schema Validation: this covers the checks required for DTD
> validation, plus some additional requirements:
> Validate the data types of elements and attributes.
> Validate the three kinds of annotation declarations.
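For the schema validation step, one option (assumed here, and not necessarily how Xerces would do it internally) is to run the merged DOM through the standard JAXP validation API. MergeValidator and isValid are hypothetical names. The sketch assumes the DOM was built namespace-aware; it lets a broken schema propagate as an exception and treats a SAXException from validate() as "invalid", since the default Validator error handler rethrows errors and so the first violation ends validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a merged DOM against a W3C XML Schema given as text.
class MergeValidator {
    static boolean isValid(Node merged, String xsd) throws SAXException, IOException {
        // Compiling the schema is outside the try, so a broken schema is reported, not "invalid".
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new DOMSource(merged));
            return true;
        } catch (SAXException invalid) {
            // With no ErrorHandler set, validation errors are rethrown as SAXParseException.
            return false;
        }
    }
}
```

Installing a custom ErrorHandler on the Validator would collect every violation instead of stopping at the first, which is closer to reporting through a DOM error handler.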
> If everything is OK, return the resulting node; otherwise, if an
> error occurs, the caller is notified through the ErrorHandler
> instance associated with the "error-handler" parameter of the
> DOMConfiguration. As the new data is inserted into the document, at
> least one mutation event is fired per new immediate child or sibling
> of the context node.
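Placing the parsed fragment according to the action argument can be sketched with ordinary DOM calls and the real LSParser.ACTION_* constants. FragmentMerger and insert are hypothetical names. The sketch assumes the fragment already belongs to the context node's document, returns the first inserted top-level node (null for an empty fragment), and implements only the parent check from basic validation. Other hierarchy errors, such as a second document element, are left to the DOM implementation, and Document context nodes get no special handling.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper: inserts a parsed fragment relative to contextArg.
class FragmentMerger {
    static Node insert(DocumentFragment fragment, Node contextArg, short action) {
        Node first = fragment.getFirstChild();   // remembered before the children move
        Node parent = contextArg.getParentNode();
        switch (action) {
            case LSParser.ACTION_APPEND_AS_CHILDREN:
                contextArg.appendChild(fragment);
                break;
            case LSParser.ACTION_REPLACE_CHILDREN:
                while (contextArg.hasChildNodes()) {
                    contextArg.removeChild(contextArg.getFirstChild());
                }
                contextArg.appendChild(fragment);
                break;
            case LSParser.ACTION_INSERT_BEFORE:
            case LSParser.ACTION_INSERT_AFTER:
            case LSParser.ACTION_REPLACE:
                // Basic validation: sibling-level actions need a parent node.
                if (parent == null) {
                    throw new DOMException(DOMException.HIERARCHY_REQUEST_ERR,
                            "context node has no parent");
                }
                if (action == LSParser.ACTION_INSERT_BEFORE) {
                    parent.insertBefore(fragment, contextArg);
                } else if (action == LSParser.ACTION_INSERT_AFTER) {
                    // insertBefore(x, null) appends, which covers a last-child context node.
                    parent.insertBefore(fragment, contextArg.getNextSibling());
                } else {
                    // A DocumentFragment replaces oldChild with all of its children.
                    parent.replaceChild(fragment, contextArg);
                }
                break;
            default:
                throw new DOMException(DOMException.NOT_SUPPORTED_ERR, "unknown action " + action);
        }
        return first;
    }
}
```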
> Additional Information:
> My development plan:
>
> 1st week in 1st month (May 24 - Jun 1)
>
> Read the Xerces-J source code and get familiar with its
> architecture, so that my work complies with its design philosophy.
>
> 2nd week in 1st month (Jun 1 - Jun 8)
>
> Make changes to DOMImplementationLS and DOMParserImpl so that
> DOMImplementationLS can create an asynchronous LSParser, and add some
> basic attributes to DOMParserImpl, such as an asynchronous flag.
>
> 3rd week in 1st month (Jun 9 - Jun 16)
>
> Restructure DOMParserImpl to implement the EventTarget interface;
> implement addEventListener, dispatchEvent and removeEventListener.
>
> 4th week in 1st month (Jun 17 - Jun 24)
>
> Implement LSParser's parse() and parseURI() methods with asynchronous
> support, and implement LSParser's abort(), getAsync() and getBusy()
> methods.
>
> 1st week in 2nd month (Jun 25 - Jul 2)
>
> Implement the LSLoadEvent and LSProgressEvent interfaces, finish the
> whole asynchronous parse cycle, and write some unit tests.
>
> 2nd week in 2nd month (Jul 3 - Jul 10)
>
> Finish the first subtask of parseWithContext(): parse the LSInput
> into an XMLFragment instance.
>
> 3rd week in 2nd month (Jul 11 - Jul 18)
>
> Start merging the context node and the XML fragment; finish basic
> validation and namespace validation.
>
> 4th week in 2nd month (Jul 19 - Jul 26)
>
> Finish merging the context DOM tree and the XMLFragment; finish DTD
> validation and schema validation.
>
> 1st week in 3rd month (Jul 27 - Aug 3)
>
> Test my asynchronous LSParser and the parseWithContext method.
>
> Last 2 weeks in 3rd month (Aug 3 - Aug 20)
>
> Submit all code and documentation.
>
> Who am I?
> Hi everyone, my name is Yin Lei. I am a postgraduate student at the
> University of Science and Technology Beijing, China. My major is
> computer science and technology. Throughout my six years of Java
> development experience, Apache has helped me a lot: many projects such
> as Struts, Tomcat, Xerces, Xalan, HttpClient, Commons FileUpload,
> JavaMail and POI play an important part in my research projects.
> So I am eager to participate in the open source community and become
> a long-term committer of the project. In my daily work I use Xerces
> as my XML parser, so I have found its gaps and want to improve it to
> make it perfect :-)
> My work experience and related awards:
> 2007.7 - 2008.5: worked at Sun Microsystems Inc. as an intern
> 2008.7 - 2009.12: worked at IBM China Development Laboratory as an intern
> 2008.9: won Excellent Team Member of the 2008 IBM Blue Pathway program
> 2009.11: won the Lotus Innovation Award of IBM Asia Pacific
> I have also done some open source work before. My first experience in
> open source development was building an Eclipse plugin for the Apache
> SCXML engine, and I also attempted to add a new feature to the SCXML
> engine to make it support multi-threaded operation. I can code in
> C++, Java and some scripting languages such as JavaScript and
> ActionScript. In addition, I'm familiar with XML, DOM, SAX, JDOM and
> dom4j, and I want to improve existing XML parsing tools through my work.
>
> 2010-03-30
>
> xiaohei.leiyin
>
> From: Michael Glavassevich
> Sent: 2010-03-30 20:10:21
> To: xiaohei.leiyin
> Cc:
> Subject: Re: GSoC proposal about "Asynchronous LSParser and
> parseWithContext"
> Hi Yin,
>
> Yes, that's fine. If your proposal is accepted for GSoC I would
> mentor you and I think that's what they're looking for there on the
> GSoC site.
>
> There are usually several hundred proposals submitted to Apache
> every year for the various projects across the organization. It can
> be very competitive depending on the number of spots that Google
> actually awards to Apache and the number of good proposals submitted
> by students. I wish you good luck in the selection process.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [email protected]
> E-mail: [email protected]
>
> "xiaohei.leiyin" <[email protected]> wrote on 03/30/2010 01:02:31
> AM:
>
> > Dear Michael,
> >
> > I have modified my proposal following your advice and submitted it
> > on the GSoC web site. I noticed that there is an item "Organization/
> > Project: Assigned Mentor:" in the content section of the proposal
> > submission page. Can I fill it in as "Organization/Project: Apache
> > Foundation/Xerces  Assigned Mentor: Michael Glavassevich"? Is that OK?
> > I mean, may I list you as my assigned mentor? If you think it is OK,
> > I will keep it; if you would rather not for any reason, please let
> > me know and I will change it (that is fine, I must respect you: in
> > Chinese culture, you are already my teacher. Respect for teachers is
> > a long-standing and well-established Chinese tradition; we call it
> > 尊师重教 [zun shi zhong jiao], "respect teachers and value education").
> >
> > During these days, while I was researching Xerces' architecture and
> > working out how to implement the asynchronous LSParser and
> > parseWithContext for Xerces, I gained a lot of knowledge and made
> > real progress. When I came across difficulties, you helped me a lot;
> > in fact, you are my mentor in my heart. Thank you so so so so much!
> > I think I have gained knowledge whether or not GSoC accepts my
> > proposal. I will finish this project: once I begin it, I want to
> > finish it, for you and for open source.
> >
> > Your student: Yin Lei from China
> >
> > 2010-03-30
> >
> > xiaohei.leiyin
