Hi Michael,

Thank you so much. I think I have found a way to implement the asynchronous
LSParser and parseWithContext for Xerces, and I am confident I can finish it
well. Most importantly, I am genuinely interested in XML parsing, and that
motivates me. I am really looking forward to becoming one of the Xerces
committers :-)

In addition, here is my submitted proposal. If you have time, I would welcome
any suggestions from you.
-------------------------------------------------------------------------------------------------------------------------------------
Project Title: Implement Xerces' Asynchronous LSParser and parseWithContext
Student Name: Yin Lei
Student Email: [email protected]
Organization/Project: Apache Foundation/Xerces

Assigned Mentor: Michael Glavassevich
Proposal Abstract:
Apache Xerces2 is a powerful XML parser that implements a collection of
standard APIs for XML processing. Although Xerces has a functional DOM Level 3
LSParser, a couple of parts of the specification still need to be implemented.
This project will provide an asynchronous version of LSParser, which returns
from the parse method immediately and builds the DOM tree on another thread,
and will implement the parseWithContext function, which allows a document
fragment to be parsed and attached to an existing DOM.
Detailed Description:
Apache Xerces-J is a high-performance, standards-compliant processor written
in Java for parsing, validating, serializing and manipulating XML documents.
It provides a complete implementation of the Document Object Model Level 3
Core and Document Object Model Level 3 Load and Save Recommendations, but
Xerces' implementation of LSParser has two limitations
(http://xerces.apache.org/xerces2-j/dom3.html):
1. It does not support an asynchronous LSParser, which returns from the parse
method immediately and builds the DOM tree on another thread.
2. It does not support the parseWithContext function of the LSParser
interface, which parses an XML fragment from a resource identified by an
LSInput and inserts the content into an existing document at the position
specified by the context and action arguments.
To address these two limitations, I have been studying the W3C Recommendation
for LSParser. In the meantime I have downloaded the Xerces2-J source code,
imported it into my Eclipse workspace, and read through it repeatedly while
considering how to implement these two parts of the specification. I have also
discussed the subject with the Xerces developers (you have all helped me a
lot, thank you, especially Michael Glavassevich). I now have an outline of a
solution and have run some experiments to check it. What follows is only a
high-level design and leaves out some details.
Interface DOMImplementationLS: the class
org.apache.xerces.dom.CoreDOMImplementationImpl implements this interface. As
described in the W3C Recommendation, an implementation of DOMImplementationLS
should supply a createLSParser function that can create a synchronous LSParser
as well as an asynchronous one, but at present CoreDOMImplementationImpl's
createLSParser can only create the former. I need to fix this.
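
As a minimal sketch of how a caller sees this limitation, the code below asks
for an asynchronous parser through the standard org.w3c.dom.ls API and falls
back to a synchronous one when the request is refused. The class and method
names (ParserModeProbe, loadAndSave, createPreferAsync) are hypothetical and
only illustrate the call sequence; which DOM implementation the registry
returns depends on the runtime.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

/** Hypothetical helper, not part of Xerces. */
class ParserModeProbe {

    /** Looks up whatever DOM implementation advertises the "LS" feature. */
    static DOMImplementationLS loadAndSave() {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (Exception e) {
            throw new IllegalStateException("cannot create the DOM registry", e);
        }
        if (!(impl instanceof DOMImplementationLS)) {
            throw new IllegalStateException("no DOM Load and Save implementation found");
        }
        return (DOMImplementationLS) impl;
    }

    /** Requests an asynchronous parser; falls back to a synchronous one. */
    static LSParser createPreferAsync(DOMImplementationLS ls) {
        try {
            return ls.createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
        } catch (DOMException e) {
            // An implementation that lacks the requested mode refuses the call.
            return ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        }
    }
}
```

Calling getAsync() on the returned parser tells the caller which mode was
actually obtained.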
Interface LSParser: the class org.apache.xerces.parsers.DOMParserImpl
implements this interface, but it supports only the synchronous mode; its
getAsync function simply returns false. Here is my plan for providing an
asynchronous version of LSParser.
Step one: DOMParserImpl implements the EventTarget interface as well as the
LSParser interface.
It uses a Vector object (call it the repository) to store all the listeners
registered on the current LSParser object. Each entry is made up of three
parts: the event type, the useCapture flag and the event handler. There are
only two types of event, load and progress. My next task is to implement the
functions addEventListener, dispatchEvent and removeEventListener.
addEventListener: add a listener entry to the repository. Note that a listener
with the same parameters can only be added once.
dispatchEvent: traverse the repository; for each entry whose type matches the
event's type and whose useCapture value is true, call its handleEvent
function.
removeEventListener: traverse the repository; if an entry matches the
parameters, remove it from the repository.
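
The repository described in step one could look roughly like the sketch
below. ListenerRepository and its Entry class are hypothetical names, and an
ArrayList stands in for the Vector. One deliberate difference: the sketch
dispatches on the event type alone and keeps useCapture only to tell
registrations apart, because the parser has no ancestors to capture from;
whether useCapture should also filter dispatch is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

/** Hypothetical listener registry; not part of Xerces. */
class ListenerRepository implements EventTarget {

    /** One registration: event type, handler and capture flag. */
    private static final class Entry {
        final String type;
        final EventListener listener;
        final boolean useCapture;

        Entry(String type, EventListener listener, boolean useCapture) {
            this.type = type;
            this.listener = listener;
            this.useCapture = useCapture;
        }

        boolean matches(String t, EventListener l, boolean c) {
            return type.equals(t) && listener == l && useCapture == c;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    @Override
    public synchronized void addEventListener(String type, EventListener listener,
                                              boolean useCapture) {
        if (type == null || listener == null) {
            return;
        }
        for (Entry e : entries) {
            if (e.matches(type, listener, useCapture)) {
                return; // identical registrations are added only once
            }
        }
        entries.add(new Entry(type, listener, useCapture));
    }

    @Override
    public synchronized void removeEventListener(String type, EventListener listener,
                                                 boolean useCapture) {
        entries.removeIf(e -> e.matches(type, listener, useCapture));
    }

    @Override
    public boolean dispatchEvent(Event evt) {
        List<Entry> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(entries); // listeners may unregister themselves
        }
        for (Entry e : snapshot) {
            if (e.type.equals(evt.getType())) {
                e.listener.handleEvent(evt);
            }
        }
        return true; // preventDefault is not tracked in this sketch
    }

    synchronized int size() {
        return entries.size();
    }
}
```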
Step two: implement the LSLoadEvent interface. In an asynchronous LSParser,
LSLoadEvent is used to signal that the parse function has finished its job. We
can achieve this by calling the LSParser's dispatchEvent function with an
LSLoadEvent as the parameter.
Step three: implement the LSProgressEvent interface. In an asynchronous
LSParser, the parse thread will fire an LSProgressEvent each time it finishes
parsing an entity; the event reports the current parse position. If the parser
encounters further external resource references, it may also change the
totalSize value.
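
Steps two and three both need concrete event objects. The following is a
minimal sketch of what they might look like on top of the standard
org.w3c.dom.events and org.w3c.dom.ls interfaces. The class names
(ParserEvent, LoadEvent, ProgressEvent) are hypothetical, and the sketch
assumes both events are non-bubbling, non-cancelable and delivered only at
the parser itself.

```java
import org.w3c.dom.Document;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSLoadEvent;
import org.w3c.dom.ls.LSProgressEvent;

/** Hypothetical base class holding what both LS events share. */
abstract class ParserEvent implements Event {
    private final String type;
    private final EventTarget target;
    private final LSInput input;
    private final long timeStamp = System.currentTimeMillis();

    ParserEvent(String type, EventTarget target, LSInput input) {
        this.type = type;
        this.target = target;
        this.input = input;
    }

    public String getType() { return type; }
    public EventTarget getTarget() { return target; }
    public EventTarget getCurrentTarget() { return target; }
    public short getEventPhase() { return Event.AT_TARGET; }
    public boolean getBubbles() { return false; }     // assumption
    public boolean getCancelable() { return false; }  // assumption
    public long getTimeStamp() { return timeStamp; }
    public void stopPropagation() { /* nothing to propagate to */ }
    public void preventDefault() { /* not cancelable */ }
    public void initEvent(String t, boolean canBubble, boolean cancelable) {
        throw new UnsupportedOperationException("events are created fully initialised");
    }
    public LSInput getInput() { return input; }
}

/** Fired once, when the whole document has been built. */
final class LoadEvent extends ParserEvent implements LSLoadEvent {
    private final Document newDocument;

    LoadEvent(EventTarget parser, LSInput input, Document newDocument) {
        super("load", parser, input);
        this.newDocument = newDocument;
    }

    public Document getNewDocument() { return newDocument; }
}

/** Fired repeatedly while parsing; totalSize may grow between events. */
final class ProgressEvent extends ParserEvent implements LSProgressEvent {
    private final int position;
    private final int totalSize;

    ProgressEvent(EventTarget parser, LSInput input, int position, int totalSize) {
        super("progress", parser, input);
        this.position = position;
        this.totalSize = totalSize;
    }

    public int getPosition() { return position; }
    public int getTotalSize() { return totalSize; }
}
```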
Step four: implement the asynchronous mechanism. DOMParserImpl has an
attribute that records its mode, synchronous or asynchronous, which can be
read through the getAsync function. If the parser is asynchronous, then when
the LSParser instance's parse() function is called it sets busy to true,
starts a thread to parse the XML document in the LSInput, and returns null.
When the parse thread finishes its job, it sets busy to false, creates an
LSLoadEvent instance of type load, and calls dispatchEvent(Event evt). If the
user has registered any listener for the load event, dispatchEvent will run
the work defined in that listener's handleEvent function.
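
The control flow of step four can be sketched independently of Xerces
internals by wrapping an ordinary synchronous LSParser. AsyncParseRunner is a
hypothetical class, and a plain Consumer<Document> callback stands in for
creating an LSLoadEvent and calling dispatchEvent; both are simplifications
for illustration, not the proposed DOMParserImpl change itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

/** Hypothetical wrapper that runs a synchronous LSParser on a worker thread. */
class AsyncParseRunner {
    private final LSParser delegate;
    private final AtomicBoolean busy = new AtomicBoolean(false);

    AsyncParseRunner(LSParser synchronousParser) {
        this.delegate = synchronousParser;
    }

    boolean getBusy() {
        return busy.get();
    }

    /**
     * Starts the parse on another thread and returns null immediately.
     * onLoad is a stand-in for dispatching an LSLoadEvent to listeners.
     */
    Document parse(LSInput input, Consumer<Document> onLoad) {
        if (!busy.compareAndSet(false, true)) {
            throw new DOMException(DOMException.INVALID_STATE_ERR, "parser is busy");
        }
        Thread worker = new Thread(() -> {
            Document result;
            try {
                result = delegate.parse(input);
            } catch (RuntimeException e) {
                // Simplification: a real parser would report the failure
                // through the configured "error-handler" instead.
                busy.set(false);
                return;
            }
            busy.set(false);       // clear the flag before notifying listeners
            onLoad.accept(result); // the "load" notification
        }, "ls-parser-worker");
        worker.start();
        return null;
    }
}
```

Clearing busy before the notification lets a listener start the next parse
from inside its handler.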
Function parseWithContext(LSInput input, Node contextArg, short action):
This function parses an XML fragment from a resource identified by an LSInput
and inserts the content into an existing document at the position specified by
the context and action arguments. The XML fragment needs its own data
structure, so I need to construct a new class named XMLFragment to store it.
Then I should do the following:
Parse the XML fragment into an XMLFragment object and record whether it is a
complete XML document; if any error occurs, throw an exception. These classes
can help me:
a. org.apache.xerces.impl.XMLDocumentFragmentScannerImpl
b. org.apache.xerces.impl.XMLDocumentScannerImpl.FragmentContentDispatcher
c. org.apache.xerces.impl.XMLEntityScanner
I can use some functions in these classes and start the parsing job by
following the startEntity function in class
org.apache.xerces.impl.XMLDocumentScannerImpl and the scanDocument function in
class org.apache.xerces.impl.XMLDocumentFragmentScannerImpl. Here is the basic
idea of the implementation (in fact, it is a recursive process):
1. Create an XMLFragment instance. It has a very important attribute, which we
name fCurrentNode.
2. Start reading characters from the LSInput stream. On a node start character
such as "<", go to step 3; on a node end character such as "/", go to step 4;
on the end-of-file character, go to step 5.
3. A begin event occurs: usually, instantiate a Node object, append it to
fCurrentNode's child node list, make it the new fCurrentNode, then go to
step 2.
4. An end event occurs: usually, change fCurrentNode to its parent node, then
go to step 2.
5. The parsing job ends.
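
The current-node bookkeeping in the steps above can be sketched with the JDK's
SAX parser supplying the begin and end events in place of Xerces' own scanner
classes. FragmentBuilder is a hypothetical class, a DOM DocumentFragment
stands in for the proposed XMLFragment, and wrapping the input in a synthetic
root element (so that several top-level nodes are accepted) is an assumption
of this sketch that would not work for input carrying an XML declaration or
DOCTYPE.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical builder; shows only the fCurrentNode technique. */
class FragmentBuilder extends DefaultHandler {
    private static final String WRAPPER = "fragment-wrapper"; // synthetic root

    private final Document owner;
    private final DocumentFragment fragment;
    private Node fCurrentNode; // where the next node will be attached
    private int depth;         // 1 while directly inside the synthetic root

    private FragmentBuilder(Document owner) {
        this.owner = owner;
        this.fragment = owner.createDocumentFragment();
        this.fCurrentNode = fragment;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth++ == 0) {
            return; // the synthetic root itself is not part of the result
        }
        Element element = owner.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        fCurrentNode.appendChild(element); // begin event: attach, then descend
        fCurrentNode = element;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (--depth == 0) {
            return; // closing the synthetic root
        }
        fCurrentNode = fCurrentNode.getParentNode(); // end event: back to the parent
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        fCurrentNode.appendChild(owner.createTextNode(new String(ch, start, length)));
    }

    /** Parses xml into a fragment owned by the given document. */
    static DocumentFragment parse(Document owner, String xml) {
        FragmentBuilder builder = new FragmentBuilder(owner);
        String wrapped = "<" + WRAPPER + ">" + xml + "</" + WRAPPER + ">";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wrapped)), builder);
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed", e);
        }
        return builder.fragment;
    }
}
```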
Then add the XMLFragment at the place indicated by the action parameter. In
this phase there is a lot of validation to do, covering four aspects: Basic
Validation, Namespace Validation, DTD Validation and Schema Validation.
Basic Validation:
a. I should validate whether the merge is legal. For example, if the context
node is the document root element and the action parameter is not
ACTION_REPLACE_CHILDREN, an error should be raised.
b. I should confirm that the merged XML document is well-formed. For example,
the DOM should have only one root element, and entity declarations must be at
the beginning of the document.
Namespace Validation: I should validate both the element namespaces and the
attribute namespaces of the merged XML document.
DTD Validation:
a. Validate whether the merged XML document conforms to the element type
declarations.
b. Validate whether the merged XML document conforms to the entity
declarations.
c. Validate whether the merged XML document conforms to the attribute
declarations. For example, if the DTD includes default attribute declarations,
I should add the default attributes to the elements that are roots of the
LSInput fragment.
Schema Validation: this section includes the requirements of DTD Validation
and adds some more:
a. Validate the data types of elements and attributes.
b. Validate the three kinds of annotation declaration.
If everything is OK, return the resulting Node. Otherwise, if an error occurs,
the caller is notified through the ErrorHandler instance associated with the
"error-handler" parameter of the DOMConfiguration. As the new data is inserted
into the document, at least one mutation event is fired per new immediate
child or sibling of the context node.
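
The insertion step itself, leaving all validation aside, might be sketched as
below using the ACTION_* constants that org.w3c.dom.ls.LSParser defines.
FragmentMerger is a hypothetical class; the sketch assumes the fragment was
created by the same Document as the context node, performs only one
representative legality check (sibling actions need a parent), and otherwise
relies on the DOM implementation to reject illegal hierarchies.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSParser;

/** Hypothetical helper for the merge phase of parseWithContext. */
class FragmentMerger {

    /**
     * Inserts the fragment relative to the context node and returns the
     * first inserted node (null if the fragment was empty).
     */
    static Node merge(Node context, DocumentFragment fragment, short action) {
        Node first = fragment.getFirstChild();
        Node parent = context.getParentNode();
        boolean siblingAction = action == LSParser.ACTION_INSERT_BEFORE
                || action == LSParser.ACTION_INSERT_AFTER
                || action == LSParser.ACTION_REPLACE;
        if (siblingAction && parent == null) {
            throw new DOMException(DOMException.HIERARCHY_REQUEST_ERR,
                    "context node has no parent to insert into");
        }
        switch (action) {
            case LSParser.ACTION_APPEND_AS_CHILDREN:
                context.appendChild(fragment);
                break;
            case LSParser.ACTION_REPLACE_CHILDREN:
                while (context.hasChildNodes()) {
                    context.removeChild(context.getFirstChild());
                }
                context.appendChild(fragment);
                break;
            case LSParser.ACTION_INSERT_BEFORE:
                parent.insertBefore(fragment, context);
                break;
            case LSParser.ACTION_INSERT_AFTER:
                // A null reference node makes insertBefore append at the end.
                parent.insertBefore(fragment, context.getNextSibling());
                break;
            case LSParser.ACTION_REPLACE:
                parent.replaceChild(fragment, context);
                break;
            default:
                throw new DOMException(DOMException.NOT_SUPPORTED_ERR,
                        "unknown action " + action);
        }
        return first;
    }
}
```

Inserting a DocumentFragment moves all of its children in one call, which is
why a single appendChild, insertBefore or replaceChild suffices per action.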
Additional Information:
My development plan:



1st week of 1st month (May 24 - Jun 1): Read the Xerces-J source code and get
familiar with its architecture, so that what I do will comply with its
philosophy.
2nd week of 1st month (Jun 1 - Jun 8): Change DOMImplementationLS and
DOMParserImpl so that DOMImplementationLS can create an asynchronous LSParser,
and add some basic attributes to DOMParserImpl such as the asynchronous flag.
3rd week of 1st month (Jun 9 - Jun 16): Restructure DOMParserImpl to implement
the EventTarget interface; implement addEventListener, dispatchEvent and
removeEventListener.
4th week of 1st month (Jun 17 - Jun 24): Implement the LSParser parse() and
parseURI() functions with asynchronous support; implement the LSParser
functions abort(), getAsync() and getBusy().
1st week of 2nd month (Jun 25 - Jul 2): Implement the LSLoadEvent and
LSProgressEvent interfaces; finish the whole asynchronous parse cycle and some
unit tests.
2nd week of 2nd month (Jul 3 - Jul 10): Finish the first sub-task of
parseWithContext(): parse the LSInput into an XMLFragment instance.
3rd week of 2nd month (Jul 11 - Jul 18): Start merging the context Node and
the XML fragment; finish Basic Validation and Namespace Validation.
4th week of 2nd month (Jul 19 - Jul 26): Finish merging the context DOM tree
and the XMLFragment; finish DTD Validation and Schema Validation.
1st week of 3rd month (Jul 27 - Aug 3): Test my asynchronous LSParser and the
parseWithContext function.
Last 2 weeks of 3rd month (Aug 3 - Aug 20): Submit all code and documents.

Who am I?
Hi everyone, my name is Yin Lei. I am a postgraduate student at the University
of Science and Technology Beijing, China. My major is computer science and
technology. During my six years of Java development experience, Apache has
helped me so much: many projects such as Struts, Tomcat, Xerces, Xalan,
HttpClient, Commons FileUpload, JavaMail and POI have played an important part
in my research projects. So I am eager to participate in the open source
community and become a long-term committer on the project. In my daily work I
use Xerces as my XML parser, so I have found where it is lacking and want to
improve it to make it perfect :-)
My work experience and related awards:
2007.7 - 2008.5: worked at Sun Microsystems, Inc. as an intern
2008.7 - 2009.12: worked at IBM China Development Laboratory as an intern
2008.9: named excellent team member of the 2008 IBM Blue Pathway program
2009.11: won the Lotus Innovation Award of IBM Asia Pacific
I have also done some open source work before. My first experience in open
source development was building an Eclipse plugin for the Apache SCXML engine,
and I also attempted to add a new feature to the SCXML engine to make it
support multi-threaded operation. I can code in C++, Java and some scripting
languages such as JavaScript and ActionScript. In addition, I am familiar with
XML, DOM, SAX, JDOM and dom4j, and I want to improve existing XML parsing
tools through my work.

2010-03-30 



xiaohei.leiyin 



From: Michael Glavassevich
Sent: 2010-03-30  20:10:21
To: xiaohei.leiyin
Cc:
Subject: Re: GSoC proposal about "Asynchronous LSParser and parseWithContext "
 
Hi Yin,

Yes, that's fine. If your proposal is accepted for GSoC I would mentor you and 
I think that's what they're looking for there on the GSoC site.

There are usually several hundred proposals submitted to Apache every year for 
the various projects across the organization. It can be very competitive 
depending on the number of spots that Google actually awards to Apache and the 
number of good proposals submitted by students. I wish you good luck in the 
selection process.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]

"xiaohei.leiyin" <[email protected]> wrote on 03/30/2010 01:02:31 AM:

> Dear Michael,
>  
> I have modified my proposal following your advice and submitted it on the
> GSoC web site. I noticed that there is an item "Organization/
> Project:Assigned Mentor:" in the content section of the proposal
> submission page. So, can I fill it in as "Organization/Project:Apache
> Foundation/Xerces  Assigned Mentor:Michael Glavassevich"? Is that OK?
> I mean, may I name you as my assigned mentor? If you think it
> is OK, I will keep it; if you would rather not for some
> reason, please let me know and I will change it (that is fine, I must
> respect you; in Chinese culture you are already my teacher, and respecting
> teachers is a long-standing and well-established Chinese tradition; we
> call it 尊师重教 [zun shi zhong jiao]).
>  
> Over these days, while I was studying Xerces' architecture and
> working out how to implement the asynchronous LSParser and
> parseWithContext for Xerces, I found I learned a great deal and made
> real progress. When I came across difficulties, you
> helped me a lot; in fact, you are my mentor in my heart. Thank you
> so so so so much! I think I have gained knowledge whether or not GSoC
> accepts my proposal. I will finish this project: once I have begun
> it, I want to finish it, for you, for open source.
>  
> Your student : Yin Lei from China
>  
> 2010-03-30 
> 
> xiaohei.leiyin 
