[Xerces Wiki] Update of "Proposal" by Thiwanka Somasiri

Apache Wiki Sat, 09 Apr 2011 08:39:06 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xerces Wiki" for change 
notification.


The "Proposal" page has been changed by Thiwanka Somasiri.
http://wiki.apache.org/xerces/Proposal?action=diff&rev1=5&rev2=6

--------------------------------------------------

  ## page was copied from Thiwanka Somasiri
  #format wiki
  #language en
- == Thiwanka Somasiri ==
+ Proposal Title: [XERCESJ-1429] [GSoC]: Asynchronous LSParser and 
parseWithContext
  
- Email: <<MailTo(asthiwanka AT gmail DOT com)>>
+ Student Name: Thiwanka Somasiri
  
+ Student E-mail: [email protected] 
  
+ Organization/Project: The Apache Software Foundation / Xerces2-J
+ 
+ Assigned Mentor: Michael Glavassevich
+ 
+ 
+ 
+ == Proposal Abstract: ==
+ 
+ Apache Xerces2-J is a high-performance, fully compliant XML parser written 
in Java to parse, validate and manipulate XML documents. The goal of this 
project is to complete the implementation of the DOM Level 3 LSParser by 
addressing two areas that have not yet been developed according to the W3C 
recommendation: the asynchronous LSParser and parseWithContext().
+ 
+ == Detailed Description: ==
+ 
+ The purpose of this project is to implement two important features of the 
Document Object Model Load and Save W3C recommendation [1] that are still 
missing from Apache Xerces-J. The tasks are as follows:
+ 
+ 1. Implementation of Asynchronous version of LSParser
+ 
+ 2. Implementation of parseWithContext() functionality
+ 
+ The high-level design of the project is discussed below.
+ 
+ === 1. Implementation of Asynchronous version of LSParser ===
+ 
+ LSParser is an interface to an object that can build or augment a DOM tree 
from various input sources. It provides an API for parsing XML and building the 
corresponding DOM structure [2]. However, Xerces-J lacks an asynchronous 
version of LSParser, that is, one that returns from the parse method 
immediately and builds the DOM tree on another thread.
+ 
+ === High level implementation details for the asynchronous mode for LSParser 
===
+ 
+ According to the specification, an asynchronous LSParser must also 
implement the EventTarget interface [3]. In Xerces-J, the actual implementation 
of LSParser resides in the org.apache.xerces.parsers.DOMParserImpl class, so 
the asynchronous version of the LSParser can be implemented as an extension of 
DOMParserImpl. Implementing the EventTarget interface requires the three 
methods listed below.
+ 
+ 1. void addEventListener(String type, EventListener listener, boolean 
useCapture)
+ 
+ This method registers event listeners on the event target. The W3C 
recommendation places a constraint on registration: if identical 
EventListeners are registered on the same EventTarget with the same 
parameters, the duplicates are discarded. To handle this we can keep the 
registrations in a suitable data structure (probably a HashSet in Java) 
maintained by DOMParserImpl. Another implementation aspect is that an 
EventListener added while the EventTarget is processing an event must not be 
triggered by that event.
+ 
+ 2. void removeEventListener(String type, EventListener listener, boolean 
useCapture)
+ 
+ This method is the inverse of addEventListener(): it removes a previously 
registered event listener from the event target.
+ 
+ 3. boolean dispatchEvent(Event event)
+ 
+ This method dispatches events into the implementation's event model. The 
return value indicates whether any of the EventListeners that handled the 
‘event’ called the preventDefault() method: if it was called, the return value 
is “false”; otherwise it is “true” [8] [9].
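+ 
+ The registration rules above can be sketched as follows. This is only a 
minimal illustration, not the actual Xerces-J code: the class name 
ListenerRegistrySketch and its methods add(), remove() and snapshot() are 
hypothetical, and event dispatch itself is left out. A set keyed on (type, 
listener, useCapture) discards duplicates, and a dispatcher would iterate over 
a snapshot so that listeners added during processing are not triggered.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import org.w3c.dom.events.EventListener;

/** Hypothetical registry of event listeners; an illustration only. */
class ListenerRegistrySketch {

    /** One registration; two entries with equal fields count as duplicates. */
    private static final class Entry {
        final String type;
        final EventListener listener;
        final boolean useCapture;

        Entry(String type, EventListener listener, boolean useCapture) {
            this.type = type;
            this.listener = listener;
            this.useCapture = useCapture;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) {
                return false;
            }
            Entry e = (Entry) o;
            return type.equals(e.type) && listener.equals(e.listener)
                    && useCapture == e.useCapture;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, listener, useCapture);
        }
    }

    // LinkedHashSet: duplicates are discarded, registration order is kept.
    private final Set<Entry> entries = new LinkedHashSet<>();

    /** Returns false if an identical registration already exists. */
    synchronized boolean add(String type, EventListener l, boolean useCapture) {
        return entries.add(new Entry(type, l, useCapture));
    }

    /** Returns false if there was no such registration. */
    synchronized boolean remove(String type, EventListener l, boolean useCapture) {
        return entries.remove(new Entry(type, l, useCapture));
    }

    /** Copy of the listeners for a type; later changes do not affect it. */
    synchronized List<EventListener> snapshot(String type) {
        List<EventListener> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.type.equals(type)) {
                result.add(e.listener);
            }
        }
        return result;
    }
}
```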
+ 
+ An asynchronous LSParser supports two such events:
+ 
+ 1. Load
+ 
+ This event signals that the LSParser has finished loading the document.
+ 
+ 2. Progress
+ 
+ The LSParser signals progress as data is parsed. The signaling process is 
completely implementation dependent since the specification does not define 
when the progress events should be dispatched.
+ 
+ ==== How the main thread and the thread that the asynchronous parser runs on 
interact - ====
+ 
+ When parse() is called on an asynchronous LSParser, it may return ‘null’ 
immediately, because at that point the Document may not have been constructed 
yet. The parser continues parsing in the background and, once it has finished 
loading the Document, fires an LSLoadEvent at the registered EventListeners. 
The thread that invoked parse() (say, the main thread) can continue with other 
work while the asynchronous LSParser is busy building the Document on another 
thread. The synchronous parser, by contrast, does not return until parsing has 
ended (the existing DOMParserImpl.java class implements this behavior).
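+ 
+ The interaction between the two threads can be illustrated with the 
standard DOM Load and Save API. This is a rough sketch under stated 
assumptions, not the proposed design: AsyncParseSketch and parseAsync() are 
hypothetical names, a plain callback stands in for the LSLoadEvent, and the 
existing synchronous LSParser is simply run on a worker thread.

```java
import java.util.function.Consumer;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

/** Hypothetical illustration of the asynchronous calling pattern. */
class AsyncParseSketch {

    /**
     * Starts parsing on a worker thread and returns null at once, as an
     * asynchronous LSParser may. onLoad stands in for the load event.
     */
    static Document parseAsync(String xml, Consumer<Document> onLoad) {
        final LSParser parser;
        final LSInput input;
        try {
            DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            // The existing synchronous parser does the real work here.
            parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            input = ls.createLSInput();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM LS implementation found", e);
        }
        input.setStringData(xml);

        // The worker builds the Document and then notifies the caller.
        Thread worker = new Thread(() -> onLoad.accept(parser.parse(input)),
                "async-parse-sketch");
        worker.setDaemon(true);
        worker.start();
        return null; // the Document has not been constructed yet
    }
}
```

+ The calling thread receives null and is free to do other work; the Document 
only becomes available inside the callback, which runs on the worker thread.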
+ 
+ ==== How the ‘busy’ flag’s value should behave - ====
+ 
+ Although the asynchronous parser returns "null" immediately after the main 
thread (or any other thread) invokes parse(), the parser should remain in the 
busy state. Once it has finished loading the Document, it can clear the "busy" 
flag so that any other thread waiting to invoke parse() on the asynchronous 
parser gets a chance to do so.
+ 
+ ==== What the abort() method will do after invocation - ====
+ 
+ When abort() is called while the parser is busy, it should stop the 
document from being loaded. If the "busy" flag is 'true', I have to set it to 
'false' and interrupt the thread that runs the asynchronous parser. If the 
"busy" flag is 'false', abort() does nothing.
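+ 
+ The behavior of the "busy" flag and abort() could look roughly like the 
sketch below. All names (BusyAbortSketch, start(), isBusy(), finished()) are 
hypothetical, the parsing work is represented by a plain Runnable, and refusing 
a second start() while busy is an assumption of this sketch. The busy state is 
derived from whether a worker thread is currently registered, which keeps an 
aborted worker from clearing the state of a later parse.

```java
/** Hypothetical illustration of the "busy" flag and abort(). */
class BusyAbortSketch {

    // The worker currently parsing, or null when idle. Guarded by "this".
    private Thread worker;

    /** Starts the work on a new thread; returns false if already busy. */
    synchronized boolean start(Runnable parseWork) {
        if (worker != null) {
            return false;
        }
        Thread t = new Thread(() -> {
            try {
                parseWork.run();
            } finally {
                finished(Thread.currentThread());
            }
        }, "busy-abort-sketch");
        t.setDaemon(true);
        worker = t;
        t.start();
        return true;
    }

    /** Clears the busy state, unless abort() has already done so. */
    private synchronized void finished(Thread t) {
        if (worker == t) {
            worker = null;
        }
    }

    /** True while a worker thread is registered. */
    synchronized boolean isBusy() {
        return worker != null;
    }

    /** If busy: interrupts the worker and clears the busy state. Else no-op. */
    synchronized void abort() {
        if (worker != null) {
            worker.interrupt();
            worker = null;
        }
    }
}
```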
+ 
+ The following diagrams depict an example of how progress events are 
dispatched.
+ 
+  
+ 
+ 
+ Figure 1 - Progress Event when parser starts receiving data
+ 
+  
+ 
+ 
+ Figure 2 - Progress Event when parser processes blocks (2048 bytes) of data
+ 
+ The events mentioned above (see Figure 1 & Figure 2) can be implemented as 
LSLoadEvent and LSProgressEvent according to the specification. Since these 
classes are event-oriented, they should reside in the 
org.apache.xerces.dom.events.* package and extend the base EventImpl class.
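+ 
+ One possible way to produce a progress notification for every 2048-byte 
block is to wrap the parser's input stream, as sketched below. 
ProgressInputStreamSketch, BLOCK_SIZE and reportedPositions() are hypothetical 
names used for illustration; a LongConsumer callback stands in for dispatching 
an LSProgressEvent, and where Xerces-J actually reads its data is not modeled.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical stream wrapper that reports progress per block read. */
class ProgressInputStreamSketch extends FilterInputStream {

    static final int BLOCK_SIZE = 2048;

    private final LongConsumer onProgress;
    private long position;                 // total bytes read so far
    private long nextReport = BLOCK_SIZE;  // position of the next notification

    ProgressInputStreamSketch(InputStream in, LongConsumer onProgress) {
        super(in);
        this.onProgress = onProgress;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            advance(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            advance(n);
        }
        return n;
    }

    /** Fires one callback for each block boundary that has been crossed. */
    private void advance(long n) {
        position += n;
        while (position >= nextReport) {
            onProgress.accept(nextReport);
            nextReport += BLOCK_SIZE;
        }
    }

    /** Demo helper: reads all of data and returns the reported positions. */
    static List<Long> reportedPositions(byte[] data) {
        List<Long> seen = new ArrayList<>();
        try (InputStream in = new ProgressInputStreamSketch(
                new ByteArrayInputStream(data), pos -> seen.add(pos))) {
            byte[] buf = new byte[512];
            while (in.read(buf) != -1) {
                // keep reading until the end of the stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```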
+ 
+ This is how EventListeners are triggered by a particular event:
+ 
+  
+ 
+ 
+ Figure 3
+ 
+ When multiple event listeners are listening for the same event (say E1), all 
of them should be triggered when that event (E1) occurs. The diagram above 
shows how EventListeners are triggered by a particular event (see Figure 3). 
When an EventListener is triggered, its handleEvent() method is invoked [4].
+ 
+ The return value of the parseURI() method also depends on whether the 
LSParser is asynchronous. At a point where the Document object may not have 
been constructed yet, the return value should be “null”. The implementation 
must handle this too, in much the same way as for parse().
+ 
+ === 2. Implementation of parseWithContext() functionality ===
+ 
+ Implementing parseWithContext() is the second part of the project; it is a 
very important feature for XML application developers. At the moment, Xerces-J 
does not support parsing a document fragment and attaching it to an existing 
DOM.
+ 
+ In order to insert a fragment into an existing document, the fragment must 
be identified by an LSInput. The parameters of the method (namely input, 
contextArg and action) define, respectively, the input identified by the 
LSInput, the node that is used as the context for the data being parsed, and 
the action to be taken between the new set of nodes and the existing children 
of the context node [7]. The “input” parameter for parseWithContext() should be 
an XML fragment (anything except a complete XML document), a DOCTYPE, entity 
declaration(s), etc. There are some special cases that we have to handle, 
however. One such case is when the “input” is a whole XML document rather than 
an XML fragment and the “action” is ACTION_REPLACE_CHILDREN. This can be 
processed as a whole XML document, just as if the input had been parsed using 
the regular LSParser.parse() method.
+ 
+ In the discussion on the mailing list [5], Michael Glavassevich suggested 
implementing the method (where the context node is not a Document node) by 
synthesizing a wrapper XML document that contains a reference to the fragment 
and the content (for example, namespace declarations) required to parse it. 
Afterwards, the nodes created for the fragment can be transferred into the 
existing DOM. The following demonstration and diagram describe the high-level 
approach (see Figure 4).
+ 
+ Consider the existing document:
+ 
+ <ns1:a xmlns:ns1="http://ns1">
+ 
+ <ns2:b xmlns:ns2="http://ns2"/>
+ 
+ </ns1:a>
+ 
+ Suppose we want to add the fragment below as a child of “ns2:b”:
+ 
+ <ns2:c/><ns1:d/>
+ 
+ We can then generate a wrapper document as follows:
+ 
+ <!DOCTYPE DUMMY_ROOT [
+ 
+  <!ENTITY fragment PUBLIC "***" "***">
+ 
+ ]>
+ 
+ <DUMMY_ROOT xmlns:ns1="http://ns1" 
xmlns:ns2="http://ns2">&fragment;</DUMMY_ROOT>
+ 
+ Here the “fragment” entity points to the XML fragment provided by the user, 
so we can parse this document as a normal XML document. We can then merge the 
new nodes underneath the entity reference into the existing document, 
resulting in:
+ 
+ <ns1:a xmlns:ns1="http://ns1">
+ 
+ <ns2:b xmlns:ns2="http://ns2"><ns2:c/><ns1:d/></ns2:b>
+ 
+ </ns1:a>
+ 
+  
+ 
+ 
+ Figure 4 - High level approach for parseWithContext()
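+ 
+ The wrapper-document approach can be sketched with the standard JAXP and 
DOM APIs as below. This is a simplified illustration under stated assumptions, 
not the proposed Xerces-J implementation: WrapperDocumentSketch and its methods 
are hypothetical names, the fragment text is spliced directly into the wrapper 
instead of being referenced through an external entity, only prefixed namespace 
declarations (xmlns:prefix) are collected, namespace URIs are assumed to need 
no escaping, and the new nodes are simply appended as children of the context 
node.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical illustration of parsing a fragment via a wrapper document. */
class WrapperDocumentSketch {

    /** Parses a complete XML document with namespace support enabled. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Builds the wrapper text with the namespaces in scope at the context. */
    static String buildWrapper(Element context, String fragment) {
        // Walk up from the context node; the innermost declaration wins.
        Map<String, String> inScope = new LinkedHashMap<>();
        for (Node n = context; n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                String name = attr.getNodeName();
                if (name.startsWith("xmlns:")) {
                    inScope.putIfAbsent(name.substring(6), attr.getNodeValue());
                }
            }
        }
        StringBuilder sb = new StringBuilder("<DUMMY_ROOT");
        inScope.forEach((prefix, uri) ->
                sb.append(" xmlns:").append(prefix)
                  .append("=\"").append(uri).append('"'));
        return sb.append('>').append(fragment).append("</DUMMY_ROOT>").toString();
    }

    /** Parses the fragment in a wrapper and appends its nodes to context. */
    static void appendFragment(Element context, String fragment) {
        Element wrapperRoot =
                parse(buildWrapper(context, fragment)).getDocumentElement();
        Document owner = context.getOwnerDocument();
        NodeList children = wrapperRoot.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // importNode copies the node into the existing document.
            context.appendChild(owner.importNode(children.item(i), true));
        }
    }
}
```

+ Applied to the example above, calling appendFragment() on the “ns2:b” 
element with the fragment <ns2:c/><ns1:d/> leaves “ns2:b” with the two new 
child elements, each keeping its namespace URI.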
+ 
+  
+ 
+ == Deliverables ==
+ 
+ 1.      Source and build files for Asynchronous LSParser and 
parseWithContext()
+ 
+ 2.      Test cases to verify the required functionalities of the project
+ 
+ 3.      Necessary documentation/APIs
+ 
+ == Things done so far ==
+ 
+ 1.      Checked out and built the Apache Xerces-J trunk
+ 
+ 2.      Set up development environment
+ 
+ 3.      Went through the coding disciplines and styles
+ 
+ 4.      Went through the related classes and interfaces of the project (for 
example: DOMParserImpl, XMLFragmentScannerImpl, etc.)
+ 
+ 5.      Went through many XML and XML Schema tutorials to gain a better 
understanding of XML
+ 
+ 6.      Went through the DOM Load and Save W3C recommendation multiple times 
to gain a good understanding of the project
+ 
+  
+ 
+ == Development Schedule ==
+ 
+ Before I start coding, I will strengthen the areas I need to be strong in to 
proceed with the project. I will go through XML and XML Schema, which will be 
advantageous in the second part of the project (parseWithContext()). I will 
also attempt to come up with a more stable architecture in discussion with the 
mentor. Over the upcoming four months I look forward to enjoying the coding and 
hope to spend an average of at least 35 hours per week on the project.
+ 
+ April 26 – May 22
+   * Get to know the mentor and the community
+   * Go through the DOM Load and Save W3C recommendation
+   * Go through the project-related classes and interfaces in Xerces-J
+   * Prepare the development environment and become familiar with the coding 
standards
+   * Improve the technical skills that will be useful when proceeding with 
the project
+   * Discuss the overall architecture with the mentor and make modifications
+ May 23 – July 10
+   * Start coding the asynchronous LSParser
+   * Implement classes conforming to the architecture agreed with the mentor
+   * Synchronize with the mentor and get advice on how to proceed
+   * Write and run test cases for the asynchronous LSParser
+   * Prepare documentation and go through the parseWithContext() architecture
+ July 11 – July 15
+   * Begin submitting mid-term evaluations
+ July 16 – Aug 14
+   * Start implementing parseWithContext()
+   * Implement the dummy XML generator for XML fragments and its relatives
+   * Implement mechanisms to merge the existing document with the newly 
created nodes
+   * Synchronize with the mentor and get advice on how to proceed
+   * Write test cases to verify the correctness of parseWithContext()
+   * Finalize coding efforts and start documentation for the latter part of 
the project
+ August 15 – August 21
+   * Go through the work done so far and check whether the project fulfills 
the requirements and conforms to Apache standards
+   * Run all the test cases for verification
+   * Improve documentation
+   * Review the source code with the mentor to confirm the correctness of the 
project
+ August 22
+   * Begin submitting final evaluations to Google
+ August 30
+   * Submit required code samples to Google
+  
+ 
+ == Community Interaction ==
+ 
+ From the moment I thought of joining a GSoC project, I was looking for 
opportunities in Apache Xerces-J. I therefore went through the list of projects 
that had not been covered in earlier GSoCs for Xerces-J. Then I subscribed to 
the mailing list and asked the community whether I could take up this project. 
Soon I got a reply from Michael Glavassevich (Project Lead) saying that this 
project was still available for 2011. I then checked out and built the Apache 
Xerces-J trunk and went through the coding disciplines of the project to get a 
basic idea of how a massive project works. Later I started a discussion with 
Michael about the project under the topic “New to Apcahe Xerces”, continued it 
over a period of two months, and gained a good overall understanding of the 
project [6].
+ 
- == About me ==
+ == About Me ==
-  I am an undergraduate from Department of Computer Science and Engineering, 
University of Moratuwa, Sri Lanka.
  
+ I am Thiwanka Somasiri, a final-year undergraduate in the Department of 
Computer Science and Engineering, University of Moratuwa, Sri Lanka. I have a 
few years of coding experience in the Java and C programming languages. During 
my internship I used XML parsing in one of my projects (developed using J2SE) 
to provide a customizable feature for the convenience of Quality Assurance 
staff. I also have some open source software development experience from 
developing an add-on for bulk image downloading for Mozilla Firefox using 
JavaScript and XUL.
- ----
- CategoryHomepage
  
+ As my final-year group project, I am implementing a language-independent web 
personalization framework for e-commerce applications, which analyzes user 
navigation, user activity history, etc. using concepts such as Data Mining and 
Machine Learning. To achieve this independence we are using XML in that 
project, and I hope the experience will be very useful when working on this 
project.
+ 
+ It is my pleasure to have an opportunity to contribute to a massive 
organization like Apache. I would like to continue working with Apache Xerces-J 
and to become a committer in the near future. The Xerces-J community is very 
supportive of newcomers, and I would always encourage beginners to join this 
fantastic community and work together.
+ 
+ == References ==
+ 
+ [1]. http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html
+ 
+ [2]. http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html#LS-LSParser
+ 
+ [3]. http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-EventTarget
+ 
+ [4]. http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-EventListener
+ 
+ [5]. http://markmail.org/thread/x4adifemzae5comi#query:+page:1+mid:ir6z3e3ryjvhoa4w+state:results
+ 
+ [6]. http://markmail.org/thread/x4adifemzae5comi
+ 
+ [7]. http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html#LS-LSInput
+ 
+ [8]. http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-Registration-interfaces
+ 
+ [9]. http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-EventListener
+ 
