Thanks for your interest, James.


> Does this sound like a good idea to you? 

Yes, it does! (See the 'Performance' and 'XSLT conflict resolution' sections below.)

> Do you want to try build one or shall I?

I took some time to learn about XPath and Jaxen. Since I don't need to know everything 
about them, I think I can try. I will probably miss some key features, which would make 
my code more complicated than needed. But it's an interesting challenge anyway!


Performance
-----------
I'm afraid the ElementHandler lookup adds too much overhead (am I right or 
wrong?). After all, users should not pay for an ElementHandler lookup if they don't 
use ElementHandlers. I propose that you set org.dom4j.io.DispatchHandler.handlers 
to null by default, meaning that no lookup is done. When the first ElementHandler is 
added, the HashMap would be created. This looks like an acceptable compromise that 
should satisfy everybody.
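
To make the idea concrete, here is a minimal sketch of the lazy creation. It is not 
the real org.dom4j.io.DispatchHandler: LazyDispatcher, PathHandler and 
hasHandlerMap() are hypothetical names, and plain path strings stand in for 
whatever key the real class uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.dom4j.ElementHandler, so the sketch compiles on its own.
interface PathHandler {
    void onStart(String path);
}

// Hypothetical, simplified dispatcher; not the real org.dom4j.io.DispatchHandler.
class LazyDispatcher {
    // Stays null until the first handler is registered.
    private Map<String, PathHandler> handlers;

    void addHandler(String path, PathHandler handler) {
        if (handlers == null) {
            handlers = new HashMap<>(); // created only when actually needed
        }
        handlers.put(path, handler);
    }

    // Returns true if a handler was registered for this path and was called.
    boolean dispatch(String path) {
        if (handlers == null) {
            return false; // fast path: nobody uses handlers, so no lookup at all
        }
        PathHandler handler = handlers.get(path);
        if (handler == null) {
            return false;
        }
        handler.onStart(path);
        return true;
    }

    // Exposed only so the lazy creation can be observed in this sketch.
    boolean hasHandlerMap() {
        return handlers != null;
    }
}
```

With this shape, a parse that never registers a handler only pays for one null check 
per element.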


XSLT 'conflict' resolution
--------------------------

> Currently we have an implementation of XSLT patterns in the 
> org.dom4j.rule package - this actually contains an implementation 
> of the XSLT rule engine (or 'processing model' as its referred 
> to in the XSLT spec). XSLT is a similiar thing to what we've just 
> described - handlers are associated with different XSLT patterns. 
> Though in XSLT there is a 'conflict resolution' protocol to ensure 
> that only *one* handler gets called for a single pattern 

> I'd expect that this 'only one handler gets called' part of XSLT 
> is not needed in this case, we'd want to notify all handlers that 
> match a given element.

You're right to point out the 'conflict' aspect, which is not trivial.
In my opinion, the hardest thing to debug is an event that you expect to happen but 
that doesn't (in this case, because it is delivered elsewhere). Following this idea, I 
would throw something like a DuplicatePathException if two compatible paths were 
added (if it can crash, make it crash early).

The problem is that two XPaths matching the same element cannot be detected before 
parsing. I thought about an option such as an allowMultipleHandlerCalls property 
somewhere, but it is not consistent with XPath logic and makes things confusing.

http://www.w3.org/TR/xslt11/#conflict says:
<< It is an error if this leaves more than one matching template rule. An XSLT 
processor may signal the error; if it does not signal the error, it must recover by 
choosing, from amongst the matching template rules that are left, the one that occurs 
last in the stylesheet. >>

This sounds quite reasonable. I propose the following:

- an org.dom4j.io.MultipleElementHandlersException (derived from SAXPathException) is 
thrown when more than one path matches an Element;

- a DispatchHandler( org.xml.sax.ErrorHandler errorHandler ) constructor is added to 
org.dom4j.io.DispatchHandler, setting a private errorHandler member;

- when the org.dom4j.io.DispatchHandler finds multiple handlers, the 
errorHandler.warning() method is invoked with a MultipleElementHandlersException 
instance. If the errorHandler is null, nothing is done, and document processing 
simply goes on;

- org.dom4j.io.SAXHandler creates the DispatchHandler with a reference to this 
(providing an ErrorHandler).
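
The proposal above could be sketched roughly as follows. This is only an illustration 
under several assumptions: WarningDispatcher and CollectingErrorHandler are 
hypothetical names, matching is done by a simple path suffix instead of real XPath, 
and the "last registered handler wins" recovery is borrowed from the XSLT text quoted 
earlier. The exception extends org.xml.sax.SAXParseException in this sketch because 
that is the type ErrorHandler.warning() accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical exception; extends SAXParseException so it can be passed to warning().
class MultipleElementHandlersException extends SAXParseException {
    MultipleElementHandlersException(String message) {
        super(message, null); // no Locator available in this sketch
    }
}

// An ErrorHandler that keeps track of warnings instead of dropping them.
class CollectingErrorHandler implements ErrorHandler {
    final List<SAXParseException> warnings = new ArrayList<>();

    @Override public void warning(SAXParseException e) { warnings.add(e); }
    @Override public void error(SAXParseException e) throws SAXException { throw e; }
    @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
}

// Hypothetical, simplified dispatcher; not the real org.dom4j.io.DispatchHandler.
class WarningDispatcher {
    private final ErrorHandler errorHandler; // may be null
    private final Map<String, Runnable> handlers = new LinkedHashMap<>();

    WarningDispatcher(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    void addHandler(String pathSuffix, Runnable handler) {
        handlers.put(pathSuffix, handler);
    }

    // Assumed matching rule: a handler matches when the element path ends with its suffix.
    // Returns the number of matching handlers.
    int dispatch(String elementPath) throws SAXException {
        List<Runnable> matches = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : handlers.entrySet()) { // full scan
            if (elementPath.endsWith(entry.getKey())) {
                matches.add(entry.getValue());
            }
        }
        if (matches.size() > 1 && errorHandler != null) {
            errorHandler.warning(new MultipleElementHandlersException(
                    matches.size() + " handlers match " + elementPath));
        }
        if (!matches.isEmpty()) {
            // Assumed recovery: call only the handler registered last.
            matches.get(matches.size() - 1).run();
        }
        return matches.size();
    }
}
```

The loop never stops at the first match, which is exactly the full-scan cost the 
proposal has to pay in order to notice a conflict.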

At least this keeps the problem of several matches from being lost, provided the user 
supplied an ErrorHandler that keeps track of the warnings. Maybe you'll find it useful 
to add this behavior to the current org.dom4j.rule.RuleSet implementation. The main 
drawback I see is that it requires a full scan of the Rule array instead of stopping 
at the first occurrence found.

Since I'm quite a beginner in the development of XML frameworks, I don't know whether 
this proposal is a good idea or simply overkill.


Use of ElementHandlers
----------------------

> The original use case for ElementHandler was simply to chunk 
> an XML document into 'rows' when processing massive documents. 
> I think there is another use case, splitting the document into
> parts based on XSLT Patterns, which each get processed as they
> arrive by handlers - rather than building the whole document 
> then navigating to find the parts.

My intent is to build business objects using the ElementHandlers. I want to shape my 
XML in the most readable form (it will be read by humans), and I don't want to 
sacrifice the design of my business objects. So I accept this "manual" implementation 
of persistence.

Remark: mapping the XML onto the business model could also be done with Sun's JAXB, 
but I don't want to create Java classes from a DTD. I want to create Java classes in 
Java!

But I think that if you want to prune your DOM when dealing with big documents, it 
would be faster to avoid creating unnecessary Nodes in the first place. An interesting 
answer is provided by the Xerces Native Interface 
(http://xml.apache.org/xerces2-j/xni.html). It lets you reshape SAX events the way you 
want before DOM creation. On the other hand, it has noticeable drawbacks at this time 
(not least that it works only with Xerces).


Best regards.

Laurent

P.S. I found it logical to reply to [EMAIL PROTECTED] instead of your 
personal address. I'm new to open source projects; please let me know if this is not 
the right practice.




_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
