[ 
https://issues.apache.org/jira/browse/CAMEL-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303335#comment-14303335
 ] 

Stephan Siano commented on CAMEL-8273:
--------------------------------------

OK, I think I have a better grasp of the problem now:

A good solution for a single XPath evaluation from some byte/stream-like source
using Saxon is to convert the data to a SAXSource and then feed that into the
XPath evaluation (the JDK implementation only supports Node and InputSource as
input, and the latter allows XXE injection).
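The conversion step could look roughly like the sketch below. It uses only the JDK: it wraps a byte array in a SAXSource whose XMLReader rejects DOCTYPE declarations, which is a common XXE hardening measure. The class and method names are hypothetical, and the `disallow-doctype-decl` feature is specific to Xerces-based parsers such as the one bundled with the JDK. Handing the result to Saxon's XPath evaluation is not shown, since that needs Saxon on the classpath.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class HardenedSaxSource {

    // Hypothetical helper: wraps raw XML bytes in a SAXSource whose XMLReader
    // refuses DOCTYPE declarations, so external entities (XXE) are never resolved.
    public static SAXSource fromBytes(byte[] xml)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Xerces feature, honored by the JDK's built-in SAX parser.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Nothing is parsed yet; whoever consumes the source (e.g. an XPath
        // engine building its own tree) drives the reader.
        return new SAXSource(reader, new InputSource(new ByteArrayInputStream(xml)));
    }
}
```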

So far so good. However, if we have two consecutive XPath evaluations, the
SAXSource approach is not so good.
Let's assume we have an InputStream and do a setBody with an XPath expression
that has SAXSource as documentType and returns a Document (which is a
DocumentOverNodeInfo). If we now do another XPath evaluation (again with
SAXSource as documentType) on the same data, the Document will be converted to
a String in order to be wrapped into a SAXSource. Saxon's XPath evaluator will
then build another TinyTree from that data, so we end up with two TinyTrees and
a String in memory. Let's assume each TinyTree consumes 4 times the binary
document size in memory and the String consumes twice the binary document
size; then we end up with ten times the binary document size, which is about
as much as a DOM document consumes in the first place (and we are parsing an
XML document again that was already parsed).
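Spelled out with the factors assumed above, where S is the binary document size:

```latex
M \approx \underbrace{2 \times 4S}_{\text{two TinyTrees}} + \underbrace{2S}_{\text{String}} = 10S
```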

What do you think about the following approach?
We treat Node input as a first-class citizen and do not perform any type
conversion on it, no matter which value documentType has (we can safely assume
that every XPath implementation is able to handle a Node).
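A minimal sketch of that ordering, assuming a hypothetical helper in which the caller supplies the documentType conversion as a function: a DOM Node body short-circuits before any conversion is attempted, so the Document returned by the first evaluation would be reused as is by the second.

```java
import java.util.function.Function;
import org.w3c.dom.Node;

public class NodeFirstContext {

    // Hypothetical helper for the proposed ordering: a DOM Node body is handed to
    // the XPath engine untouched, whatever documentType is configured.
    // convertToDocumentType stands in for the documentType conversion and is
    // only invoked for non-Node bodies.
    public static Object select(Object body, Function<Object, Object> convertToDocumentType) {
        if (body instanceof Node) {
            // Assumption from the comment: every XPath implementation accepts a Node.
            return body;
        }
        return convertToDocumentType.apply(body);
    }
}
```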


> More flexible selection of default documentType in XPath expressions
> --------------------------------------------------------------------
>
>                 Key: CAMEL-8273
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8273
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Stephan Siano
>            Assignee: Claus Ibsen
>             Fix For: Future
>
>         Attachments: 
> 0001-CAMEL-8273-More-flexible-selection-of-default-docume.patch
>
>
> In the current implementation of XPath if no documentType is defined (likely 
> in most cases) the document used for XPath evaluation is parsed into a (DOM) 
> Document using the JDK XML parser before applying the XPath expression on it.
> For large documents this might be resource-intensive, especially if the XPath
> is evaluated using a more efficient parser like Saxon.
> With the current implementation it is possible to work around this by setting
> a documentType attribute on the XPath expression, but doing this efficiently
> requires some internal knowledge about the previous component in the Camel
> route (which type it creates) and the qualities of the XML parser used (e.g.
> the JDK parser accepts only InputSource and Node as input types for XPath
> evaluation, whereas Saxon also supports other types like SAXSource).
> The attached patch will make the data type used by default for XPath 
> evaluation more flexible (depending on the type of the input).
> There are two cases to differentiate:
>
> a) documentType is set on the XPath expression
>
>    Current implementation:
>    1. Try to convert to the documentType.
>    2. If that fails, do some extra conversions for some additional data types
>       (WrappedFile, BeanInvocation, String).
>    3. If that fails, throw an exception.
>
>    New implementation:
>    1. Try to convert to the documentType.
>    2. If that fails, use the message if it is of type Node, InputSource or
>       DOMSource, or do some type conversions for specific data types
>       (WrappedFile, BeanInvocation, String, InputStream, Reader, byte[]...).
>    3. If that fails, throw an exception.
>
> b) documentType is not set on the XPath expression
>
>    Current implementation:
>    This is actually the same as if documentType was set to Document.
>
>    New implementation:
>    1. Use the message if it is of type Node, InputSource or DOMSource, or do
>       some type conversions for specific data types (WrappedFile,
>       BeanInvocation, String, InputStream, Reader, byte[]...) to InputSource.
>    2. If the message is not of one of the types above, convert it to a DOM
>       Document.
>    3. If that fails, throw an exception.
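The decision logic in the patch description could be sketched as follows. This is not the code from the attached patch: the class and method names are hypothetical, the type converter is modeled as a BiFunction that returns null when a conversion is not possible, the Camel-specific types (WrappedFile, BeanInvocation) are left out, and the fallback conversions are assumed to produce an InputSource in both cases (the description only states this for the second case).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiFunction;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DocumentTypeSelection {

    // Bodies usable as they are, or stream-like bodies wrapped in an InputSource.
    // Returns null for any other type. (WrappedFile and BeanInvocation are
    // Camel types and are left out of this sketch.)
    public static Object directOrInputSource(Object body) {
        if (body instanceof Node || body instanceof InputSource || body instanceof DOMSource) {
            return body;
        }
        if (body instanceof String) {
            return new InputSource(new StringReader((String) body));
        }
        if (body instanceof byte[]) {
            return new InputSource(new ByteArrayInputStream((byte[]) body));
        }
        if (body instanceof InputStream) {
            return new InputSource((InputStream) body);
        }
        if (body instanceof Reader) {
            return new InputSource((Reader) body);
        }
        return null;
    }

    // Hypothetical dispatch; typeConverter returns null when it cannot convert.
    public static Object select(Object body, Class<?> documentType,
                                BiFunction<Object, Class<?>, Object> typeConverter) {
        Object result;
        if (documentType != null) {
            // Case a): 1. try documentType, 2. fall back to the types above.
            result = typeConverter.apply(body, documentType);
            if (result == null) {
                result = directOrInputSource(body);
            }
        } else {
            // Case b): 1. use the types above, 2. fall back to a DOM Document.
            result = directOrInputSource(body);
            if (result == null) {
                result = typeConverter.apply(body, Document.class);
            }
        }
        if (result == null) {
            // Step 3 in both cases.
            String type = body == null ? "null" : body.getClass().getName();
            throw new IllegalArgumentException("Cannot evaluate XPath on a body of type " + type);
        }
        return result;
    }
}
```

Keeping the pass-through check in one helper means both cases agree on which body types are used without conversion; only the order relative to the documentType conversion differs.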



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
