[ 
https://issues.apache.org/jira/browse/CAMEL-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300929#comment-14300929
 ] 

Stephan Siano commented on CAMEL-8273:
--------------------------------------

Saxon cannot do XPath in streaming mode (I don't think a full XPath 
implementation is even possible with streaming), but it supports XPath on its 
TinyTree (which is much smaller than the Xerces DOM). If the XML parsing is 
done during the XPath evaluation (the document is provided not as a DOM tree 
but as something else, like an InputSource), Saxon will parse into that 
TinyTree, which was actually the purpose of my patch. Unfortunately I 
overlooked the XXE thing.
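
To illustrate the difference between the two input styles, here is a minimal 
sketch using only the standard JAXP API. The class and method names 
(XPathInputDemo, viaDom, viaInputSource) are made up for this example and are 
not part of Camel or of the patch. With the JDK's built-in XPath engine both 
variants end up building a DOM; the point above is that with Saxon on the 
classpath the second variant lets the engine build its own tree instead. Note 
that in the second variant the parser is configured by the XPath engine, not by 
the caller, which is where the XXE question comes from.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Camel.
class XPathInputDemo {

    // Variant A: parse into a DOM Document first, then evaluate the XPath on it.
    static String viaDom(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so no entities (internal or external)
            // are processed; this is one common way to close the XXE hole.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation on DOM failed", e);
        }
    }

    // Variant B: hand the unparsed InputSource to the XPath engine and let it
    // do the parsing itself, with whatever parser settings the engine uses.
    static String viaInputSource(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation on InputSource failed", e);
        }
    }
}
```

For a document without a DOCTYPE, both variants return the same string value 
for the same expression; they differ only in who does the parsing.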

I think I will check two things now:
1. Whether Saxon will also allow XXE attacks if some unparsed type (like 
InputSource) is used for the conversion.
2. If that is the case, convert to NodeInfo (the Saxon interface for DOM-like 
nodes; the TinyTree is an implementation of it) and do the XPath evaluation 
with that.

Both ways require setting the documentType parameter to something other than 
Document. Unfortunately I don't see a way to do that automatically when 
Saxon is used...

> More flexible selection of default documentType in XPath expressions
> --------------------------------------------------------------------
>
>                 Key: CAMEL-8273
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8273
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Stephan Siano
>            Assignee: Claus Ibsen
>             Fix For: Future
>
>         Attachments: 
> 0001-CAMEL-8273-More-flexible-selection-of-default-docume.patch
>
>
> In the current implementation of XPath if no documentType is defined (likely 
> in most cases) the document used for XPath evaluation is parsed into a (DOM) 
> Document using the JDK XML parser before applying the XPath expression on it.
> For large documents this might be resource intensive, especially if the XPath 
> is evaluated using a more efficient parser like Saxon.
> With the current implementation it is possible to work around this by setting 
> a documentType attribute to the XPath expression, but doing this efficiently 
> requires some internal knowledge about the previous component in the camel 
> route (which type it creates) and the qualities of the used XML parser (e.g. 
> the JDK parser accepts only InputSource and Node as input types for XPath 
> evaluation whereas Saxon does also support other types like SAXSource).
> The attached patch will make the data type used by default for XPath 
> evaluation more flexible (depending on the type of the input).
> There are two cases to differentiate:
> documentType is set on the XPath expression:
> current implementation:
> 1. try to convert to the documentType
> 2. if that fails do some extra conversions for some additional data types 
> (WrappedFile, BeanInvocation, String)
> 3. if that fails throw an exception
> new implementation:
> 1. try to convert to the documentType
> 2. if that fails, use the message if it is of type Node, InputSource or 
> DOMSource or do some type conversions for specific data types (WrappedFile, 
> BeanInvocation, String, InputStream, Reader, byte[]...)
> 3. if that fails throw an exception
> documentType is not set on the XPath expression:
> old implementation:
> this is actually the same as if documentType was set to Document
> new implementation:
> 1. Use the message if it is of type Node, InputSource or DOMSource or do some 
> type conversions for specific data types (WrappedFile, BeanInvocation, 
> String, InputStream, Reader, byte[]...) (to InputSource)
> 2. If the old message is not of one of the types above, convert to DOM 
> Document
> 3. If this fails throw an Exception
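
The selection order described in the issue for the case where no documentType 
is set can be sketched roughly as follows. This is an illustration only, not 
the code from the attached patch: the class and method names are hypothetical, 
and the Camel-specific types (WrappedFile, BeanInvocation) are left out so that 
the sketch needs nothing beyond the JDK. A null return stands for "fall back to 
converting to a DOM Document" (step 2).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Camel.
class DefaultDocumentSelector {

    /**
     * Returns the object to hand to the XPath engine, or null if the caller
     * should fall back to converting the body to a DOM Document.
     */
    static Object select(Object body) {
        // Step 1a: types that can be used as they are.
        if (body instanceof Node || body instanceof InputSource || body instanceof DOMSource) {
            return body;
        }
        // Step 1b: unparsed types that can be wrapped in an InputSource cheaply.
        if (body instanceof String) {
            return new InputSource(new StringReader((String) body));
        }
        if (body instanceof Reader) {
            return new InputSource((Reader) body);
        }
        if (body instanceof InputStream) {
            return new InputSource((InputStream) body);
        }
        if (body instanceof byte[]) {
            return new InputSource(new ByteArrayInputStream((byte[]) body));
        }
        // Step 2 is up to the caller: convert to a DOM Document, or fail (step 3).
        return null;
    }
}
```

The design point is that nothing is parsed in step 1; the body is only wrapped, 
so the XPath engine decides which tree model to build.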



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
