Re: Slow access to XML node attribute because of XPath in NodeModel.get()?

Christoph Rüger Thu, 12 Oct 2017 23:02:17 -0700

2017-10-13 0:33 GMT+02:00 Daniel Dekany <[email protected]>:

> Actually, in `someNode.something` (which is the same as
> `someNode["something"]`) the "something" meant to be XPath (appart
> from a few FreeMarker-specific things).


But as you have seen in some
> very simple cases we solve the XPath query "natively" instead of
> delegating it to XPath.


Thanks Daniel, good to know. So there probably was a reason for this too in
the past.

Now if profiling reliably(!) shows that it's
> indeed XPath in this case that's so slow, then it seems it would be

better if we take over even more cases from XPath, such as
> "@something"... I will have to look into that though. (This can be
> tricky though, as you replace the implementation of that kind of query
> under thousands of already working applications, so it really has to
> behave exactly like the real thing earlier.


I agree, this is very risky and my poor-mans "benchmark" is not reliable at
all. I can try if I can find some time to come up with a programmatic
test-case maybe which makes it better visible.
Your insights tell me so far that we better go with the custom
TemplateMethodModel for now (shortterm) to solve our specific business
problem. Less risk.

I will try fiddling around with the sources of
freemarker.ext.dom.NodeModel.get(String)
etc. which already do some "key.startsWith("@@")" checks. Maybe I find the
place where to add an improvement and try it out.


> That's unlike with
> product.PRODUCT_DETAILS which was never delegated to the real XPath
> implementation.)
>

> Thursday, October 12, 2017, 7:35:42 PM, Christoph Rüger wrote:
>
> > Hi,
> >
> > we have the following XML snippet:
> >
> > <PRODUCT_DETAILS>
> >   <INTERNATIONAL_PID *type="GTIN"*>4011395534809</INTERNATIONAL_PID>
> > </PRODUCT_DETAILS>
> >
> > We iterate over XML-Nodes like this:
> >
> > <#list product["PRODUCT_DETAILS"]?children as detail>
>
> Note that you can write `product["PRODUCT_DETAILS"]` as
> `product.PRODUCT_DETAILS`.
>

Ok. Usually we do. Not sure why we didn't here. Was a code snippet from a
colleague :)


>
> >   nodeValue: ${detail!}
> > </#list>
> >
> > This code snippet is pretty fast (result appears after e.g. 3 seconds).
> But
> > when we turn it into this and try to add the node attribute "type"...
> >
> > <#list product["PRODUCT_DETAILS"]?children as detail>
> >  <#if detail.*@type*[0]?? && detail.*@type*[0] == "GTIN">
> >   node*Attribute* type: ${detail.@type}
> >   nodeValue: ${detail!}
> >  </#if>
> > </#list>
> >
> > ...it gets very very slow (result appears after > 1 minute). We have a
> large XML with >> 1000 products here.
> >
> > I have digged a little bit in the code and found that when the check for
> > *detail.@type[0]??* is in the mix, then
> > *freemarker.ext.dom.NodeModel.get(String) *is jumping into the
> else-branch
> > which uses XPath:
> >
> >  public TemplateModel get(String key) throws TemplateModelException {
> >
> >    if (key.startsWith("@@")) {
> >     // ....
> >
> > }
> >
> > else{
> >
> > XPathSupport xps = getXPathSupport();
> >           if (xps != null) {
> >
> > *return xps.executeQuery(node, key);*            }
> >
> > }
> >
> > This *xps.executeQuery(node, key); *seems to be the reason why it gets
> > slow. Internally it
> > calls
> > freemarker.ext.dom.SunInternalXalanXPathSupport.executeQuery(Object,
> > String)
> >
> > To work around this problem I have tried creating an own
> > TemplateMethodModel ${nodeAttrib(node, attributename)}
> >
> > where I do something like this (I know it's ugly but experimental...) :
> >
> > new TemplateMethodModelEx(){
> >
> > public Object exec(List arg0) throws TemplateModelException {
> >
> > Object node = arg0.get(0);
> >
> > Object attribute = arg0.get(1);
> >
> > if(node instanceof NodeModel) {
> >
> > NamedNodeMap attributes = ((NodeModel) node).getNode().getAttributes();
> >
> > if(attributes != null && attributes.getLength() > 0) {
> >
> > Node namedItem = *attributes.getNamedItem(attribute.toString())*;
> >
> > if(namedItem != null) {
> >
> > System.out.println(namedItem.getNodeValue());
> >
> > return namedItem.getNodeValue();
> >
> > }
> >
> > else {
> >
> > return "";
> >
> > }
> >
> > }
> >
> > }
> >
> > return "";
> >
> > }
> >
> >
> > When we then use this it is much faster:
> > <#list product["PRODUCT_DETAILS"]?children as detail>
> >   <#if *nodeAttribute(details, "type")!* == "GTIN" >
> >    node*Attribute* type: ${*nodeAttribute(details, "type")!*}
> >    nodeValue: ${detail!}
> >   </#if>
> > </#list>
> >
> > It seems that NamedNodeMap.*getNamedItem() *much faster and does not use
> > XPath. Although it is also iterating internally over the whole array to
> > find the index of the attribute - the overhead seems to be less then
> XPath.
> >
> > My questions are:
> > 1. What is the reason that *detail.@type[0]?? *causes a heavy XPath
> > evaluation under the hood?
> > 2. Are we doing something wrong?
> > 3. Should we go the custom TemplateMethodModel way? Is this approach ok?
> >
> > I hope this was understandable.
> >
> > Thanks
> > Christoph
> >
>
> --
> Thanks,
>  Daniel Dekany
>
>


-- 
Christoph Rüger, Geschäftsführer
Synesty <https://synesty.com/> - Anbinden und Automatisieren ohne
Programmieren - Automatisierung, Schnittstellen, Datenfeeds

Xing: https://www.xing.com/profile/Christoph_Rueger2
LinkedIn: http://www.linkedin.com/pub/christoph-rueger/a/685/198

-- 
Synesty GmbH
Moritz-von-Rohr-Str. 1a
07745 Jena
Tel.: +49 3641 559649
Fax.: +49 3641 5596499
Internet: http://synesty.com

Geschäftsführer: Christoph Rüger
Unternehmenssitz: Jena
Handelsregister B beim Amtsgericht: Jena
Handelsregister-Nummer: HRB 508766
Ust-IdNr.: DE287564982

Re: Slow access to XML node attribute because of XPath in NodeModel.get()?

Reply via email to