[ https://issues.apache.org/jira/browse/ANY23-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney closed ANY23-76.
-------------------------------------


Bulk close for 0.7.0-incubating release
                
> Improve runtime of the Microformat extractor on documents with many relations.
> ------------------------------------------------------------------------------
>
>                 Key: ANY23-76
>                 URL: https://issues.apache.org/jira/browse/ANY23-76
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.7.0
>            Reporter: Timothy Potter
>            Assignee: Michele Mostarda
>            Priority: Trivial
>             Fix For: 0.7.0
>
>         Attachments: MicroformatSpeed.patch
>
>
> For some large documents with many Microformat tuples, the extensive use of
> XPath in the DomUtils class causes Microformat extraction to be slow. I've
> marked this as trivial because it's a corner case.
> To reproduce the problem the patch addresses, run the Microformat extractor
> on the following URL:
> http://en.wikipedia.org/wiki/List_of_Nike_missile_locations
> I've included a patch that improves performance at the cost of code
> simplicity. I hope someone more involved in the project can decide whether
> to use the patch or address this issue in another way.
> The patch replaces commonly used XPath queries with DOM tree traversals, e.g.
> getting all nodes with a 'class' attribute. On my machine the time to parse
> the given document is reduced from around 105 seconds to 14 seconds.
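
The issue does not reproduce the patch. What follows is a minimal sketch, not the contents of MicroformatSpeed.patch, of what replacing an XPath query such as `//*[@class]` with a DOM tree traversal can look like. It assumes the extractor works on an `org.w3c.dom` tree. The class `ClassAttributeCollector` and its methods `findNodesWithClass` and `parse` are hypothetical names chosen for illustration. The input is well-formed XML parsed with the JDK's `DocumentBuilderFactory`.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not Any23's DomUtils API.
public class ClassAttributeCollector {

    // Collects every element under root that carries a "class" attribute,
    // in document order, with one pre-order walk instead of evaluating an
    // XPath expression such as "//*[@class]".
    public static List<Element> findNodesWithClass(Node root) {
        List<Element> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) current;
                if (element.hasAttribute("class")) {
                    result.add(element);
                }
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = current.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return result;
    }

    // Parses a well-formed XML string into a DOM Document (illustration only).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div class=\"vcard\">"
                + "<span class=\"fn\">Ada</span><span>x</span></div></body></html>");
        for (Element e : findNodesWithClass(doc)) {
            System.out.println(e.getTagName() + " " + e.getAttribute("class"));
        }
    }
}
```

Running `main` prints `div vcard` and then `span fn`. An explicit stack avoids deep recursion on large pages. The trade-off is the one the reporter notes: this is more code than a one-line XPath query.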

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
