[ https://issues.apache.org/jira/browse/ANY23-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Potter updated ANY23-76:
--------------------------------
Attachment: (was: MicroformatSpeed.patch)
> Improve runtime of the Microformat extractor on documents with many relations.
> ------------------------------------------------------------------------------
>
> Key: ANY23-76
> URL: https://issues.apache.org/jira/browse/ANY23-76
> Project: Apache Any23
> Issue Type: Improvement
> Reporter: Timothy Potter
> Priority: Trivial
> Attachments: MicroformatSpeed.patch
>
>
> For some large documents with many Microformat tuples, the extensive use of
> XPath in the DomUtils class causes Microformat extraction to be slow. I've
> marked this as trivial as it's a corner case.
> To reproduce the problem the patch addresses, run the Microformat extractor
> on the following URL:
> http://en.wikipedia.org/wiki/List_of_Nike_missile_locations
> I've attached a patch that improves performance at the cost of code simplicity.
> I hope someone more involved in the project can decide whether using the patch
> is a good idea, or perhaps address this issue in another way.
> The patch replaces commonly used XPath queries with DOM tree traversals, e.g.
> getting all nodes with 'class' attributes. On my machine, the time to parse
> the given document drops from around 105 seconds to 14 seconds.
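To illustrate the kind of substitution the patch description refers to, here is a minimal sketch using only the JDK's JAXP and W3C DOM APIs. It is not the code from MicroformatSpeed.patch and not the DomUtils API. The class name ClassAttributeScan and the methods withXPath and withTraversal are hypothetical. The sketch contrasts an XPath query for every element carrying a class attribute with an equivalent single pre-order walk over the DOM tree.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class, not part of Any23.
class ClassAttributeScan {

    // XPath approach: concise, but evaluate(String, ...) compiles the
    // expression on every call, and "//*" visits every node in the document.
    static List<Element> withXPath(Document doc) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@class]", doc, XPathConstants.NODESET);
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add((Element) nodes.item(i));
        }
        return result;
    }

    // DOM traversal approach: one iterative pre-order walk that collects
    // every element with a class attribute, in document order. An explicit
    // stack avoids deep recursion on heavily nested markup.
    static List<Element> withTraversal(Node root) {
        List<Element> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) node).hasAttribute("class")) {
                result.add((Element) node);
            }
            // Push children last-to-first so they are popped in document order.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                stack.push(c);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><div class='vcard'><span class='fn'>A</span>"
                + "<span>B</span></div><p class='note'/></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(withXPath(doc).size());     // 3
        System.out.println(withTraversal(doc).size()); // 3
    }
}
```

Both methods select the same elements (div.vcard, span.fn, p.note) for the sample in main. The walk trades the one-line XPath expression for more code, which matches the "performance at the cost of code simplicity" trade-off described above. Any actual speedup on a real page would need to be measured. The 105 s vs. 14 s figures are the reporter's own measurements of the actual patch.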