[ https://issues.apache.org/jira/browse/ANY23-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287867#comment-13287867 ]
Hudson commented on ANY23-75:
-----------------------------
Integrated in Any23-trunk #220 (See [https://builds.apache.org/job/Any23-trunk/220/])
Improved MicrodataParser performance. Related to issue #ANY23-75.
(Revision 1345154)
Result = SUCCESS
mostarda :
Files :
* /incubator/any23/trunk/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
> Improve runtime of the Microdata extractor on documents with many relations.
> ----------------------------------------------------------------------------
>
> Key: ANY23-75
> URL: https://issues.apache.org/jira/browse/ANY23-75
> Project: Apache Any23
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Timothy Potter
> Assignee: Michele Mostarda
> Fix For: 0.7.0
>
> Attachments: MicrodataParser.diff
>
>
> I've been running Any23 on a big web crawler dump. I found that for certain
> documents with a lot of Microdata relations, the method
> MicrodataParser.getItemProps() becomes very slow; as a result, processing a
> single document can take several minutes. An example of a problematic page
> can be seen here: http://dreamtime.fftunes.com/
> I'll attach a patch that greatly improves the performance of this method.
> I was wondering if someone could have a look at it and include it in the
> next release if possible.
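The attached MicrodataParser.diff is not reproduced in this notification, so what follows is only a hypothetical sketch, not the actual patch. It shows one common way to keep property lookup linear: if each item re-scans the document (or its whole subtree) to find its itemprop elements, the work can grow roughly quadratically with the number of items, whereas a single depth-first pass can attach every itemprop element to its nearest enclosing itemscope element. The class and method names (ItemPropIndex, Element, indexItemProps) are invented for illustration. The sketch uses a toy tree instead of org.w3c.dom and ignores the Microdata itemref attribute.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the MicrodataParser.diff patch from ANY23-75.
// Uses a minimal tree model instead of org.w3c.dom and ignores itemref.
public class ItemPropIndex {

    // Minimal stand-in for a DOM element carrying Microdata attributes.
    static final class Element {
        final String name;
        final boolean itemScope;   // true if the element has an itemscope attribute
        final String itemProp;     // value of itemprop, or null if absent
        final List<Element> children = new ArrayList<>();

        Element(String name, boolean itemScope, String itemProp) {
            this.name = name;
            this.itemScope = itemScope;
            this.itemProp = itemProp;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }
    }

    // One depth-first pass over the whole tree: each itemprop element is
    // attached to the nearest enclosing itemscope element, so every node is
    // visited once instead of once per item.
    static Map<Element, List<Element>> indexItemProps(Element root) {
        Map<Element, List<Element>> props = new IdentityHashMap<>();
        visit(root, null, props);
        return props;
    }

    private static void visit(Element e, Element owner, Map<Element, List<Element>> props) {
        if (e.itemProp != null && owner != null) {
            props.computeIfAbsent(owner, k -> new ArrayList<>()).add(e);
        }
        // Descendants of an itemscope element belong to that (nested) item.
        Element childOwner = e.itemScope ? e : owner;
        for (Element child : e.children) {
            visit(child, childOwner, props);
        }
    }

    public static void main(String[] args) {
        Element nestedItem = new Element("div", true, "author")
                .add(new Element("span", false, "name"));
        Element item = new Element("div", true, null)
                .add(new Element("h1", false, "headline"))
                .add(nestedItem);
        Element root = new Element("body", false, null).add(item);

        Map<Element, List<Element>> props = indexItemProps(root);
        System.out.println(props.get(item).size());       // 2 (headline, author)
        System.out.println(props.get(nestedItem).size()); // 1 (name)
    }
}
```

The nested item is itself a property ("author") of the outer item, while its own descendants ("name") are indexed under the nested item, which is why the owner switches at each itemscope element.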
--
This message is automatically generated by JIRA.