Improve runtime of the Microdata extractor on documents with many relations.
----------------------------------------------------------------------------
Key: ANY23-75
URL: https://issues.apache.org/jira/browse/ANY23-75
Project: Apache Any23
Issue Type: Improvement
Reporter: Timothy Potter
Priority: Minor
I've been running Any23 on a large web crawler dump. For certain documents
with many Microdata relations, the method MicrodataParser.getItemProps()
becomes very slow, and processing a single document can take several minutes.
An example of a problematic page can be seen here:
http://dreamtime.fftunes.com/
I'll attach a patch that greatly improves the performance of this method.
Could someone have a look at it and include it in the next release if
possible?