[jira] [Updated] (ANY23-75) Improve runtime of the Microdata extractor on documents with many relations.

Lewis John McGibbney (Updated) (JIRA) Fri, 13 Apr 2012 04:11:44 -0700

    [ https://issues.apache.org/jira/browse/ANY23-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lewis John McGibbney updated ANY23-75:
--------------------------------------

    Affects Version/s: 0.7.0
        Fix Version/s: 0.7.0

Setting for 0.7.0-incubating.
Could you explain a bit about the patch and the underlying reason why the 
existing parser implementation seems to clog up? Also, and this is really 
trivial, could you please check whether your code formatting differs from the 
project's conventions? My initial impression of the patch is great; it's a nice 
one to have caught, but some additional explanation would really help us out. 
Thank you very much. Lewis
                
> Improve runtime of the Microdata extractor on documents with many relations.
> ----------------------------------------------------------------------------
>
>                 Key: ANY23-75
>                 URL: https://issues.apache.org/jira/browse/ANY23-75
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Timothy Potter
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: MicrodataParser.diff
>
>
> I've been running Any23 on a big web crawler dump. I found that for certain 
> documents with a lot of Microdata relations, the method 
> MicrodataParser.getItemProps() becomes very slow. As a result, processing one 
> document can take several minutes. An example of a problematic page can be 
> seen here: http://dreamtime.fftunes.com/
> I'll attach a patch for this method that greatly improves its performance. 
> I was wondering if someone could have a look at it and include it in the next 
> release if possible.
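
The report does not say what makes getItemProps() slow or what the attached 
MicrodataParser.diff changes. One common cause of this kind of slowdown (an 
assumption here, not confirmed by the report) is rescanning the document once 
per item, which grows roughly with items x elements. The sketch below is NOT 
the attached patch: it shows the alternative of assigning every itemprop 
element to its nearest enclosing itemscope element in a single depth-first 
walk. The class ItemPropIndex and its methods index, describe and parse are 
hypothetical names; itemref is ignored, and the input must be well-formed XML 
because the JDK parser is used instead of an HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, NOT the code in MicrodataParser.diff: indexes the
// itemprop elements of every itemscope element in one depth-first walk, so the
// cost is linear in the number of elements instead of one scan per item.
// Simplifications (assumptions): itemref is ignored; input is well-formed XML.
public class ItemPropIndex {

    // Returns, for each itemscope element, the itemprop elements in its scope.
    public static Map<Element, List<Element>> index(Document doc) {
        Map<Element, List<Element>> props = new IdentityHashMap<>();
        walk(doc.getDocumentElement(), null, props);
        return props;
    }

    private static void walk(Element el, Element item, Map<Element, List<Element>> props) {
        // A property belongs to the nearest enclosing item, even when the
        // property element is itself a nested item (e.g. itemprop="address").
        if (item != null && el.hasAttribute("itemprop")) {
            props.get(item).add(el);
        }
        Element childScope = item;
        if (el.hasAttribute("itemscope")) {
            props.put(el, new ArrayList<>());
            childScope = el; // descendants now belong to this item
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, childScope, props);
            }
        }
    }

    // Lists each item's property names, with items in document order.
    public static List<List<String>> describe(Document doc) {
        Map<Element, List<Element>> props = index(doc);
        List<List<String>> out = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (!el.hasAttribute("itemscope")) {
                continue;
            }
            List<String> names = new ArrayList<>();
            for (Element p : props.get(el)) {
                names.add(p.getAttribute("itemprop"));
            }
            out.add(names);
        }
        return out;
    }

    // Parses well-formed XML; wraps checked exceptions for convenience.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<div itemscope='' itemtype='http://schema.org/Person'>"
            + "<span itemprop='name'>Ann</span>"
            + "<div itemprop='address' itemscope=''>"
            + "<span itemprop='locality'>Rome</span></div></div>";
        System.out.println(describe(parse(xml))); // prints [[name, address], [locality]]
    }
}
```

The nested address item is recorded as a property of the outer item, while its 
own locality property is attached only to the nested item, because the walk 
switches scope when it enters an itemscope element.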

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
