[ 
https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293510#comment-13293510
 ] 

Ferdy Galema commented on NUTCH-1356:
-------------------------------------

Thanks.

"The parser threads you refer to, is that a known problem? Can we solve it?"
To solve it correctly, every parser should check its thread's interrupted 
state at regular intervals. This is a pretty huge task considering the number 
of parsers. For now it is something to keep in mind. I'll create an issue for 
reference.
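
As a rough illustration of what checking the interrupted state could look
like, here is a minimal sketch. The class InterruptibleParseSketch, the
parse method, and the 1024-byte check interval are hypothetical names and
values chosen for the example; none of this is actual Nutch parser code. It
assumes the caller cancels a timed-out task with Future.cancel(true), which
interrupts the worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not a Nutch class.
class InterruptibleParseSketch {

    // Stand-in for a parser's main loop: walks the content and polls the
    // interrupt flag every 1024 bytes so a cancelled parse stops promptly.
    public static int parse(byte[] content) throws InterruptedException {
        int processed = 0;
        for (int i = 0; i < content.length; i++) {
            if ((i & 1023) == 0 && Thread.interrupted()) {
                throw new InterruptedException("parse cancelled at byte " + i);
            }
            processed++; // real parsing work would happen here
        }
        return processed;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        Future<Integer> task = executor.submit(() -> parse(new byte[1 << 20]));
        try {
            System.out.println("bytes processed: " + task.get(30, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; the loop above notices the
            // flag at its next check and exits instead of running on.
            task.cancel(true);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Without such a check, cancel(true) only sets the interrupt flag, and a parser
that never looks at the flag (and never blocks in an interruptible call)
keeps running after the timeout.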
                
> ParseUtil use ExecutorService instead of manually thread handling.
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1356
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1356
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ferdy Galema
>             Fix For: nutchgora, 1.6
>
>         Attachments: NUTCH-1356-trunk-v2.patch, NUTCH-1356-trunk.patch, 
> NUTCH-1356.patch
>
>
> Because ParseUtil manages its own parser threads by creating a thread for 
> every parse, specific parsers can become very expensive. 
> For example, parsers that have threadlocal fields will initialize them for 
> every item to be parsed.
> By simply introducing a caching ExecutorService, ParseUtil will be able to 
> reuse threads, making parsing more efficient. See attached patch.
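
The effect described in the issue can be sketched as follows. This is not
the attached patch: CachedPoolSketch, BUFFER, and INIT_COUNT are
hypothetical names, and the StringBuilder merely stands in for whatever
expensive state a parser might keep in a threadlocal field. The sketch
counts how often the threadlocal initializer runs with one thread per item
versus a cached ExecutorService.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the NUTCH-1356 patch.
class CachedPoolSketch {

    // Counts how many times the threadlocal initializer has run.
    public static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Stand-in for expensive per-thread parser state.
    public static final ThreadLocal<StringBuilder> BUFFER =
        ThreadLocal.withInitial(() -> {
            INIT_COUNT.incrementAndGet();
            return new StringBuilder();
        });

    // One new thread per item: the initializer runs once for every item.
    public static int parsePerThread(int items) throws InterruptedException {
        INIT_COUNT.set(0);
        for (int i = 0; i < items; i++) {
            Thread t = new Thread(() -> BUFFER.get().length());
            t.start();
            t.join();
        }
        return INIT_COUNT.get();
    }

    // Cached pool: an idle worker can be reused, so the initializer runs
    // once per worker thread rather than once per item.
    public static int parsePooled(int items) throws Exception {
        INIT_COUNT.set(0);
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Callable<Integer> task = () -> BUFFER.get().length();
            for (int i = 0; i < items; i++) {
                executor.submit(task).get();
            }
        } finally {
            executor.shutdown();
        }
        return INIT_COUNT.get();
    }
}
```

With a thread per item the count always equals the number of items. With the
cached pool it is at most that and usually much lower; the exact number
depends on whether a worker is already idle when the next task is submitted.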

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
