[ 
https://issues.apache.org/jira/browse/NUTCH-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2435.
------------------------------------
    Resolution: Fixed

> New configuration allowing to choose whether to store 'parse_text' directory 
> or not.
> ------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2435
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2435
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch 1.13
>            Reporter: Marcos Bori
>            Assignee: Sebastian Nagel
>             Fix For: 1.14
>
>
> Whenever a page is parsed, one of the outputs is the directory 'parse_text'.
> It is intended to be used at the indexing phase so the page can be searched 
> from a search engine such as Solr.
> In my special crawling case, I don't need to index the page contents.
> Therefore, creating and filling the 'parse_text' directory is not required
> for me. To optimize performance, I don't want the crawler to write this
> information to the filesystem.
> I propose a new parameter "parser.store.text" that controls whether the
> 'parse_text' directory is stored. Its default value, of course, is "true".
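
For illustration, the proposed parameter could be overridden as sketched
below. This is only a sketch under assumptions: that the property keeps the
proposed name "parser.store.text", and that it is set in conf/nutch-site.xml
using the usual Nutch property format. The description text is made up for
this example.

```xml
<!-- Hypothetical override in conf/nutch-site.xml (assumed location) -->
<property>
  <name>parser.store.text</name>
  <!-- "false" asks the parser not to write the 'parse_text' directory -->
  <value>false</value>
  <description>Illustrative only: skip storing 'parse_text' when the parsed
  text is not needed for indexing. The proposed default is "true".
  </description>
</property>
```

Leaving the property unset would keep the proposed default of "true", so
existing crawls that index page contents would be unaffected.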



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
