[ 
https://issues.apache.org/jira/browse/NUTCH-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182445#comment-16182445
 ] 

Sebastian Nagel commented on NUTCH-2435:
----------------------------------------

+1 -- a valid use case (of course, not the most common one).

The PR looks good except for formatting.

> New configuration allowing to choose whether to store 'parse_text' directory 
> or not.
> ------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2435
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2435
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch 1.13
>            Reporter: Marcos Bori
>
> Whenever a page is parsed, one of the outputs is the directory 'parse_text'.
> It is intended to be used at the indexing phase so the page can be searched 
> from a search engine such as Solr.
> In my particular crawl, I don't need to index the page contents. 
> Therefore, creating and filling the 'parse_text' directory is not required. To 
> improve performance, I don't want the crawler to write this data to 
> the filesystem. 
> I propose a new parameter "parser.store.text" that controls whether the 
> 'parse_text' directory is stored. Its default value, of course, is "true".
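
For illustration only, a minimal sketch of how a boolean switch such as "parser.store.text" with a default of "true" might be evaluated. It uses plain java.util.Properties rather than Nutch's own configuration classes, and the class name ParseTextToggle and method shouldStoreText are hypothetical names chosen for this example, not taken from the PR.

```java
import java.util.Properties;

public class ParseTextToggle {
    // Property name as proposed in the issue description.
    static final String STORE_TEXT_KEY = "parser.store.text";

    // Hypothetical helper: an unset property falls back to the default "true";
    // otherwise the result is true only if the value is "true" (case-insensitive).
    static boolean shouldStoreText(Properties conf) {
        String value = conf.getProperty(STORE_TEXT_KEY, "true").trim();
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Not set: the default keeps the current behaviour (text is stored).
        System.out.println(shouldStoreText(conf)); // prints "true"

        // Explicitly disabled: a writer could skip 'parse_text' in this case.
        conf.setProperty(STORE_TEXT_KEY, "false");
        System.out.println(shouldStoreText(conf)); // prints "false"
    }
}
```

Defaulting to "true" means existing crawls behave as before unless the parameter is explicitly set to "false".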



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
