[
https://issues.apache.org/jira/browse/NUTCH-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-2435:
--------------------------------------
Assignee: Sebastian Nagel
> New configuration allowing to choose whether to store 'parse_text' directory
> or not.
> ------------------------------------------------------------------------------------
>
> Key: NUTCH-2435
> URL: https://issues.apache.org/jira/browse/NUTCH-2435
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.13
> Environment: Apache Nutch 1.13
> Reporter: Marcos Bori
> Assignee: Sebastian Nagel
> Fix For: 1.14
>
>
> Whenever a page is parsed, one of the outputs is the directory 'parse_text'.
> It is intended for the indexing phase, so that the page can be searched
> through a search engine such as Solr.
> In my special crawling case, I don't need to index the page contents.
> Therefore, creating and filling 'parse_text' is not required for me. To
> optimize performance, I don't want the crawler to store this information on
> the filesystem.
> I propose a new parameter "parser.store.text" that controls whether the
> 'parse_text' directory is stored. Its default value, of course, is "true".
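
The proposed switch could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the Nutch implementation: the class
ParseTextToggle and its methods are hypothetical, and java.util.Properties
stands in for the real crawler configuration. Only the property name
"parser.store.text" and its default of "true" come from the issue text; the
other two directory names are the remaining parse outputs of a Nutch 1.x
segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of Nutch. */
public class ParseTextToggle {
    // Property name as proposed in the issue.
    public static final String STORE_TEXT_KEY = "parser.store.text";

    /**
     * Defaults to true when the property is absent; otherwise the value is
     * parsed with Boolean.parseBoolean, so only "true" (case-insensitive)
     * enables storing.
     */
    public static boolean shouldStoreText(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(STORE_TEXT_KEY, "true"));
    }

    /** Lists the segment subdirectories a parse step would write. */
    public static List<String> outputDirs(Properties conf) {
        List<String> dirs = new ArrayList<>();
        dirs.add("parse_data");
        dirs.add("crawl_parse");
        if (shouldStoreText(conf)) {
            // Skipped entirely when the user opts out of storing page text.
            dirs.add("parse_text");
        }
        return dirs;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(outputDirs(conf));   // default: text is stored

        conf.setProperty(STORE_TEXT_KEY, "false");
        System.out.println(outputDirs(conf));   // opt-out: no parse_text
    }
}
```

Keeping the default at "true" means existing crawls behave as before, and
only users who explicitly set the property to "false" skip writing
'parse_text'.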
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)