[ https://issues.apache.org/jira/browse/NUTCH-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182445#comment-16182445 ]
Sebastian Nagel commented on NUTCH-2435:
----------------------------------------

+1 -- a valid use case (of course, not the most common one). PR looks good except for formatting.

> New configuration allowing to choose whether to store 'parse_text' directory or not.
> ------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2435
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2435
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch 1.13
>            Reporter: Marcos Bori
>
> Whenever a page is parsed, one of the outputs is the directory 'parse_text'.
> It is intended to be used at the indexing phase so the page can be searched
> from a search engine such as Solr.
> In my special crawling case, I don't need to index the page contents.
> Therefore, creating and filling the 'parse_text' directory is not required
> for me. To optimize performance, I don't want the crawler to store this
> information to the filesystem.
> I propose a new parameter "parser.store.text" allowing to choose whether to
> store the 'parse_text' directory or not. Its default value, of course, is "true".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)