[
https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538264#comment-16538264
]
ASF GitHub Bot commented on NUTCH-1541:
---------------------------------------
sebastian-nagel commented on issue #294: NUTCH-1541 Indexer plugin to write CSV
URL: https://github.com/apache/nutch/pull/294#issuecomment-403747396
+1 Please go ahead and merge. Thanks, @r0ann3l!
- unit tests pass
- successfully indexed into CSV using default configuration:
```
% bin/nutch index -Dplugin.includes='indexer-csv|index-(basic|anchor|more)' \
crawl/crawldb/ -dir crawl/segments/ -noCommit -deleteGone
Indexer: number of documents indexed, deleted, or skipped:
Indexer: 4 deleted (gone)
Indexer: 35 indexed (add/update)
% head -2 csvindexwriter/nutch.csv
id,title,content
http://nutch.apache.org/,Apache Nutch™ -,"Apache Nutch™ -
```
- I had to remove or change exchange.xml so that the Exchange
component no longer tries to route documents to indexer_solr_1, see NUTCH-2617
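The `head -2` check above could be extended into a small sanity check of the whole file. The following is only a sketch: it assumes the output path `csvindexwriter/nutch.csv` and the header `id,title,content` seen in the session above, and `check_csv` is a hypothetical helper, not part of Nutch.

```python
import csv
from pathlib import Path


def check_csv(path, expected_header=("id", "title", "content")):
    """Return the number of data rows; raise ValueError if the file is malformed.

    Hypothetical helper for illustration. The default header is an assumption
    taken from the sample output, not from the plugin's documentation.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields,
    # such as the multi-line "content" value in the sample output.
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None or tuple(header) != tuple(expected_header):
            raise ValueError(f"unexpected header: {header!r}")
        rows = 0
        for row in reader:
            if len(row) != len(expected_header):
                raise ValueError(
                    f"line {reader.line_num}: expected "
                    f"{len(expected_header)} fields, got {len(row)}"
                )
            rows += 1
    return rows
```

Calling `check_csv("csvindexwriter/nutch.csv")` returns a row count that can be compared with the "indexed (add/update)" figure reported by the indexer.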
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Indexer plugin to write CSV
> ---------------------------
>
> Key: NUTCH-1541
> URL: https://issues.apache.org/jira/browse/NUTCH-1541
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Affects Versions: 1.7
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.15
>
> Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write
> configurable fields into a CSV file - for further analysis or just for export.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)