[ 
https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538264#comment-16538264
 ] 

ASF GitHub Bot commented on NUTCH-1541:
---------------------------------------

sebastian-nagel commented on issue #294: NUTCH-1541 Indexer plugin to write CSV
URL: https://github.com/apache/nutch/pull/294#issuecomment-403747396
 
 
   +1 Please go ahead and merge. Thanks, @r0ann3l!
   - unit tests pass
   - successfully indexed into CSV using default configuration:
   ```
   % bin/nutch index -Dplugin.includes='indexer-csv|index-(basic|anchor|more)' \
        crawl/crawldb/ -dir crawl/segments/ -noCommit -deleteGone
   Indexer: number of documents indexed, deleted, or skipped:
   Indexer:      4  deleted (gone)
   Indexer:     35  indexed (add/update)
   
   % head -2 csvindexwriter/nutch.csv 
   id,title,content
   http://nutch.apache.org/,Apache Nutch™ -,"Apache Nutch™ -
   ```
   - I had to remove or change exchange.xml so that the Exchange 
component no longer tries to route documents to indexer_solr_1 (see NUTCH-2617)
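
Because `head -2` cuts the quoted `content` field in the middle of a record, a CSV-aware reader is a more reliable way to inspect the output. The following is only a minimal sketch, assuming the `id,title,content` header shown above. The helper name `read_index_rows` and the sample record are hypothetical and are not part of the plugin or of the real output.

```python
import csv
import io


def read_index_rows(stream, limit=2):
    """Return the header and up to `limit` records from a CSV stream.

    csv.reader keeps quoted fields together even when they contain
    line breaks, which a line-based tool such as `head` does not.
    """
    reader = csv.reader(stream)
    header = next(reader)
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append(row)
    return header, rows


# Hypothetical sample data (NOT real indexer output): one record whose
# quoted content field spans two lines.
sample = (
    "id,title,content\r\n"
    'http://example.org/,Example,"first line\nsecond line"\r\n'
)
header, rows = read_index_rows(io.StringIO(sample))
print(header)
print(len(rows), "record(s) read")
```

To run it against a real index, the same function could be given a file opened with `open("csvindexwriter/nutch.csv", newline="", encoding="utf-8")`, assuming the default output path from the example above and UTF-8 output.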

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Indexer plugin to write CSV
> ---------------------------
>
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>         Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write 
> configurable fields into a CSV file - for further analysis or just for export.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
