[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799911#comment-17799911 ]

ASF GitHub Bot commented on NUTCH-1541:
---------------------------------------

grege117 commented on PR #294:
URL: https://github.com/apache/nutch/pull/294#issuecomment-1868000580

   Sorry to chime in a few years late, but I'm not sure this plugin is configured correctly.
   
   If I modify my conf/index-writers.xml and remove every writer except <writer id="indexer_csv_1">, I get the message:
   
   IndexerOutputFormat [pool-5-thread-1] No IndexWriters activated - check your configuration
   
   The only way I could write to CSV was to run the command @sebastian-nagel posted above:
   bin/nutch index -Dplugin.includes='indexer-csv' crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20231222132024/ -filter -normalize -deleteGone
   
   
   However, if I add the Solr index writer back in, indexing just works (no -Dplugin.includes is required).
   
   So I think there's a bug in the out-of-the-box (OOTB) configuration that prevents indexer-csv from working unless it is specified on the command line.
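A possible explanation, offered as an assumption rather than a confirmed diagnosis: conf/index-writers.xml only describes writers, while a writer's plugin must also be activated through the plugin.includes property. If your default value of that property lists indexer-solr but not indexer-csv, that would match the behaviour above, where Solr works without -D but CSV does not. A sketch of a persistent workaround, instead of passing -Dplugin.includes on every run, is to override the property in conf/nutch-site.xml. The value below is illustrative only; start from the plugin.includes value in your own conf/nutch-default.xml and add indexer-csv to it.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative override, not copied from a specific Nutch release:
       take plugin.includes from your own conf/nutch-default.xml and
       extend the indexer alternation so indexer-csv is also activated. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-(solr|csv)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

With an override like this in place, and the indexer_csv_1 writer still present in conf/index-writers.xml, the same bin/nutch index command should be able to run without the -D option. Using indexer-(solr|csv) keeps the Solr writer available as well; use indexer-csv alone if only CSV output is wanted.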




> Indexer plugin to write CSV
> ---------------------------
>
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>         Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer, a simple plugin that writes configurable fields
> into a CSV file would be handy, for further analysis or just for export.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
