[
https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799911#comment-17799911
]
ASF GitHub Bot commented on NUTCH-1541:
---------------------------------------
grege117 commented on PR #294:
URL: https://github.com/apache/nutch/pull/294#issuecomment-1868000580
Sorry to chime in a few years late, but I'm not sure this plugin is
configured correctly.
If I modify my conf/index-writers.xml and remove everything except for
<writer id="indexer_csv_1">, I get the message:
IndexerOutputFormat [pool-5-thread-1] No IndexWriters activated - check your
configuration
The only way I could write to CSV was to execute what @sebastian-nagel
wrote above:
bin/nutch index -Dplugin.includes='indexer-csv' crawl/crawldb/ -linkdb
crawl/linkdb/ crawl/segments/20231222132024/ -filter -normalize -deleteGone
However, if I add the index writer for Solr back in, that just works (no
-Dplugin.includes is required).
So I think there's a bug in the out-of-the-box (OOTB) configuration that
prevents indexer-csv from working unless it is specified on the CLI.
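A possible way to avoid passing -Dplugin.includes on every run, sketched
under the assumption (not confirmed in this thread) that the shipped
plugin.includes value simply does not list indexer-csv, would be to override
that property in conf/nutch-site.xml. The value below is illustrative only;
the real starting point is whatever plugin.includes contains in your own
conf/nutch-default.xml:

```xml
<!-- conf/nutch-site.xml: hypothetical override, untested against this setup -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative value: copy the regex from your conf/nutch-default.xml
         and make sure it contains indexer-csv. -->
    <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-csv|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

With an override like this in place, the same index command without the -D
option should be enough to check whether the "No IndexWriters activated"
message goes away.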
> Indexer plugin to write CSV
> ---------------------------
>
> Key: NUTCH-1541
> URL: https://issues.apache.org/jira/browse/NUTCH-1541
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Affects Versions: 1.7
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.15
>
> Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write
> configurable fields into a CSV file - for further analysis or just for export.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)