Sebastian Nagel commented on NUTCH-1541:

Yes, the fields dumped are configurable. Of course, they must be available (i.e.,
some indexing filter must have added them beforehand). E.g., this will dump the
fields "url" and "title" in the default CSV format (the output is written to a new
directory):
 bin/nutch org.apache.nutch.indexer.IndexingJob -Dindexer.csv.fields=url,title \
   crawldb/ -linkdb linkdb/ -dir segments/
Don't forget to "activate" the plugin indexer-csv (add it to the "plugin.includes"
property). To dump in tab-separated format:
 bin/nutch org.apache.nutch.indexer.IndexingJob \
   -Dindexer.csv.separator=$'\t' -Dindexer.csv.quotechar="" \
   -Dindexer.csv.recordsep=$'\n' \
   crawldb/ -linkdb linkdb/ -dir segments/
So the output is quite configurable.
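To make the roles of the three format options concrete, here is a small bash function that assembles one record from field values. It is a hypothetical sketch, not the indexer-csv source: the function name csv_record and the rule of escaping embedded quote characters by doubling them (RFC 4180 style) are assumptions; how the plugin itself escapes values is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the indexer-csv source): build one record from
# field values using a separator, a quote character and a record separator,
# i.e. the roles of indexer.csv.separator, indexer.csv.quotechar and
# indexer.csv.recordsep in the commands above.
csv_record() {
  local sep=$1 quote=$2 recsep=$3
  shift 3
  local out="" first=1 f
  for f in "$@"; do
    if [ -n "$quote" ]; then
      # Assumption: embedded quote characters are escaped by doubling them
      # (RFC 4180 style); the plugin's actual escaping may differ.
      f=${f//"$quote"/"$quote$quote"}
      f=$quote$f$quote
    fi
    # With an empty quote character, values are emitted verbatim.
    if [ "$first" -eq 1 ]; then
      out=$f
      first=0
    else
      out=$out$sep$f
    fi
  done
  printf '%s%s' "$out" "$recsep"
}
```

For example, csv_record ',' '"' $'\n' url title prints "url","title" followed by a newline, while csv_record $'\t' '' $'\n' url title prints url and title separated by a tab, mirroring the tab-separated invocation. Note that with an empty quote character, a tab inside a field value would break the record; this sketch makes no attempt to handle that case.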
> Indexer plugin to write CSV
> ---------------------------
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>         Attachments: NUTCH-1541-v1.patch
> With the new pluggable indexer a simple plugin would be handy to write 
> configurable fields into a CSV file - for further analysis or just for export.

This message is automatically generated by JIRA.