[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797768#comment-13797768 ]
Julien Nioche commented on NUTCH-1541:
--------------------------------------
Hi,
Line 342 needs to be:
{code}
while (nextQuoteChar > 0 && nextQuoteChar < max) {
{code}
Am I right in thinking that the plugin generates its output on the local file
system only? When it is used in deployed mode, won't it create one local file
per reducer? If so, we should make this very explicit in a README file.
Just thinking aloud here, but what's preventing us from relying on the standard
TextOutputFormat and putting the output on HDFS if the configuration says so? Is
it because the IndexingJob sets a dummy FileOutputPath and the IndexWriters know
nothing about it?
Maybe it would be good to have an abstract class for text-based index writers
to facilitate writing new ones, e.g. for XML, JSON, etc.?
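To make the abstract class idea concrete, here is a rough sketch only. It does
not use the real Nutch IndexWriter interface or NutchDocument; the names
AbstractTextIndexWriter, CsvTextIndexWriter, formatDocument, header and footer
are all made up for illustration, and a document is reduced to a
Map<String, String>. The base class owns the output stream and the
open/write/close lifecycle, so a new format only has to say how one document is
rendered.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base class: handles the stream, subclasses handle the format. */
abstract class AbstractTextIndexWriter {
  private Writer out;

  /** Remembers the target stream and emits the optional header. */
  public void open(Writer out) {
    this.out = out;
    emit(header());
  }

  /** Renders one document using the subclass-specific format. */
  public void write(Map<String, String> doc) {
    emit(formatDocument(doc));
  }

  /** Emits the optional footer and flushes; the caller closes the stream. */
  public void close() {
    emit(footer());
    try {
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void emit(String text) {
    try {
      out.write(text);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  protected String header() { return ""; }
  protected String footer() { return ""; }
  protected abstract String formatDocument(Map<String, String> doc);
}

/** Hypothetical CSV subclass: quotes every value and doubles embedded quotes. */
class CsvTextIndexWriter extends AbstractTextIndexWriter {
  private final String[] fields;

  CsvTextIndexWriter(String... fields) {
    this.fields = fields;
  }

  @Override
  protected String header() {
    return String.join(",", fields) + "\n";
  }

  @Override
  protected String formatDocument(Map<String, String> doc) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) sb.append(',');
      String value = doc.getOrDefault(fields[i], "");
      sb.append('"').append(value.replace("\"", "\"\"")).append('"');
    }
    return sb.append('\n').toString();
  }
}

class TextIndexWriterDemo {
  public static void main(String[] args) {
    StringWriter sink = new StringWriter();
    AbstractTextIndexWriter writer = new CsvTextIndexWriter("url", "title");
    writer.open(sink);
    Map<String, String> doc = new LinkedHashMap<>();
    doc.put("url", "http://example.org/");
    doc.put("title", "Example page");
    writer.write(doc);
    writer.close();
    System.out.print(sink);
  }
}
```

An XML or JSON writer would then override only header(), footer() and
formatDocument(). Whether the Writer points at a local file or at HDFS would be
decided in one place, which ties in with the TextOutputFormat question above.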
> Indexer plugin to write CSV
> ---------------------------
>
> Key: NUTCH-1541
> URL: https://issues.apache.org/jira/browse/NUTCH-1541
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Affects Versions: 1.7
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.8
>
> Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write
> configurable fields into a CSV file - for further analysis or just for export.
--
This message was sent by Atlassian JIRA
(v6.1#6144)