[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

Julien Nioche (JIRA) Thu, 17 Oct 2013 03:38:33 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797768#comment-13797768 ]


Julien Nioche commented on NUTCH-1541:
--------------------------------------

Hi,

Line 342 needs to be:
{code}
while (nextQuoteChar > 0 && nextQuoteChar < max) {
{code}

Am I right in thinking that it generates the output on the local file system 
only? When it is used in deployed mode, won't it create one local file per 
reducer? If so, we should make this very explicit in a README file.

Just thinking aloud here, but what's preventing us from relying on the standard 
TextOutputFormat and putting things on HDFS if the configuration says so? Is it 
because the IndexingJob sets a dummy FileOutputPath and the IndexWriters know 
nothing about it?

Maybe it would be good to have some abstract class for text-based index writers 
to facilitate writing new ones, e.g. XML, JSON, etc.?
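
To make the idea concrete, here is a rough sketch of what such a base class 
could look like. All names in it (TextIndexWriterSketch, CsvWriterSketch, 
format) are made up for illustration; it does not implement Nutch's IndexWriter 
interface and is not taken from the patch. It assumes a document is just a map 
of field names to string values, and that the caller supplies the Writer, so 
where the output ends up (local file, HDFS stream) is decided elsewhere.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical base class (not part of Nutch): it owns the output stream,
// and subclasses only decide how one document is rendered as text.
abstract class TextIndexWriterSketch {
  private final Writer out;

  protected TextIndexWriterSketch(Writer out) {
    this.out = out;
  }

  /** Render one document (field name -> value) as a single text record. */
  protected abstract String format(Map<String, String> doc);

  /** Shared plumbing: format the document and append it as one line. */
  public void write(Map<String, String> doc) throws IOException {
    out.write(format(doc));
    out.write('\n');
  }

  public void close() throws IOException {
    out.close();
  }
}

// Example subclass: one CSV line per document, every value quoted and
// embedded quote characters doubled.
class CsvWriterSketch extends TextIndexWriterSketch {
  CsvWriterSketch(Writer out) {
    super(out);
  }

  @Override
  protected String format(Map<String, String> doc) {
    StringBuilder sb = new StringBuilder();
    for (String value : doc.values()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append('"').append(value.replace("\"", "\"\"")).append('"');
    }
    return sb.toString();
  }
}
```

An XML or JSON writer would then only need its own format() method, and the 
handling of the output location would live in one place.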





> Indexer plugin to write CSV
> ---------------------------
>
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write 
> configurable fields into a CSV file - for further analysis or just for export.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
