[jira] [Commented] (NUTCH-1857) readb -dump -format csv should use comma

Lewis John McGibbney (JIRA) Sat, 27 Sep 2014 15:22:24 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150797#comment-14150797
 ]


Lewis John McGibbney commented on NUTCH-1857:
---------------------------------------------

I've tested this on a number of existing crawl databases and it works like a 
charm.
[~boustani] please see this issue; your crawl data dump now looks like this:

Url,Status code,Status name,Fetch Time,Modified Time,Retries since fetch,Retry interval seconds,Retry interval days,Score,Signature,Metadata
as opposed to the ';' separator, which was proving awkward to work with in 
Python and also for importing directly into Solr.
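
For anyone consuming the dump from Python, here is a minimal sketch of reading 
the comma-delimited output with the standard csv module. The header is the one 
quoted above; the data row and the read_dump helper are made up for 
illustration and are not real Nutch output or part of the patch.

```python
import csv
import io

# Header row as quoted in the comment above.
HEADER = ("Url,Status code,Status name,Fetch Time,Modified Time,"
          "Retries since fetch,Retry interval seconds,Retry interval days,"
          "Score,Signature,Metadata")

# Hypothetical data row, invented for this example (not real Nutch output).
ROW = "http://example.com/,2,db_fetched,2014-09-27,1970-01-01,0,2592000,30,1.0,null,"


def read_dump(text):
    """Parse a comma-delimited crawldb dump into dicts keyed by column name."""
    # csv.DictReader uses ',' as its default delimiter, so no override is needed.
    return list(csv.DictReader(io.StringIO(text)))


records = read_dump(HEADER + "\n" + ROW + "\n")
for rec in records:
    print(rec["Url"], rec["Status name"], rec["Score"])
```

With the old ';' output the same reader would have needed delimiter=';' passed 
explicitly.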

> readb -dump -format csv should use comma
> ----------------------------------------
>
>                 Key: NUTCH-1857
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1857
>             Project: Nutch
>          Issue Type: New Feature
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.10
>
>         Attachments: NUTCH-1857.patch
>
>
> The -dump -format csv option currently uses the ASCII character ';' (%3B), 
> which is a semicolon, not a comma.
> This is a pain in the backside as I always need to override this within the 
> Solr update request.
> We should change the behaviour to default to the common comma, as 
> indicated here
> http://www.ietf.org/rfc/rfc4180.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
