[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700082#comment-13700082 ]

lufeng commented on NUTCH-1602:
-------------------------------

Hi Markus, this formatting applies only to the *normal* dump format, not to the CSV format; the two CrawlDatum output formats are separate. With the patch, the normal output now looks like this, which is more readable than before:

{code:xml}
http://www.baidu.com/   Version: 7
Status: 3 (db_gone)
Fetch time: Sat Aug 17 22:35:37 CST 2013
Modified time: Thu Jan 01 08:00:00 CST 1970
Retries since fetch: 0
Retry interval: 3888000 seconds (45 days)
Score: 1.0
Signature: null
Metadata: 
        m1=v22
        m3=v3
        m2=v2
        m5=v5
        m4=m4
        _pst_=robots_denied(18), lastModified=0
        m6=v6

{code}
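For illustration only, here is a minimal Java sketch of how the indented *normal* layout above could be produced: a "Metadata: " header followed by one key=value pair per tab-indented line. It is not the NUTCH-1602 patch. It assumes a plain java.util.Map<String, String> in place of the Hadoop MapWritable in which CrawlDatum keeps its metadata, and the MetadataDump class and formatNormal method are hypothetical names.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataDump {

  // Hypothetical helper (not Nutch's API): writes a "Metadata: " header, then
  // each entry as "key=value" on its own line, indented with a tab.
  public static String formatNormal(Map<String, String> metadata) {
    StringBuilder sb = new StringBuilder("Metadata: ");
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      sb.append("\n\t").append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap is used only so this demo prints in insertion order.
    Map<String, String> md = new LinkedHashMap<>();
    md.put("m1", "v22");
    md.put("_pst_", "robots_denied(18), lastModified=0");
    System.out.println(formatNormal(md));
  }
}
{code}

The sample dump above lists keys in no particular order (m1, m3, m2, ...), so a real implementation should not assume sorted or insertion order from the underlying map.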

Thanks, Julien and Tejas.
                
> improve the readability of metadata in readdb dump normal 
> ----------------------------------------------------------
>
>                 Key: NUTCH-1602
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1602
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.7
>            Reporter: lufeng
>            Assignee: lufeng
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1602.patch
>
>
> The dumped metadata format is not readable:
> {code:xml}
> $ bin/nutch readdb crawldb/ -dump dir
> http://www.baidu.com/ Version: 7
> Status: 3 (db_gone)
> Fetch time: Sat Aug 17 22:35:37 CST 2013
> Modified time: Thu Jan 01 08:00:00 CST 1970
> Retries since fetch: 0
> Retry interval: 3888000 seconds (45 days)
> Score: 1.0
> Signature: null
> Metadata: m1: v22m3: v3m2: v2m5: v5m4: m4_pst_: robots_denied(18), lastModified=0m6: v6
> {code}
> So I changed the Metadata format to this:
> {code:xml}
> Metadata: m1=v22;m3=v3;m2=v2;m5=v5;m4=m4;_pst_=robots_denied(18), lastModified=0;m6=v6;
> {code}
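The run-together line in the original report is what you would get if each entry were appended as "key: value" with nothing between entries. The Java sketch below contrasts that behaviour, reconstructed from the sample output rather than copied from the Nutch source, with the key=value; form proposed above. The MetadataJoin class and its formatOld/formatProposed methods are hypothetical, and a plain java.util.Map again stands in for MapWritable.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataJoin {

  // Reconstruction inferred from the sample output (not Nutch code):
  // "key: value" pairs appended back to back, so "v22" runs into "m3".
  public static String formatOld(Map<String, String> metadata) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      sb.append(e.getKey()).append(": ").append(e.getValue());
    }
    return sb.toString();
  }

  // Proposed form: each pair written as "key=value" and terminated by ';'.
  public static String formatProposed(Map<String, String> metadata) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> md = new LinkedHashMap<>();
    md.put("m1", "v22");
    md.put("m3", "v3");
    System.out.println("Metadata: " + formatOld(md));      // Metadata: m1: v22m3: v3
    System.out.println("Metadata: " + formatProposed(md)); // Metadata: m1=v22;m3=v3;
  }
}
{code}

A ';' terminator alone still leaves the line hard to parse when values themselves contain '=' or ', ' (the _pst_ value "robots_denied(18), lastModified=0" does), which is one likely reason the one-entry-per-line layout at the top of this comment reads more easily.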

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
