[jira] [Commented] (NUTCH-2795) CrawlDbReader: compress CrawlDb dumps if configured

Hudson (Jira) Sun, 21 Aug 2022 04:48:11 -0700


    [ 
https://issues.apache.org/jira/browse/NUTCH-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582539#comment-17582539
 ]


Hudson commented on NUTCH-2795:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #85 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/85/])
NUTCH-2795 CrawlDbReader: compress CrawlDb dumps if configured (snagel: 
[https://github.com/apache/nutch/commit/bca5fc0d0e25a213c704d9ac486ebf9d88b3cf7a])
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java


> CrawlDbReader: compress CrawlDb dumps if configured
> ---------------------------------------------------
>
>                 Key: NUTCH-2795
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2795
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.17
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>              Labels: help-wanted
>             Fix For: 1.19
>
>
> The dumps of CrawlDbReader (text, CSV, JSON) are not compressed even when 
> file output compression is configured. E.g., if running
> {noformat}
> $> bin/nutch readdb \
>        -Dmapreduce.output.fileoutputformat.compress=true \
>        -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
>        crawldb/ -dump crawldb.dump -format json
> {noformat}
> the output should be compressed using bzip2.
> See the Hadoop class [FileOutputFormat|https://hadoop.apache.org/docs/r3.1.3/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html]
> and the [implementation in TextOutputFormat|https://github.com/apache/hadoop/blob/639acb6d8921127cde3174a302f2e3d71b44f052/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java].
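
For illustration only, the pattern the description asks for can be sketched with nothing but the JDK: check the compress flag, and if it is set, wrap the raw output stream in a compressor and append the matching file extension. This is not the change committed to CrawlDbReader.java. The class and method names are hypothetical, a plain Map stands in for the Hadoop Configuration, and gzip from java.util.zip stands in for the configured Hadoop codec (the JDK ships no bzip2 implementation).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: NOT the Nutch or Hadoop implementation.
class DumpCompressionSketch {

  // Property name taken from the issue description.
  static final String COMPRESS_KEY = "mapreduce.output.fileoutputformat.compress";

  // True only if the flag is present and equals "true" (case-insensitive).
  static boolean isCompressed(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault(COMPRESS_KEY, "false"));
  }

  // Append the compressor's extension when compression is enabled.
  // ".gz" is an assumption tied to the gzip stand-in used here.
  static String outputName(String baseName, Map<String, String> conf) {
    return isCompressed(conf) ? baseName + ".gz" : baseName;
  }

  // Wrap the raw stream in a compressing stream if configured.
  static OutputStream wrap(OutputStream raw, Map<String, String> conf) {
    if (!isCompressed(conf)) {
      return raw;
    }
    try {
      return new GZIPOutputStream(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Write one record through the (possibly compressing) stream and
  // return the bytes that would end up in the dump file.
  static byte[] dump(String record, Map<String, String> conf) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = wrap(sink, conf)) {
      out.write(record.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(COMPRESS_KEY, "true");
    byte[] bytes = dump("{\"url\":\"http://example.org/\"}\n", conf);
    System.out.println(outputName("part-r-00000", conf) + ": " + bytes.length + " bytes");
  }
}
```

In a real Hadoop job the flag and codec would be read through the FileOutputFormat helpers linked above rather than from a Map; the sketch only shows the control flow.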



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
