[
https://issues.apache.org/jira/browse/NUTCH-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582539#comment-17582539
]
Hudson commented on NUTCH-2795:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #85 (See
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/85/])
NUTCH-2795 CrawlDbReader: compress CrawlDb dumps if configured (snagel:
[https://github.com/apache/nutch/commit/bca5fc0d0e25a213c704d9ac486ebf9d88b3cf7a])
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
> CrawlDbReader: compress CrawlDb dumps if configured
> ---------------------------------------------------
>
> Key: NUTCH-2795
> URL: https://issues.apache.org/jira/browse/NUTCH-2795
> Project: Nutch
> Issue Type: Improvement
> Components: crawldb
> Affects Versions: 1.17
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Labels: help-wanted
> Fix For: 1.19
>
>
> The dumps of CrawlDbReader (text, CSV, JSON) are not compressed even when
> file output compression is configured. E.g., when running
> {noformat}
> $> bin/nutch readdb \
>      -Dmapreduce.output.fileoutputformat.compress=true \
>      -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
>      crawldb/ -dump crawldb.dump -format json
> {noformat}
> the output should be compressed using bzip2.
> See the Hadoop class
> [FileOutputFormat|https://hadoop.apache.org/docs/r3.1.3/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html]
> and the [implementation in
> TextOutputFormat|https://github.com/apache/hadoop/blob/639acb6d8921127cde3174a302f2e3d71b44f052/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java].
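One way to check whether a dump was actually written with bzip2 is to look
at the first bytes of each output file. The following is a minimal Python
sketch, not part of Nutch: the helper names (is_bzip2, read_dump_lines) are
hypothetical, and it assumes only that a bzip2 stream starts with the magic
bytes "BZh" and that the dump is UTF-8 text. The names of the part files
inside crawldb.dump/ are not given in the issue, so any path passed in is
an assumption as well.

```python
import bz2
from pathlib import Path
from typing import Iterator

# Every bzip2 stream begins with these three bytes.
BZIP2_MAGIC = b"BZh"


def is_bzip2(path: Path) -> bool:
    """Return True if the file starts with the bzip2 magic bytes."""
    with open(path, "rb") as f:
        return f.read(3) == BZIP2_MAGIC


def read_dump_lines(path: Path) -> Iterator[str]:
    """Yield the lines of a dump file, decompressing bzip2 if needed.

    Hypothetical helper: it treats the dump as UTF-8 text and makes no
    assumption about the record layout (text, CSV, or JSON).
    """
    opener = bz2.open if is_bzip2(path) else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Called on a file from the dump directory, is_bzip2() returning False while
compression is configured would reproduce the behaviour described above;
read_dump_lines() reads the file either way.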
--
This message was sent by Atlassian Jira
(v8.20.10#820010)