[
https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784029#comment-17784029
]
ASF GitHub Bot commented on NUTCH-3017:
---------------------------------------
sebastian-nagel commented on PR #793:
URL: https://github.com/apache/nutch/pull/793#issuecomment-1801814549
Thanks, @jnioche!
Merged into master, adding the lines to make use of Hadoop-provided
compression codecs.
Successfully tested in local and pseudo-distributed mode with various codecs
(gzip / .gz, bzip2, ZStandard / .zst).
One final note: if the fast-urlfilter rules file is not found, the Nutch job
(local mode) or the tasks (distributed mode) fail with an exception. I didn't
change this behavior.
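For readers unfamiliar with suffix-based decompression, the idea behind the codec handling can be sketched roughly as follows. This is not the code merged in the PR: Nutch relies on the Hadoop-provided codecs, whereas this stand-in uses only the JDK, handles gzip alone, and reads from the local file system instead of HDFS/S3. The class name `RulesFileSketch` and its methods are hypothetical. It also shows a missing file surfacing as an exception rather than being silently ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, JDK-only; an illustration, not Nutch or Hadoop code.
class RulesFileSketch {

  /** Opens a file and wraps the stream in a gzip decoder if the name ends in ".gz". */
  static InputStream open(Path file) throws IOException {
    // Throws NoSuchFileException (an IOException) if the file does not exist.
    InputStream in = Files.newInputStream(file);
    if (file.getFileName().toString().endsWith(".gz")) {
      try {
        return new GZIPInputStream(in);
      } catch (IOException e) {
        in.close(); // do not leak the raw stream if the gzip header is invalid
        throw e;
      }
    }
    return in; // no known suffix: read the bytes as they are
  }

  /** Reads all lines as UTF-8, decompressing transparently for ".gz" files. */
  static List<String> readLines(Path file) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(open(file), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }
}
```

Calling `RulesFileSketch.readLines(path)` on a plain file and on its gzipped copy should yield the same lines, which is the behavior the issue asks for with .gz input.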
> Allow fast-urlfilter to load from HDFS/S3 and support gzipped input
> -------------------------------------------------------------------
>
> Key: NUTCH-3017
> URL: https://issues.apache.org/jira/browse/NUTCH-3017
> Project: Nutch
> Issue Type: Improvement
> Components: plugin, urlfilter
> Affects Versions: 1.19
> Reporter: Julien Nioche
> Priority: Minor
> Fix For: 1.20
>
>
> This provides an easier way to refresh the resources, since the jar no
> longer needs to be rebuilt. The path can point to either HDFS or S3.
> Additionally, .gz files should be handled automatically.