[
https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964127#comment-13964127
]
Sebastian Nagel commented on NUTCH-1748:
----------------------------------------
Hi [~alexmc], you're absolutely right: the analogy to Unix file names ([drawn
here|http://mail-archives.apache.org/mod_mbox/nutch-dev/201404.mbox/%3C533F1D81.7020401%40googlemail.com%3E])
is not relevant. Let me reformulate it: urlfilter-validator should allow
two consecutive dots inside path elements, e.g. inside the "file name" as in
[~msertacturkel]'s example:
{code}
http://www.example.com/example-example..-16067h.htm
{code}
Of course, there must be surrounding characters (leading or trailing): a path
element consisting only of ".." should be rejected to avoid trivial duplicates
on the URL level.
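The rule can be sketched roughly as follows. This is only a minimal Python illustration of the intended behavior, not the plugin's actual Java code; the function names has_dot_dot_segment and accept are hypothetical.

```python
from urllib.parse import urlsplit


def has_dot_dot_segment(url: str) -> bool:
    """Return True if any path element of the URL is exactly '..'."""
    path = urlsplit(url).path
    # Compare whole path elements, so "a..b" is not flagged.
    return any(segment == ".." for segment in path.split("/"))


def accept(url: str) -> bool:
    # Hypothetical filter decision: reject only when ".." is a whole path element.
    return not has_dot_dot_segment(url)
```

Under this sketch, two dots inside a file name pass, while "/../" or a trailing "/.." is rejected, because the comparison is made per path element rather than by searching the whole URL for "..".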
> urlfilter-validator to allow .. (two dots) inside file names (path elements)
> ----------------------------------------------------------------------------
>
> Key: NUTCH-1748
> URL: https://issues.apache.org/jira/browse/NUTCH-1748
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.2.1
> Reporter: Sertac TURKEL
> Priority: Minor
> Fix For: 2.3
>
>
> Unix systems accept file names containing two dots, e.g. "abc..xyz.txt". So
> urlfilter-validator should not reject this kind of URL. However, paths
> containing "/../", or "/.." in final position, should still be rejected.
--
This message was sent by Atlassian JIRA
(v6.2#6252)