[ 
https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964127#comment-13964127
 ] 

Sebastian Nagel commented on NUTCH-1748:
----------------------------------------

Hi [~alexmc], you're absolutely right: the analogy to Unix file names ([drawn 
here|http://mail-archives.apache.org/mod_mbox/nutch-dev/201404.mbox/%3C533F1D81.7020401%40googlemail.com%3E])
 is not relevant. To reformulate: urlfilter-validator should allow 
two dots inside path elements, e.g. inside the "file name" as in 
[~msertacturkel]'s example:
{code}
http://www.example.com/example-example..-16067h.htm
{code}
Of course, the two dots must have surrounding (leading or trailing) characters: 
a path element consisting only of ".." should still be rejected to avoid trivial 
duplicates on the URL level.
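
To illustrate the intended rule, here is a minimal sketch in plain Java. It is not the actual urlfilter-validator code: the class name DotSegmentCheck and the method isPathAccepted are hypothetical, and rejecting unparsable URLs is an assumption made only for this example. It relies on java.net.URI alone and splits the raw path on "/", so a path element is rejected only when it is exactly "..".

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: illustrates the proposed rule only.
final class DotSegmentCheck {

    /**
     * Returns false if any path element of the URL is exactly "..",
     * true otherwise. Two dots inside a longer element are accepted.
     */
    static boolean isPathAccepted(String url) {
        String path;
        try {
            // getRawPath() returns the path as written, without normalization.
            path = new URI(url).getRawPath();
        } catch (URISyntaxException e) {
            // Assumption for this sketch: unparsable URLs are rejected.
            return false;
        }
        if (path == null) {
            // Opaque URIs have no path, so there is nothing to check.
            return true;
        }
        // Limit -1 keeps trailing empty elements, so "/a/.." yields "..".
        for (String element : path.split("/", -1)) {
            if (element.equals("..")) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, the example URL above would be accepted, while a URL containing "/../" or ending in "/.." would be rejected.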


> urlfilter-validator to allow .. (two dots) inside file names (path elements)
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1748
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1748
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.2.1
>            Reporter: Sertac TURKEL
>            Priority: Minor
>             Fix For: 2.3
>
>
> Unix systems accept file names containing two dots, e.g. "abc..xyz.txt". So
> urlfilter-validator should not reject this kind of URL. Paths 
> containing "/../", or "/.." in final position, should still be rejected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)