Rafael Thomas Goz Coutinho created NUTCH-1669:
-------------------------------------------------

             Summary: FTP crawl does not use FTP's server root folder
                 Key: NUTCH-1669
                 URL: https://issues.apache.org/jira/browse/NUTCH-1669
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.7
         Environment: Linux Ubuntu
            Reporter: Rafael Thomas Goz Coutinho
            Priority: Minor


Set up an FTP server where the root folder for a user (say, test) points to
/home/test/ftphome/

Create a folder under it called target, containing a test.txt file:
/home/test/ftphome/target/test.txt

Configure the following URL to crawl with a depth of 1:
ftp://FTP_SERVER/target/

The crawl fails because the FTP protocol plugin assumes the path is always
absolute. It looks in /target/ instead of /home/test/ftphome/target/
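
To illustrate the expected behaviour, here is a minimal sketch of the path
resolution. It is not the Nutch plugin code. It assumes the client learns the
user's login directory after authenticating (for example from the reply to
PWD) and resolves the URL path against it, as RFC 1738 describes for FTP URLs.
The function name resolve_ftp_path and the login_dir parameter are
hypothetical, chosen only for this example.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def resolve_ftp_path(url, login_dir):
    """Map an ftp:// URL to a server-side path relative to login_dir.

    Hypothetical helper, for illustration only. login_dir is assumed to be
    the directory the server places the user in after login.
    """
    raw_path = urlsplit(url).path
    # The slash after the host separates host and path; it does not mean
    # "filesystem root" on the server.
    if raw_path.startswith("/"):
        raw_path = raw_path[1:]
    # Decode each segment on its own so an encoded slash (%2F) stays inside
    # its segment instead of being treated as a separator.
    segments = [unquote(s) for s in raw_path.split("/") if s]
    # posixpath.join restarts from a segment that begins with "/", so a URL
    # such as ftp://host/%2Fetc/motd still resolves to an absolute path.
    return posixpath.normpath(posixpath.join(login_dir, *segments))


if __name__ == "__main__":
    print(resolve_ftp_path("ftp://FTP_SERVER/target/", "/home/test/ftphome"))
```

With the setup above, ftp://FTP_SERVER/target/ resolves to
/home/test/ftphome/target under this sketch, while a path that is meant to be
absolute has to be written with an encoded leading slash (%2F).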






--
This message was sent by Atlassian JIRA
(v6.1#6144)
