Rafael Thomas Goz Coutinho created NUTCH-1669:
-------------------------------------------------
Summary: FTP crawl does not use FTP's server root folder
Key: NUTCH-1669
URL: https://issues.apache.org/jira/browse/NUTCH-1669
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.7
Environment: Linux Ubuntu
Reporter: Rafael Thomas Goz Coutinho
Priority: Minor
Set up an FTP server whose root folder for a user (say, test) points to
/home/test/ftphome/
Create a folder named target under it, containing a file test.txt:
/home/test/ftphome/target/test.txt
Configure the following URL to crawl with a depth of 1:
ftp://FTP_SERVER/target/
The crawl fails because the FTP protocol plugin assumes the URL path is
always absolute. It looks in /target/ and not in /home/test/ftphome/target/
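
The expected behavior could be sketched roughly as below. This is only an
illustration, not the actual protocol-ftp code: the class FtpPathSketch and
the method resolveAgainstLoginDir are hypothetical names, and loginDir is
assumed to be the directory the server reports (for example via PWD) right
after the user logs in.

```java
public class FtpPathSketch {

    // Hypothetical helper, not part of Nutch: maps the path from an ftp://
    // URL onto the directory the server reported right after login.
    static String resolveAgainstLoginDir(String loginDir, String urlPath) {
        // Drop a trailing slash from the login directory, unless it is "/" itself.
        String base = (loginDir.length() > 1 && loginDir.endsWith("/"))
                ? loginDir.substring(0, loginDir.length() - 1)
                : loginDir;
        // URL paths normally start with "/"; add one if a bare name was passed.
        String rel = urlPath.startsWith("/") ? urlPath : "/" + urlPath;
        // When the login directory is the filesystem root, the URL path is
        // already the right path; otherwise prefix it with the login directory.
        return base.equals("/") ? rel : base + rel;
    }

    public static void main(String[] args) {
        // Values taken from the report above.
        System.out.println(resolveAgainstLoginDir("/home/test/ftphome", "/target/"));
    }
}
```

With the values from this report, the URL path /target/ maps to
/home/test/ftphome/target/. For a server whose login directory is /, the
URL path is returned unchanged, so the current behavior is preserved there.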
--
This message was sent by Atlassian JIRA
(v6.1#6144)