Hi,
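I worked around this problem by modifying the
org.apache.nutch.protocol.file.FileResponse class, which belongs to the
protocol-file plugin.
In particular, at line 120, I added the lines marked with + so that the
URL path is percent-decoded before it is used as a file name: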
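String path = "".equals(url.getPath()) ? "/" : url.getPath();
+// Decode percent-escapes (e.g. %20) so the path matches the on-disk file name.
+String decoded_path = path;
+try {
+ decoded_path = java.net.URLDecoder.decode(path, "UTF-8");
+} catch (Exception ex) {
+ // Malformed escape or unsupported encoding: fall back to the raw path.
+}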
Then, rather than
- java.io.File f = new java.io.File(path);
I have
+ java.io.File f = new java.io.File(decoded_path);
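
In case it helps anyone try the idea outside Nutch, here is a minimal standalone sketch of the same change. It is only an illustration: the class name DecodedPathSketch, the helper decodePath and the sample URL are my own assumptions and are not part of FileResponse.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;

// Hypothetical standalone class, not part of Nutch.
class DecodedPathSketch {

    // Hypothetical helper: percent-decode a URL path, returning the raw
    // path unchanged if it contains a malformed escape sequence.
    public static String decodePath(String path) {
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException ex) {
            return path;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // Sample URL (an assumption); the space in the file name is escaped as %20.
        URL url = new URL("file:/tmp/my%20report.txt");
        // URL.getPath() does not decode escapes, so path still contains %20 here.
        String path = "".equals(url.getPath()) ? "/" : url.getPath();
        File f = new File(decodePath(path));
        System.out.println(path + " -> " + f.getPath());
    }
}
```

One caveat with this approach: URLDecoder follows form-encoding rules, so it also turns a literal + in the path into a space. File names that contain + may therefore need separate handling.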
Thanks,
Michela
--
View this message in context:
http://lucene.472066.n3.nabble.com/Crawling-File-Error-404-when-fetching-file-with-an-hexadecimal-character-in-the-file-name-tp826407p848871.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.