Hi,

I worked around this problem by modifying the
org.apache.nutch.protocol.file.FileResponse class in the
protocol-file plugin so that it URL-decodes the path before
opening the file.

In particular, at line 120, after the existing line that sets path,
I added the lines marked with +:

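String path = "".equals(url.getPath()) ? "/" : url.getPath();
+// Decode percent-escapes (e.g. %20) so the path matches the name on disk.
+String decoded_path = path;
+try {
+  decoded_path = java.net.URLDecoder.decode(path, "UTF-8");
+} catch (Exception ex) {
+  // Decoding failed; fall back to the raw path.
+}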

Then, rather than

- java.io.File f = new java.io.File(path);

I have

+ java.io.File f = new java.io.File(decoded_path);
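
For anyone who wants to try the idea outside Nutch, here is a minimal standalone sketch of the same decoding step. The class name DecodePathSketch, the helper decodePath and the example path are made up for illustration and are not part of Nutch; it only assumes the standard java.net.URLDecoder behaviour.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;

// Hypothetical standalone sketch; not the actual Nutch FileResponse code.
final class DecodePathSketch {

    // Returns the path with percent-escapes decoded as UTF-8.
    // Falls back to the raw path if the escapes are malformed.
    static String decodePath(String rawPath) {
        String path = "".equals(rawPath) ? "/" : rawPath;
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException ex) {
            // Decoding failed, so keep the path exactly as it came from the URL.
            return path;
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative file URL; URL.getPath() leaves %20 undecoded.
        URL url = URI.create("file:/tmp/my%20file.txt").toURL();
        String decoded = decodePath(url.getPath());
        System.out.println(decoded); // prints "/tmp/my file.txt"

        // The decoded string is what gets passed to java.io.File.
        File f = new File(decoded);
        System.out.println(f.exists());
    }
}
```

One caveat: URLDecoder follows the HTML form rules, so it also turns a literal '+' into a space. A file whose name really contains '+' would need different handling.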

Thanks,

Michela
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Crawling-File-Error-404-when-fetching-file-with-an-hexadecimal-character-in-the-file-name-tp826407p848871.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
