Hi: 

I downloaded and compiled the Nutch trunk, but when I run parsechecker
on a JPEG image I get this error: Can't retrieve Tika parser for
mime-type image/jpeg

This is the content of my log file:

2015-11-02 10:50:57,421 INFO parse.ParserChecker - fetching: http://www.cubadebate.cu/wp-content/uploads/2015/11/air-china-3-150x125.jpg
2015-11-02 10:50:57,897 INFO protocol.RobotRulesParser - robots.txt whitelist not configured.
2015-11-02 10:50:57,897 INFO httpclient.Http - http.proxy.host = null
2015-11-02 10:50:57,897 INFO httpclient.Http - http.proxy.port = 8080
2015-11-02 10:50:57,897 INFO httpclient.Http - http.proxy.exception.list = false
2015-11-02 10:50:57,897 INFO httpclient.Http - http.timeout = 60000
2015-11-02 10:50:57,897 INFO httpclient.Http - http.content.limit = 1048576000
2015-11-02 10:50:57,897 INFO httpclient.Http - http.agent = agent/Nutch-1.11 (Agent; [email protected])
2015-11-02 10:50:57,897 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2015-11-02 10:50:57,897 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2015-11-02 10:50:58,582 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type image/jpeg
2015-11-02 10:50:58,594 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2015-11-02 10:50:58,602 INFO parse.ParserChecker - parsing: http://www.cubadebate.cu/wp-content/uploads/2015/11/air-china-3-150x125.jpg
2015-11-02 10:50:58,602 INFO parse.ParserChecker - contentType: image/jpeg
2015-11-02 10:50:58,602 INFO parse.ParserChecker - signature: bfdbe472ed3e43e686b4619b2c043d50
2015-11-02 10:50:58,603 INFO parse.ParserChecker - ---------

Thanks 

