Hi, I am running a crawl on a website that serves pages and images via PHP. Nutch doesn't seem to crawl these pages.
I see the following in hadoop.log:

015-10-03 12:48:31,091 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the plugin.includes system property, and all claim to support the content type text/x-php, but they are not mapped to it in the parse-plugins.xml file
2015-10-03 12:48:31,712 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type text/x-php
2015-10-03 12:48:31,713 WARN parse.ParseSegment - Error parsing: http://www.arguntrader.com/ucp.php?mode=login: failed(2,0): Can't retrieve Tika parser for mime-type text/x-php

Can anyone help me identify what needs to be done to crawl a site that serves pages via PHP?

Regards,
Girish
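The first log line says TikaParser claims text/x-php but nothing maps that type in parse-plugins.xml, and Tika then has no parser for it. One commonly suggested workaround (an assumption here, not something confirmed in this thread) is to add a mimeType entry in conf/parse-plugins.xml that sends text/x-php to the HTML parser plugin, on the assumption that these .php URLs actually return HTML and that parse-html is listed in plugin.includes. The Python sketch below shows that edit as a string transform; the helper name add_mime_mapping and the trimmed sample file are hypothetical, and the layout (mimeType/plugin entries followed by an aliases block) is assumed to match your copy of the file.

```python
import xml.etree.ElementTree as ET


def add_mime_mapping(xml_text: str, mime: str, plugin_id: str) -> str:
    """Hypothetical helper: map `mime` to `plugin_id` in a parse-plugins.xml-style document.

    Assumes the layout <parse-plugins><mimeType name="..."><plugin id="..."/></mimeType>
    ... <aliases>...</aliases></parse-plugins>; adjust if your file differs.
    """
    # Keep comments that sit inside the root element; comments outside it
    # (and the XML declaration) are still dropped by this round trip.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    # Reuse an existing <mimeType> entry for this type if there is one.
    entry = next((m for m in root.findall("mimeType") if m.get("name") == mime), None)
    if entry is None:
        entry = ET.Element("mimeType", name=mime)
        aliases = root.find("aliases")
        if aliases is not None:
            # Put the new entry just before <aliases>, next to the other mappings.
            root.insert(list(root).index(aliases), entry)
        else:
            root.append(entry)

    # Add the plugin only if it is not already listed for this type.
    if not any(p.get("id") == plugin_id for p in entry.findall("plugin")):
        ET.SubElement(entry, "plugin", id=plugin_id)

    return ET.tostring(root, encoding="unicode")


# Trimmed, illustrative sample; the real conf/parse-plugins.xml has many more entries.
sample = """<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>
  <aliases>
    <alias name="parse-html" extension-id="org.apache.nutch.parse.html.HtmlParser" />
  </aliases>
</parse-plugins>"""

print(add_mime_mapping(sample, "text/x-php", "parse-html"))
```

Editing the file by hand to add the same mimeType block is equally valid and keeps the license header and formatting intact; the script only makes the intended change explicit. Whether parse-html is the right target depends on what the server really returns for these URLs.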

