From: bineesh k <bineesh13...@gmail.com>
Date: Wednesday, October 3, 2018 at 12:37 AM
To: "dev-ow...@tika.apache.org" <dev-ow...@tika.apache.org>
Subject: Solr/Nutch /tika config for PDF crawing

Hello Tika Team,

We need help with our Solr/Nutch setup for crawling PDF pages.

We are using Nutch 1.15 and Solr 7.3.1. We added the Tika details to the nutch-site.xml file and can crawl the PDF pages and index them in Solr successfully.

The current issue is that the title and description are missing for the indexed PDF pages. Is there a way to fix this? If not, can we take the first couple of lines from the content field and add them to the title field?

The following fields are indexed in Solr for PDF pages:

  "date"
  "type":["application/pdf", "application", "pdf"],
  "url":
  "content":
  "tstamp":
  "digest":
  "host":
  "boost":
  "contentLength":
  "id":"
  "lastModified":
  "lang":
  "host_str":
  "url_str":
  "lang_str":["en"],
  "digest_str":
  "_version_":1613120835557523457,
  "content_str":,
  "type_str":["application", "application/pdf", "pdf"]}]

Thanks in advance for your help on this.

Regards,
Bineesh k
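
The fallback asked about above (using the first couple of lines of the content as the title) could be sketched roughly as follows. This is only an illustration in plain Python, not Nutch, Tika, or Solr API: the function name fallback_title, the max_lines and max_chars parameters, the sample document, and the assumption that a document is a dict with a "content" key and an optional "title" key are all hypothetical.

```python
def fallback_title(doc, max_lines=2, max_chars=100):
    """Return the document's title, or one built from the first lines of its content.

    `doc` is assumed to be a dict with a "content" string and an optional
    "title" string (hypothetical shape, chosen for illustration only).
    """
    title = (doc.get("title") or "").strip()
    if title:
        # A real title was extracted, so keep it unchanged.
        return title

    content = doc.get("content") or ""
    # Collapse whitespace inside each line, then keep the first few non-empty lines.
    lines = [" ".join(line.split()) for line in content.splitlines()]
    lines = [line for line in lines if line][:max_lines]
    # Join the kept lines and cap the length so the title stays short.
    return " ".join(lines)[:max_chars].rstrip()


# Made-up sample document with no title, as in the PDF case described above.
doc = {"content": "\n  Annual Report 2018 \n\nPrepared by the finance team\nPage 1 of 40"}
doc["title"] = fallback_title(doc)
print(doc["title"])
```

Where such a step would run (during indexing in Nutch, or as a post-processing pass over documents already in Solr) depends on the setup and is not decided by this sketch.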