From: bineesh k <>
Date: Wednesday, October 3, 2018 at 12:37 AM
To: "" <>
Subject: Solr/Nutch/Tika config for PDF crawling


Hello Tika Team, 


I need help with a Solr/Nutch setup for crawling PDF pages.


We are using Nutch 1.15 and Solr 7.3.1 for our setup. We specified the Tika 
details in the nutch-site.xml file and can crawl the PDF pages and index them 
in Solr successfully.


The current issue is that the title and description are missing for the 
indexed PDF pages. Is there a way to fix this? If not, can we take the first 
couple of lines from the content and add them to the title field?


The fields below are indexed in Solr for PDF pages:

Thanks in advance for your help on this.




Bineesh k

