Hello.
I want to use ManifoldCF to send its output to Solr.
My crawl target is the files in a Windows shares repository.
I think this framework can obtain the paths, security information, and metadata
of those files by running jobs.
However, it probably cannot extract the text content of the crawled files, so
that text cannot be included as attributes of the Solr output. Examples are the
text of MS Excel or PDF documents.
To implement text content extraction in ManifoldCF, it would need to include a
framework like Tika.
Is this understanding correct? If not, any other ideas would be welcome. Thanks.
