Hello.
I want to use ManifoldCF to send output to Solr.
My crawl target is the files in a Windows shares repository.
I think this framework can obtain the paths, security information, and
metadata of those files by running jobs.
However, it probably cannot extract the text content of the crawled files, so
that content cannot be included as attributes in the Solr output. For example,
the text data of MS Excel or PDF documents.
To implement text content extraction in ManifoldCF, it would need to include
a framework like Tika.
Is this understanding correct? If not, any other ideas are welcome. Thanks.