Hmm, the point is that right now ManifoldCF has no extractors. The repository connectors fetch the binary directly, and there is no "Extractor Processor" yet. However, a pipeline processor architecture has recently been proposed (https://issues.apache.org/jira/browse/CONNECTORS-959), so an extractor could fit there.
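To make the current flow concrete, here is a rough sketch of how a request to Solr's extracting handler is shaped, as Antonio describes in the quoted mail: metadata goes in as `literal.<field>` query parameters and the binary goes in the request body. This is not ManifoldCF's actual code; the host, core name, field names, and the `build_extract_url` helper are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, metadata):
    """Build the URL for a POST to Solr's extracting request handler.

    Metadata fields are passed as literal.<field> query parameters.
    The raw file bytes would be sent separately as the request body,
    where Solr runs Tika on them.
    """
    params = [("literal.id", doc_id)]
    # Sort the fields so the parameter order is stable and reproducible.
    for field, value in sorted(metadata.items()):
        params.append(("literal." + field, value))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr/collection1",  # assumed host and core name
    "doc-1",                                   # hypothetical document id
    {"author": "Matteo", "source": "filesystem"},  # hypothetical metadata
)
print(url)
```

With an extractor stage inside ManifoldCF, the same metadata could instead travel with already-extracted text to the plain update handler, which is the change Matteo is asking about.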
Cheers

2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:

> Since Solr extracting request handler takes the binary and extracts text
> what is the point of not using Manifold extractor and send text and
> binaries to solr?
> I mean the end result is the same solr indexes text and stores text
> So if manifold supports text extraction it seems me this is the place
> where it should be done
>
> --
> Matteo Grolla
> Sourcesense - making sense of Open Source
> http://www.sourcesense.com
>
> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>
> > Hi Matteo
> >
> > Manifold already handles the extraction, but the only way to send binary
> > content and document metadata to Solr is using the update/extract handler,
> > where the metadata is sent as query parameters and the binary content is
> > sent in the body of the requests, allowing Solr to use Tika to obtain the
> > raw content to be stored in Solr.
> >
> > Regards
> >
> >
> > On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
> >
> >> Hi During my first indexing I noticed that manifold uses Solr extracting
> >> request handler to extract the content of an xml file
> >> For performance reasons it would be better if Manifold handled the
> >> extraction letting Solr do the search engine
> >> Is this because of the connector design, framework design or just to be
> >> done?
> >>
> >> --
> >> Matteo Grolla
> >> Sourcesense - making sense of Open Source
> >> http://www.sourcesense.com
> >>
> >>
> >
> > --
> >
> > ------------------------------
> > This message should be regarded as confidential. If you have received this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard copy
> > by an authorised signatory.
> >
> > Zaizi Ltd is registered in England and Wales with the registration number
> > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > London W6 7AN.
> >

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
