Mmmm, the point is that right now ManifoldCF has no extractors.
The repository connectors extract the binary directly, and there is no
"Extractor Processor" yet.
But a pipeline processor architecture has recently been proposed
(https://issues.apache.org/jira/browse/CONNECTORS-959),
so extraction could fit there.
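To make the pipeline idea concrete, here is a minimal Python sketch of processors chained between a repository connector and an output connector. The design is an assumption for illustration only: the names `Document`, `plain_text_extractor`, `lowercase_metadata_keys` and `run_pipeline` are hypothetical and are not the ManifoldCF API or what CONNECTORS-959 specifies. The extractor is a toy stand-in that only decodes `text/*` bytes, where a real one would call something like Tika.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Document:
    """Hypothetical document flowing through the pipeline."""
    doc_id: str
    binary: Optional[bytes]      # raw content fetched by a repository connector
    mime_type: str
    metadata: Dict[str, str] = field(default_factory=dict)
    text: Optional[str] = None   # filled in by an extractor stage


# In this sketch a processor is just a function Document -> Document.
Processor = Callable[[Document], Document]


def plain_text_extractor(doc: Document) -> Document:
    """Toy stand-in for a real extractor: only handles text/* MIME types."""
    if doc.binary is not None and doc.mime_type.startswith("text/"):
        # Drop the binary once text is extracted, so only text goes downstream.
        return replace(doc, text=doc.binary.decode("utf-8"), binary=None)
    return doc  # unsupported type: pass through unchanged


def lowercase_metadata_keys(doc: Document) -> Document:
    """Example of a second, independent transformation stage."""
    return replace(doc, metadata={k.lower(): v for k, v in doc.metadata.items()})


def run_pipeline(doc: Document, processors: List[Processor]) -> Document:
    """Apply each processor in order, as a chain from connector to output."""
    for process in processors:
        doc = process(doc)
    return doc


if __name__ == "__main__":
    fetched = Document("doc1", b"hello world", "text/plain", {"Author": "Matteo"})
    out = run_pipeline(fetched, [plain_text_extractor, lowercase_metadata_keys])
    print(out.text, out.binary, out.metadata)
```

In this sketch, extraction is just one stage among others, so an output stage after it could send plain text to Solr instead of the raw binary.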

Cheers


2014-06-16 15:59 GMT+01:00 Matteo Grolla:

> Since the Solr extracting request handler takes the binary and extracts the
> text, what is the point of not using the Manifold extractor and sending text
> and binaries to Solr?
> I mean, the end result is the same: Solr indexes text and stores text.
> So if Manifold supports text extraction, it seems to me this is the place
> where it should be done.
>
> --
> Matteo Grolla
> Sourcesense - making sense of Open Source
> http://www.sourcesense.com
>
> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>
> > Hi Matteo
> >
> > Manifold already handles the extraction, but the only way to send binary
> > content and document metadata to Solr is the update/extract handler,
> > where the metadata is sent as query parameters and the binary content is
> > sent in the request body, allowing Solr to use Tika to obtain the
> > raw content to be stored in Solr.
> >
> > Regards
> >
> >
> > On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla wrote:
> >
> >> Hi, during my first indexing run I noticed that Manifold uses the Solr
> >> extracting request handler to extract the content of an XML file.
> >> For performance reasons it would be better if Manifold handled the
> >> extraction, leaving Solr to act as the search engine.
> >> Is this because of the connector design, the framework design, or is it
> >> just yet to be done?
> >>
> >
>
>
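Antonio's description of the update/extract handler (metadata as query parameters, binary in the request body) can be illustrated with a minimal Python sketch that builds, but does not send, such a request. It assumes a Solr core at http://localhost:8983/solr/collection1 whose unique key field is `id`; the helper `build_extract_request` is hypothetical and is not how the ManifoldCF Solr output connector is actually implemented. The `literal.<field>` parameters are the Solr Cell convention for passing field values alongside the uploaded file.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url: str, doc_id: str, content: bytes,
                          content_type: str, metadata: dict) -> Request:
    """Build (but do not send) a POST to Solr's extracting request handler.

    Metadata travels as literal.<field> query parameters; the raw binary
    travels in the request body so Solr Cell (Tika) can parse it.
    """
    # Assumes the core's uniqueKey field is named "id".
    params = {"literal.id": doc_id}
    for name, value in metadata.items():
        params[f"literal.{name}"] = value
    url = f"{solr_core_url.rstrip('/')}/update/extract?{urlencode(params)}"
    return Request(url, data=content, method="POST",
                   headers={"Content-Type": content_type})


if __name__ == "__main__":
    req = build_extract_request("http://localhost:8983/solr/collection1",
                                "doc1", b"<root>hi</root>", "application/xml",
                                {"author": "Matteo Grolla"})
    print(req.get_method(), req.full_url)
```

Passing the returned request to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without a Solr instance.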


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
