Re: Solr Extracting request handler

Karl Wright Mon, 16 Jun 2014 09:38:39 -0700

Hi guys,

You folks may not have looked at 1.7 yet, but it has a full pipeline, and
is expected to have a Tika extractor as a transformation connector.


Karl



On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]>
wrote:

> Thanks Alessandro,
>         that explains the situation clearly.
> And I agree that sending all the metadata as GET parameters can be
> problematic.
>
> Cheers
>
> --
> Matteo Grolla
> Sourcesense - making sense of Open Source
> http://www.sourcesense.com
>
> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
>
> > Hmm, the point is that right now ManifoldCF has no extractors.
> > The repository connectors extract the binary directly, and there is no
> > "Extractor Processor" yet.
> > But a pipeline processor architecture has recently been proposed (
> > https://issues.apache.org/jira/browse/CONNECTORS-959),
> > so an extractor could fit there.
> >
> > Cheers
> >
> >
> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
> >
> >> Since the Solr extracting request handler takes the binary and extracts
> >> the text, what is the point of not using the Manifold extractor and
> >> sending text and binaries to Solr?
> >> I mean, the end result is the same: Solr indexes text and stores text.
> >> So if Manifold supports text extraction, it seems to me this is the place
> >> where it should be done.
> >>
> >> --
> >> Matteo Grolla
> >> Sourcesense - making sense of Open Source
> >> http://www.sourcesense.com
> >>
> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
> >>
> >>> Hi Matteo
> >>>
> >>> Manifold already handles the extraction, but the only way to send
> >>> binary content and document metadata to Solr is using the
> >>> update/extract handler, where the metadata is sent as query parameters
> >>> and the binary content is sent in the body of the request, allowing
> >>> Solr to use Tika to obtain the raw content to be stored in Solr.
> >>>
> >>> Regards
> >>>
> >>>
> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>> During my first indexing run I noticed that Manifold uses the Solr
> >>>> extracting request handler to extract the content of an XML file.
> >>>> For performance reasons it would be better if Manifold handled the
> >>>> extraction, leaving Solr to act purely as the search engine.
> >>>> Is this because of the connector design, the framework design, or is
> >>>> it simply still to be done?
> >>>>
> >>>> --
> >>>> Matteo Grolla
> >>>> Sourcesense - making sense of Open Source
> >>>> http://www.sourcesense.com
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
>
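
As a minimal sketch of the request shape Antonio describes in the thread (metadata as query parameters, binary content in the body), the following Python builds, but does not send, a POST to the update/extract handler. The `literal.<field>` prefix is how Solr's extracting request handler accepts field values as parameters; the base URL, the `build_extract_request` helper, and the sample field names are assumptions for illustration only, not ManifoldCF's actual code.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl
from urllib.request import Request


def build_extract_request(solr_base, doc_id, metadata, binary,
                          content_type="application/octet-stream"):
    """Build (but do not send) a POST to Solr's update/extract handler.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Each metadata field travels as a "literal.<field>" query parameter.
    params = [("literal.id", doc_id)]
    for field, value in metadata.items():
        params.append(("literal." + field, value))
    url = solr_base.rstrip("/") + "/update/extract?" + urlencode(params)
    # The raw document bytes go in the request body for Tika to parse.
    return Request(url, data=binary,
                   headers={"Content-Type": content_type}, method="POST")


# Assumed local Solr base URL and made-up sample metadata.
req = build_extract_request(
    "http://localhost:8983/solr",
    "doc1",
    {"author": "Matteo Grolla", "title": "Sample"},
    b"<doc>hello</doc>",
    content_type="application/xml",
)
print(req.full_url)
print("URL length:", len(req.full_url))
```

Because every metadata value ends up in the URL, a document with many or large metadata fields yields a very long request line, which is presumably the concern raised above about sending metadata as GET parameters.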
