Hi guys,

You folks may not have looked at 1.7 yet, but it has a full pipeline, and is expected to have a Tika extractor as a transformation connector.

Karl

On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]> wrote:

> Thanks Alessandro,
> that explains the situation clearly.
> And I agree that sending all the metadata as get parameter can be
> problematic
>
> Cheers
>
> --
> Matteo Grolla
> Sourcesense - making sense of Open Source
> http://www.sourcesense.com
>
> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
>
> > mmmm the point is that right now ManifoldCF has no extractors.
> > The Repository connectors extracts directly the binary and there is no
> > "Extractor Processor" yet.
> > But recently a pipe-line processor architecture has been thought
> > (https://issues.apache.org/jira/browse/CONNECTORS-959)
> > So can fit there.
> >
> > Cheers
> >
> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
> >
> >> Since Solr extracting request handler takes the binary and extracts text
> >> what is the point of not using Manifold extractor and send text and
> >> binaries to solr?
> >> I mean the end result is the same solr indexes text and stores text
> >> So if manifold supports text extraction it seems me this is the place
> >> where it should be done
> >>
> >> --
> >> Matteo Grolla
> >> Sourcesense - making sense of Open Source
> >> http://www.sourcesense.com
> >>
> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
> >>
> >>> Hi Matteo
> >>>
> >>> Manifold already handles the extraction, but the only way to send binary
> >>> content and document metadata to Solr is using the update/extract handler,
> >>> where the metadata is sent as query parameters and the binary content is
> >>> sent in the body of the requests, allowing Solr to use Tika to obtain the
> >>> raw content to be stored in Solr.
> >>>
> >>> Regards
> >>>
> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi During my first indexing I noticed that manifold uses Solr extracting
> >>>> request handler to extract the content of an xml file
> >>>> For performance reasons it would be better if Manifold handled the
> >>>> extraction letting Solr do the search engine
> >>>> Is this because of the connector design, framework design or just to be
> >>>> done?
> >>>>
> >>>> --
> >>>> Matteo Grolla
> >>>> Sourcesense - making sense of Open Source
> >>>> http://www.sourcesense.com
> >>>
> >>> --
> >>>
> >>> ------------------------------
> >>> This message should be regarded as confidential. If you have received this
> >>> email in error please notify the sender and destroy it immediately.
> >>> Statements of intent shall only become binding when confirmed in hard copy
> >>> by an authorised signatory.
> >>>
> >>> Zaizi Ltd is registered in England and Wales with the registration number
> >>> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> >>> London W6 7AN.
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
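
For reference, the request shape Antonio describes in the quoted thread (metadata as query parameters, binary content in the request body, sent to Solr's update/extract handler) can be sketched roughly as below. This is only an illustration and not ManifoldCF's actual code: the function name, the core URL, and the metadata fields are made up for the example, and it assumes the extracting handler's literal.<field> parameter convention. The request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url, doc_id, metadata, binary_content,
                          content_type="application/octet-stream"):
    """Build (but do not send) a POST to Solr's extracting request handler.

    Hypothetical helper for illustration only.
    """
    # Metadata travels in the query string as literal.<field>=<value> pairs.
    params = [("literal.id", doc_id)]
    for field, value in metadata.items():
        params.append(("literal." + field, value))
    url = solr_core_url.rstrip("/") + "/update/extract?" + urlencode(params)
    # The raw document bytes travel in the body; Solr runs Tika on them.
    return Request(url, data=binary_content,
                   headers={"Content-Type": content_type}, method="POST")


# Assumed core URL and field names; adjust to the actual Solr setup.
req = build_extract_request(
    "http://localhost:8983/solr/collection1",
    "doc-1",
    {"author": "Matteo Grolla", "source": "filesystem"},
    b"<doc><title>example</title></doc>",
    content_type="application/xml",
)
print(req.get_method(), req.full_url)
print("URL length:", len(req.full_url))
```

Every metadata field adds to the URL, so printing the URL length with a larger metadata dict gives a feel for why sending all the metadata as GET parameters can become a problem.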
