The pipeline code itself is now "complete" in trunk. Zaizi said they'd contribute a Tika extractor transformation connector - and if they don't get around to that in a month or so, I may take a crack at it myself.
As for changing the Solr connector so that it doesn't go to the extracting update handler, it would be great if:

(1) Someone created a ticket for this, and
(2) A patch was provided that maintains backwards compatibility with previous versions of the connector (so a checkbox would probably need to go into the UI somewhere).

Do either of you want to start this process?

Thanks!
Karl


On Mon, Jun 16, 2014 at 12:37 PM, Karl Wright <[email protected]> wrote:

> Hi guys,
>
> You folks may not have looked at 1.7 yet, but it has a full pipeline, and
> is expected to have a Tika extractor as a transformation connector.
>
> Karl
>
>
> On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]> wrote:
>
>> Thanks Alessandro,
>> that explains the situation clearly.
>> And I agree that sending all the metadata as get parameter can be
>> problematic
>>
>> Cheers
>>
>> --
>> Matteo Grolla
>> Sourcesense - making sense of Open Source
>> http://www.sourcesense.com
>>
>> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
>>
>> > mmmm the point is that right now ManifoldCF has no extractors.
>> > The Repository connectors extracts directly the binary and there is no
>> > "Extractor Processor" yet.
>> > But recently a pipe-line processor architecture has been thought
>> > (https://issues.apache.org/jira/browse/CONNECTORS-959)
>> > So can fit there.
>> >
>> > Cheers
>> >
>> >
>> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
>> >
>> >> Since Solr extracting request handler takes the binary and extracts text
>> >> what is the point of not using Manifold extractor and send text and
>> >> binaries to solr?
>> >> I mean the end result is the same solr indexes text and stores text
>> >> So if manifold supports text extraction it seems me this is the place
>> >> where it should be done
>> >>
>> >> --
>> >> Matteo Grolla
>> >> Sourcesense - making sense of Open Source
>> >> http://www.sourcesense.com
>> >>
>> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>> >>
>> >>> Hi Matteo
>> >>>
>> >>> Manifold already handles the extraction, but the only way to send binary
>> >>> content and document metadata to Solr is using the update/extract handler,
>> >>> where the metadata is sent as query parameters and the binary content is
>> >>> sent in the body of the requests, allowing Solr to use Tika to obtain the
>> >>> raw content to be stored in Solr.
>> >>>
>> >>> Regards
>> >>>
>> >>>
>> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
>> >>>
>> >>>> Hi During my first indexing I noticed that manifold uses Solr extracting
>> >>>> request handler to extract the content of an xml file
>> >>>> For performance reasons it would be better if Manifold handled the
>> >>>> extraction letting Solr do the search engine
>> >>>> Is this because of the connector design, framework design or just to be
>> >>>> done?
>> >>>>
>> >>>> --
>> >>>> Matteo Grolla
>> >>>> Sourcesense - making sense of Open Source
>> >>>> http://www.sourcesense.com
>> >>>>
>> >>>
>> >>> --
>> >>>
>> >>> ------------------------------
>> >>> This message should be regarded as confidential. If you have received this
>> >>> email in error please notify the sender and destroy it immediately.
>> >>> Statements of intent shall only become binding when confirmed in hard copy
>> >>> by an authorised signatory.
>> >>>
>> >>> Zaizi Ltd is registered in England and Wales with the registration number
>> >>> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
>> >>> London W6 7AN.
>> >>
>> >
>> > --
>> > --------------------------
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>> >
>> > "Tyger, tyger burning bright
>> > In the forests of the night,
>> > What immortal hand or eye
>> > Could frame thy fearful symmetry?"
>> >
>> > William Blake - Songs of Experience -1794 England
>>
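
For readers following the thread, here is a rough sketch of the request shape described above: metadata sent as query parameters and the binary content sent in the body of a request to the update/extract handler. It is only an illustration, not the ManifoldCF Solr connector's actual code. The function name `build_extract_request`, the host, the core name `collection1`, and the sample fields are all hypothetical; the `literal.<field>` parameter prefix is the one Solr's extracting request handler uses for caller-supplied field values. The request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base_url, doc_id, metadata, content,
                          content_type="application/octet-stream"):
    """Build (but do not send) a POST to Solr's extracting update handler.

    Hypothetical helper for illustration only.
    """
    # Each metadata field travels in the query string as literal.<field>=<value>.
    params = [("literal.id", doc_id)]
    for field, value in sorted(metadata.items()):
        params.append(("literal." + field, value))
    url = solr_base_url.rstrip("/") + "/update/extract?" + urlencode(params)
    # The raw document bytes go in the request body; Solr runs Tika on them.
    return Request(url, data=content, method="POST",
                   headers={"Content-Type": content_type})


request = build_extract_request(
    "http://localhost:8983/solr/collection1",  # assumed Solr core URL
    "doc-1",
    {"author": "Jane Doe", "title": "Quarterly report"},
    b"<xml>example</xml>",
    content_type="application/xml",
)
print(request.full_url)
print(len(request.full_url), "characters in the URL")
```

Because every metadata field lengthens the URL, a document with many or large metadata values can produce a very long query string, which is presumably the concern raised about sending all the metadata as GET parameters.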
