>As for changing the Solr connector so that it doesn't go to the extracting update handler
I don't think we need to change the Solr connector by adding a new checkbox, because we can already change "/update/extract" to "/update" in the 'Update Handler' field on the Paths tab of the Solr connector UI. I confirmed that, with that change and the File connector, I could post CSV, JSON and XML files to Solr. So I would like the Tika extractor transformation connector to be able to create the XML files that Solr expects to see.

Regards,
Shinichiro Abe

2014-06-18 2:55 GMT+09:00 Karl Wright <[email protected]>:

> The pipeline code itself is now "complete" in trunk. Zaizi said they'd
> contribute a Tika extractor transformation connector - and if they don't
> get around to that in a month or so, I may take a crack at it myself.
>
> As for changing the Solr connector so that it doesn't go to the extracting
> update handler, it would be great if:
> (1) Someone created a ticket for this, and
> (2) A patch was provided that maintains backwards compatibility with
> previous versions of the connector (so a checkbox would probably need to go
> into the UI somewhere). Do either of you want to start this process?
>
> Thanks!
> Karl
>
> On Mon, Jun 16, 2014 at 12:37 PM, Karl Wright <[email protected]> wrote:
>
> > Hi guys,
> >
> > You folks may not have looked at 1.7 yet, but it has a full pipeline, and
> > is expected to have a Tika extractor as a transformation connector.
> >
> > Karl
> >
> > On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]> wrote:
> >
> >> Thanks Alessandro,
> >> that explains the situation clearly.
> >> And I agree that sending all the metadata as get parameter can be
> >> problematic
> >>
> >> Cheers
> >>
> >> --
> >> Matteo Grolla
> >> Sourcesense - making sense of Open Source
> >> http://www.sourcesense.com
> >>
> >> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
> >>
> >> > mmmm the point is that right now ManifoldCF has no extractors.
> >> > The Repository connectors extracts directly the binary and there is no
> >> > "Extractor Processor" yet.
> >> > But recently a pipe-line processor architecture has been thought
> >> > (https://issues.apache.org/jira/browse/CONNECTORS-959)
> >> > So can fit there.
> >> >
> >> > Cheers
> >> >
> >> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
> >> >
> >> >> Since Solr extracting request handler takes the binary and extracts text
> >> >> what is the point of not using Manifold extractor and send text and
> >> >> binaries to solr?
> >> >> I mean the end result is the same solr indexes text and stores text
> >> >> So if manifold supports text extraction it seems me this is the place
> >> >> where it should be done
> >> >>
> >> >> --
> >> >> Matteo Grolla
> >> >> Sourcesense - making sense of Open Source
> >> >> http://www.sourcesense.com
> >> >>
> >> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
> >> >>
> >> >>> Hi Matteo
> >> >>>
> >> >>> Manifold already handles the extraction, but the only way to send binary
> >> >>> content and document metadata to Solr is using the update/extract handler,
> >> >>> where the metadata is sent as query parameters and the binary content is
> >> >>> sent in the body of the requests, allowing Solr to use Tika to obtain the
> >> >>> raw content to be stored in Solr.
> >> >>>
> >> >>> Regards
> >> >>>
> >> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
> >> >>>
> >> >>>> Hi During my first indexing I noticed that manifold uses Solr extracting
> >> >>>> request handler to extract the content of an xml file
> >> >>>> For performance reasons it would be better if Manifold handled the
> >> >>>> extraction letting Solr do the search engine
> >> >>>> Is this because of the connector design, framework design or just to be
> >> >>>> done?
> >> >>>>
> >> >>>> --
> >> >>>> Matteo Grolla
> >> >>>> Sourcesense - making sense of Open Source
> >> >>>> http://www.sourcesense.com
> >> >>>
> >> >>> --
> >> >>>
> >> >>> ------------------------------
> >> >>> This message should be regarded as confidential. If you have received this
> >> >>> email in error please notify the sender and destroy it immediately.
> >> >>> Statements of intent shall only become binding when confirmed in hard copy
> >> >>> by an authorised signatory.
> >> >>>
> >> >>> Zaizi Ltd is registered in England and Wales with the registration number
> >> >>> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> >> >>> London W6 7AN.
> >> >
> >> > --
> >> > --------------------------
> >> >
> >> > Benedetti Alessandro
> >> > Visiting card : http://about.me/alessandro_benedetti
> >> >
> >> > "Tyger, tyger burning bright
> >> > In the forests of the night,
> >> > What immortal hand or eye
> >> > Could frame thy fearful symmetry?"
> >> >
> >> > William Blake - Songs of Experience -1794 England

--
Shinichiro Abe 阿部 慎一朗
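
For reference, the kind of XML that Solr's plain "/update" handler accepts is an <add><doc>...</doc></add> message with one <field> element per value. The sketch below is only an illustration of what an extractor stage might emit after text extraction; it is not ManifoldCF code. The function name build_solr_add_xml and the field names "id", "content" and "author" are hypothetical, and the real field names depend on the Solr schema in use.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(doc_id, text, metadata):
    """Build a Solr XML update message for one document.

    doc_id   -- value for the (assumed) unique key field "id"
    text     -- already-extracted plain text, stored in an assumed "content" field
    metadata -- dict mapping a field name to a list of string values
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name, value):
        # Each value becomes its own <field name="...">value</field> element.
        field = ET.SubElement(doc, "field", name=name)
        field.text = value

    add_field("id", doc_id)
    add_field("content", text)
    for name, values in metadata.items():
        # A multi-valued field is expressed by repeating the element.
        for value in values:
            add_field(name, value)

    # ElementTree escapes characters such as "<" and "&" in the text.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_solr_add_xml("doc1", "extracted text", {"author": ["A", "B"]}))
```

The resulting string would be sent as the body of a POST to the "/update" path with an XML content type, so the metadata travels in the body rather than as query parameters.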
