Hi Abe-san,

So just to be sure: you believe that no changes at all are required to the Solr connector as it stands now, other than configuring it to use the /update handler rather than the /update/extract handler?
Karl

On Tue, Jun 17, 2014 at 5:14 PM, Shinichiro Abe <[email protected]> wrote:

> > As for changing the Solr connector so that it doesn't go to the extracting
> > update handler
>
> I don't think the Solr connector needs a new checkbox, because we can
> already change "/update/extract" to "/update" in the 'Update Handler'
> field on the Paths tab of the Solr connector UI. I confirmed that I could
> post CSV, JSON and XML files to Solr by changing that setting and using
> the File connector. So I would like the Tika extractor transformation
> connector to be able to create the XML files that Solr expects to see.
>
> Regards,
> Shinichiro Abe
>
>
> 2014-06-18 2:55 GMT+09:00 Karl Wright <[email protected]>:
>
> > The pipeline code itself is now "complete" in trunk. Zaizi said they'd
> > contribute a Tika extractor transformation connector, and if they don't
> > get around to that in a month or so, I may take a crack at it myself.
> >
> > As for changing the Solr connector so that it doesn't go to the extracting
> > update handler, it would be great if:
> > (1) Someone created a ticket for this, and
> > (2) A patch was provided that maintains backwards compatibility with
> > previous versions of the connector (so a checkbox would probably need to go
> > into the UI somewhere). Do either of you want to start this process?
> >
> > Thanks!
> > Karl
> >
> >
> > On Mon, Jun 16, 2014 at 12:37 PM, Karl Wright <[email protected]> wrote:
> >
> > > Hi guys,
> > >
> > > You folks may not have looked at 1.7 yet, but it has a full pipeline, and
> > > is expected to have a Tika extractor as a transformation connector.
> > >
> > > Karl
> > >
> > >
> > > On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]> wrote:
> > >
> > >> Thanks Alessandro,
> > >> that explains the situation clearly.
> > >> And I agree that sending all the metadata as GET parameters can be
> > >> problematic.
> > >>
> > >> Cheers
> > >>
> > >> --
> > >> Matteo Grolla
> > >> Sourcesense - making sense of Open Source
> > >> http://www.sourcesense.com
> > >>
> > >> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
> > >>
> > >> > mmmm, the point is that right now ManifoldCF has no extractors.
> > >> > The repository connectors extract the binary directly, and there is no
> > >> > "Extractor Processor" yet.
> > >> > But a pipeline processor architecture has recently been proposed
> > >> > (https://issues.apache.org/jira/browse/CONNECTORS-959),
> > >> > so it can fit there.
> > >> >
> > >> > Cheers
> > >> >
> > >> >
> > >> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
> > >> >
> > >> >> Since the Solr extracting request handler takes the binary and extracts
> > >> >> the text, what is the point of not using the Manifold extractor and
> > >> >> sending text and binaries to Solr?
> > >> >> I mean, the end result is the same: Solr indexes text and stores text.
> > >> >> So if Manifold supports text extraction, it seems to me this is the
> > >> >> place where it should be done.
> > >> >>
> > >> >> --
> > >> >> Matteo Grolla
> > >> >>
> > >> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
> > >> >>
> > >> >>> Hi Matteo,
> > >> >>>
> > >> >>> Manifold already handles the extraction, but the only way to send
> > >> >>> binary content and document metadata to Solr is through the
> > >> >>> update/extract handler, where the metadata is sent as query
> > >> >>> parameters and the binary content is sent in the body of the
> > >> >>> request, allowing Solr to use Tika to obtain the raw content to be
> > >> >>> stored in Solr.
> > >> >>>
> > >> >>> Regards
> > >> >>>
> > >> >>>
> > >> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
> > >> >>>
> > >> >>>> Hi, during my first indexing run I noticed that Manifold uses the Solr
> > >> >>>> extracting request handler to extract the content of an XML file.
> > >> >>>> For performance reasons, it would be better if Manifold handled the
> > >> >>>> extraction and let Solr just be the search engine.
> > >> >>>> Is this because of the connector design, the framework design, or is
> > >> >>>> it just yet to be done?
> > >> >>>>
> > >> >>>> --
> > >> >>>> Matteo Grolla
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >
> > >> > --
> > >> > Benedetti Alessandro
> > >> > Visiting card: http://about.me/alessandro_benedetti
> > >>
> > >
> >
>
> --
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Shinichiro Abe
> 阿部 慎一朗
>
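For reference, here is a minimal Java sketch of the two request shapes discussed in this thread. It is not ManifoldCF or SolrJ code: the class and method names, the base URL, and the field names (id, title, content) are hypothetical. It assumes the ExtractingRequestHandler convention of passing metadata as `literal.<field>` query parameters to /update/extract, and the Solr XML update format (`<add><doc><field name="...">`) for /update. It only builds strings and sends nothing over HTTP; Java 10+ is assumed for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrRequestShapes {

    // /update/extract style (hypothetical helper): each metadata entry becomes a
    // literal.<field> query parameter; the raw binary would go in the POST body
    // (not shown) for Solr's Tika-based extraction.
    static String extractUrl(String baseUrl, Map<String, String> metadata) {
        StringBuilder url = new StringBuilder(baseUrl).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            url.append(sep)
               .append("literal.").append(enc(e.getKey()))
               .append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    // /update style (hypothetical helper): text already extracted upstream, e.g. by
    // a Tika transformation connector, is sent with the metadata as a Solr XML
    // <add> message in the request body.
    static String updateXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // Form-encodes a query parameter name or value as UTF-8.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Minimal XML escaping; '&' must be replaced first to avoid double-escaping.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("id", "doc-1");
        meta.put("title", "Q&A notes");
        System.out.println(extractUrl("http://localhost:8983/solr/collection1", meta));

        Map<String, String> doc = new LinkedHashMap<>(meta);
        doc.put("content", "Text extracted by Tika <before> indexing");
        System.out.println(updateXml(doc));
    }
}
```

In the first shape every metadata value ends up in the URL, which is why sending large amounts of metadata that way can be problematic; in the second, the extracted text and the metadata travel together in the request body, which is what pointing the connector's 'Update Handler' at /update would rely on.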
