I've created CONNECTORS-962 to track the "multiple output" idea.

Karl

On Wed, Jun 11, 2014 at 12:21 PM, Karl Wright <[email protected]> wrote:

> Hi Rafa,
>
> We would be very interested in a contribution that addresses
> CONNECTORS-954. As for changing the Solr connector so that it does not use
> the extracting update handler, that contribution would also be welcome, as
> long as it is only one of many options. Please consider opening a ticket
> specifically for that change.
>
> Output to multiple indexes at the same time has come up before, but it is
> more of a challenge, because in theory we'd want to keep a separate record
> in the ingeststatus table for each document for each individual output
> index. With pipeline support, each output index would no doubt also need its
> own distinct pipeline. Nevertheless, I'm not opposed to adding this feature
> if I can work out a good way to do it.
>
> So let's start with CONNECTORS-954 and the Solr connector changes, and see
> how far we get.
>
> Karl
>
> On Wed, Jun 11, 2014 at 12:10 PM, Rafa Haro <[email protected]> wrote:
>
>> Hi Karl,
>>
>> We (at Zaizi) also had this requirement. We initially addressed it by
>> creating a sort of "Processor Connector", mainly for semantically
>> enhancing repository documents before indexing them. We would be very
>> happy to give this a try and provide feedback, because our current
>> approach is only a temporary one. Apart from processing the document, we
>> also had a special requirement: producing different instances of
>> repository documents, because we populate more than one index at the same
>> time. We would also need to check how we can do exactly the same with this
>> processing pipeline.
>>
>> Apart from this, Karl, we can also take care of the Tika integration (we
>> have actually already done it) and eventually take on CONNECTORS-954.
>> Because we already use Tika as a "processor connector", we are also going
>> to modify the Solr connector so that it does not use the extracting update
>> handler, which also presents some problems. Would that also be of interest
>> to the community?
>>
>> Cheers,
>> Rafa
>>
>> On 11/06/14 16:09, Karl Wright wrote:
>>
>>> Hi folks,
>>>
>>> ManifoldCF finally has a pipeline! All tests pass. Now I'm looking for
>>> people to try things out by hand to see if there are any rough edges,
>>> before we get too far along in the 1.7 development cycle to fix them.
>>>
>>> Trunk has all the necessary moving parts, and the documentation as well.
>>> There are two transformation connectors available: one that does nothing
>>> but pass data through, and one that forces metadata (just like the
>>> framework's "Forced metadata" tab). But since you can have more than one
>>> of each kind of connector in a pipeline, this should be enough to
>>> exercise things fairly completely.
>>>
>>> We still need to address a couple of things in the medium and long term.
>>> First, we need a Tika transformation connector that extracts metadata
>>> from binary files. There's an existing ticket for that: CONNECTORS-954.
>>> If anyone wants to take a crack at that, please let me know. (Takumi
>>> Yoshida would be the obvious choice.) Second, we need to come up with a
>>> strategy for removing obsolete tabs/features, like the aforementioned
>>> general job Forced Metadata tab. We've got a fair number of such
>>> features around now. These kinds of things cannot be removed without
>>> either a comprehensive automatic upgrade or a loss of backwards
>>> compatibility. I am thinking maybe we break backwards compatibility and
>>> work towards cleaning out duplicate features for ManifoldCF 2.0.
>>>
>>> Thoughts?
>>>
>>> Karl
