Re: Call for trunk pipeline testers

Karl Wright Wed, 11 Jun 2014 09:30:33 -0700

I've created CONNECTORS-962 to track the "multiple output" idea.

Karl



On Wed, Jun 11, 2014 at 12:21 PM, Karl Wright <[email protected]> wrote:

> Hi Rafa,
>
> We would be very interested in a contribution that addresses
> CONNECTORS-954.  As for changing the Solr connector to not use the
> extracting update handler: as long as that is only one of several options,
> that contribution would be welcome too.  Please consider opening a ticket
> specifically for that change.
>
> Output to multiple indexes at the same time has come up before, but this
> is more of a challenge because in theory we'd want to keep a different
> record in the ingeststatus table for each document for each individual
> output index.  With pipeline support, each output index would also no doubt
> need a distinct pipeline as well.  Nevertheless, I'm not opposed to adding
> this feature if I can work out a good way to do it.
>
> So let's start with CONNECTORS-954 and Solr connector changes, and see how
> far we get.
>
> Karl
>
>
>
> On Wed, Jun 11, 2014 at 12:10 PM, Rafa Haro <[email protected]> wrote:
>
>> Hi Karl,
>>
>> We (at Zaizi) also had this requirement. We initially addressed it by
>> creating a sort of "Processor Connector", mainly for semantically enhancing
>> the repository documents before indexing them. We would be very happy to
>> give this a try and provide feedback, because our current approach is
>> only temporary. Apart from processing the document, we also had a
>> special requirement: producing different instances of repository
>> documents, because we populate more than one index at the same time. We
>> would also need to check how we can do exactly the same with this
>> processing pipeline.
>>
>> Apart from this, Karl, we can also take care of the Tika integration
>> (actually, we have already done it) and eventually take care of
>> CONNECTORS-954 as well. Because we already use Tika as a "processor
>> connector", we are also going to modify the Solr connector so that it does
>> not use the extracting update handler, which also presents some problems.
>> Would that also be of interest to the community?
>>
>> Cheers,
>> Rafa
>>
>> On 11/06/14 16:09, Karl Wright wrote:
>>
>>> Hi folks,
>>>
>>> ManifoldCF finally has a pipeline!  All tests pass.  Now I'm looking for
>>> people to try things out by hand to see if there are any rough edges,
>>> before we get too far along in the 1.7 development cycle to fix them.
>>>
>>> Trunk has all the necessary moving parts and documentation as well.  There
>>> are two transformation connectors available -- one that does nothing but
>>> pass data through, and one that forces metadata (just like the framework
>>> "Forced metadata" tab).  But since you can have more than one of each kind
>>> of connector in a pipeline, this should be enough to exercise things fairly
>>> completely.
>>>
>>> We still need to address a couple of things in the medium and long term.
>>> First, we need a Tika transformation connector that extracts metadata from
>>> binary files.  There's an existing ticket for that: CONNECTORS-954.  If
>>> anyone wants to take a crack at that, please let me know.  (Takumi Yoshida
>>> would be the obvious choice.)  Second, we need to come up with a strategy
>>> for removing obsolete tabs/features, like the aforementioned general job
>>> Forced Metadata tab.  We've got a fair number of such features around now.
>>> These kinds of things cannot be removed without either a comprehensive
>>> automatic upgrade, or loss of backwards compatibility.  I am thinking maybe
>>> we break with backwards compatibility and work towards cleaning out
>>> duplicate features for ManifoldCF 2.0.
>>>
>>> Thoughts?
>>>
>>> Karl
>>>
>>>
>>
>
