Hi Mark,

I think I'd describe this simplified proposal as "pipeline" (vs.
"Pipeline"; your original description was the latter).  This proposal
is simpler, but it cannot amalgamate content from multiple connectors,
correct?  As long as a stage just modifies a document's content and
metadata (as described by RepositoryDocument), it's not hard to
develop a generic notion of a content-processing pipeline, e.g. one
based on Tika.
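
To make that concrete, here's a rough sketch of what such a stage
might look like.  RepositoryDocument is the real ManifoldCF class, but
PipelineStage, SimplePipeline, and the AddSourceTagStage example are
purely hypothetical, and the addField() call and exception type are my
assumptions about the API:

  import java.util.List;
  import org.apache.manifoldcf.agents.interfaces.RepositoryDocument;
  import org.apache.manifoldcf.core.interfaces.ManifoldCFException;

  /** One stage in a "pipeline": transforms a document in place. */
  public interface PipelineStage {
    void process(RepositoryDocument doc) throws ManifoldCFException;
  }

  /** Runs a fixed list of stages, in order, over each document. */
  class SimplePipeline implements PipelineStage {
    private final List<PipelineStage> stages;

    SimplePipeline(List<PipelineStage> stages) {
      this.stages = stages;
    }

    public void process(RepositoryDocument doc) throws ManifoldCFException {
      for (PipelineStage stage : stages) {
        stage.process(doc);
      }
    }
  }

  /** Example stage: stamps a constant metadata field onto every
      document.  addField(String, String[]) is assumed here. */
  class AddSourceTagStage implements PipelineStage {
    public void process(RepositoryDocument doc) throws ManifoldCFException {
      doc.addField("source_tag", new String[]{"crawled"});
    }
  }

A Tika stage would then just be one more PipelineStage implementation
that swaps the binary stream for extracted text and adds whatever
metadata Tika finds.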

There's a question in my mind as to where it belongs.  If its purpose
is to make up for missing code in particular search engines, then I'd
argue it should be a service available to output connector coders, who
can then choose how much configurability makes sense from the point of
view of their target system.  For instance, since Tika is already part
of Solr, there would seem to be little benefit in adding a Tika
pipeline upstream of Solr as well, but maybe a Google Search Appliance
connector would want it and therefore expose it.  If the pipeline's
purpose is to
include arbitrary business logic, on the other hand, then I think what
you'd really need is a Pipeline and not a pipeline, if you see what I
mean.
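
For the "service available to output connector coders" option, I'm
imagining something like the following.  DocumentFilterService, the
connector class, and the simplified method signature are all
hypothetical, just to show where the choice would live:

  import org.apache.manifoldcf.agents.interfaces.RepositoryDocument;
  import org.apache.manifoldcf.core.interfaces.ManifoldCFException;

  /** A shared filtering service, e.g. one backed by Tika that
      replaces binary content with extracted text plus metadata. */
  interface DocumentFilterService {
    void extractTextAndMetadata(RepositoryDocument doc)
      throws ManifoldCFException;
  }

  /** Hypothetical connector for a target with no built-in extraction.
      The method below is a simplified stand-in for the real output
      connector API. */
  public class ApplianceOutputConnector {
    private final DocumentFilterService filter;

    public ApplianceOutputConnector(DocumentFilterService filter) {
      this.filter = filter;
    }

    public void addOrReplaceDocument(String documentURI,
      RepositoryDocument doc) throws ManifoldCFException {
      // This target can't extract text itself, so filter here; a Solr
      // connector would skip this call, since Solr already runs Tika.
      filter.extractTextAndMetadata(doc);
      // ... then hand the filtered document to the target system ...
    }
  }

The point is that the Solr connector would simply never call the
filter, while the appliance connector would, and each could expose as
much or as little configuration for it as makes sense.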

So, my question to you is, what would the main use case(s) be for a
"pipeline" in your view?

Karl

On Wed, Jan 11, 2012 at 6:31 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> Hi Karl,
>
> Still pondering our last discussion.  Wondering if I got things off track.
>
> As a start, what if I backtracked a bit, to this:
>
> What's the easiest way to do this:
> * A connector that tweaks metadata from a single source.
> * Sits between any existing MCF datasource connector and the main MCF engine
>
> Before:
>
> CMS/DB -> Existing MCF connector -> MCF core -> output
>
> After:
>
> CMS/DB -> Existing MCF connector -> Metadata tweaker -> MCF core -> output
>
>
> Assume the metadata changes don't have any impact on security, or that no
> security is being used (public data).
