Hi Mark,

I think I'd describe this simplified proposal as "pipeline" (vs. "Pipeline"; your original description was the latter). This proposal is simpler, but it does not have the ability to amalgamate content from multiple connectors, correct? As long as it is just modifying the content and metadata (as described by RepositoryDocument), it's not hard to develop a generic idea of a content processing pipeline, e.g. one based on Tika.
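To make the distinction concrete, here is a minimal sketch of what a lowercase-"pipeline" could look like: just an ordered list of in-place transforms over one document's content and metadata. This is Java, but the Stage, Doc, and Pipeline names are invented for illustration; none of this is the MCF API.

// Hypothetical sketch only, not the MCF API. A "pipeline" (lowercase) is
// modeled as an ordered list of transforms over a single document's
// content and metadata, analogous to what RepositoryDocument carries.
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface Stage {
  // Modify the document in place (content and/or metadata), then pass it on.
  void process(Doc doc) throws Exception;
}

class Doc {
  InputStream content;                 // stand-in for the document's binary stream
  Map<String, List<String>> metadata = new HashMap<String, List<String>>();
}

class Pipeline {
  private final List<Stage> stages = new ArrayList<Stage>();

  Pipeline add(Stage s) { stages.add(s); return this; }

  void run(Doc doc) throws Exception {
    for (Stage s : stages)
      s.process(doc);                  // each stage sees the prior stage's output
  }
}

A Tika-extraction stage, a field-mapping stage, etc. would each just be another Stage in the list; there is no branching or merging across connectors, which is what would push this toward a capital-P "Pipeline".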
There's a question in my mind as to where it belongs. If its purpose is to make up for missing code in particular search engines, then I'd argue it should be a service available to output connector coders, who can then choose how much configurability makes sense from the point of view of their target system. For instance, since Tika is already part of Solr, there would seem to be little benefit in adding a Tika pipeline upstream of Solr as well, but maybe a Google Appliance connector would want it and therefore expose it. If the pipeline's purpose is to include arbitrary business logic, on the other hand, then I think what you'd really need is a Pipeline and not a pipeline, if you see what I mean.

So, my question to you is: what would the main use case(s) be for a "pipeline" in your view?

Karl

On Wed, Jan 11, 2012 at 6:31 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> Hi Karl,
>
> Still pondering our last discussion. Wondering if I got things off track.
>
> As a start, what if I backtracked a bit, to this:
>
> What's the easiest way to do this:
> * A connector that tweaks metadata from a single source.
> * Sits between any existing MCF datasource connector and the main MCF engine
>
> Before:
>
> CMS/DB -> Existing MCF connector -> MCF core -> output
>
> After:
>
> CMS/DB -> Existing MCF connector -> Metadata tweaker -> MCF core -> output
>
> Assume the metadata changes don't have any impact on security, or that no
> security is being used (public data)
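For what it's worth, reusing the hypothetical Stage/Doc/Pipeline sketch from earlier in this message, the "Metadata tweaker" slot in Mark's diagram could be a single stage. The field names here ("cms_author", "author") are made up purely for illustration:

// Hypothetical illustration, reusing the Stage/Doc/Pipeline sketch above.
// The tweak runs after the repository connector emits the document and
// before the core hands it to the output connector.
import java.util.List;

class MetadataTweaker implements Stage {
  public void process(Doc doc) {
    // Example tweak: move a source-specific field to a normalized name.
    List<String> authors = doc.metadata.remove("cms_author");
    if (authors != null)
      doc.metadata.put("author", authors);
  }
}

// Usage, mirroring: CMS/DB -> connector -> tweaker -> core -> output
//   Pipeline p = new Pipeline().add(new MetadataTweaker());
//   p.run(docFromConnector);   // doc produced by the existing connector

Since a stage like this touches neither the binary content nor any ACL information, it stays consistent with Mark's assumption that the metadata changes have no security impact.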