[
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009618#comment-14009618
]
Karl Wright commented on CONNECTORS-916:
----------------------------------------
Another interesting point...
Other people have proposed adding pipeline capabilities to the MCF framework in
the past. We've so far resisted that, by pointing out that most indexes (Solr,
ES) already have document-processing pipelines built in. Amazon CloudSearch
is the first index that actually seems to violate that premise.
It may be worth considering developing a "Pipeline connector" concept in the
MCF framework. Such a connector would accept a RepositoryDocument object, and
modify it. Each output connector could specify a prerequisite pipeline
connector, which of course could have its own prerequisite, for chaining
purposes. I could imagine there being a "Tika" pipeline connector, which would
configure Tika through a standard connector-style UI, in order to extract
metadata that could be sent to an index output connector such as Amazon
CloudSearch. I raise this issue now because such an architecture would
automatically provide configurable pool size limits, among other benefits (such
as reusability).
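
To make the chaining idea concrete, here is a rough sketch in Java. Every
name in it (SketchDocument, PipelineConnector, SimpleStage, PipelineChain,
PipelineDemo) is hypothetical and not part of the MCF API. SketchDocument
only stands in for RepositoryDocument, and the stage that copies the body
into a title stands in for what a Tika-backed connector might do. Pooling
and the configuration UI are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for RepositoryDocument: just a bag of named fields.
class SketchDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void setField(String name, String value) { fields.put(name, value); }
    String getField(String name) { return fields.get(name); }
}

// Hypothetical pipeline connector: accepts a document and modifies it.
interface PipelineConnector {
    void process(SketchDocument doc);

    // The connector that must run before this one, or null if there is none.
    PipelineConnector prerequisite();
}

// Minimal stage whose behavior is supplied as a lambda (illustration only).
class SimpleStage implements PipelineConnector {
    private final PipelineConnector prerequisite;
    private final Consumer<SketchDocument> action;

    SimpleStage(PipelineConnector prerequisite, Consumer<SketchDocument> action) {
        this.prerequisite = prerequisite;
        this.action = action;
    }

    @Override public void process(SketchDocument doc) { action.accept(doc); }
    @Override public PipelineConnector prerequisite() { return prerequisite; }
}

class PipelineChain {
    // Walk the prerequisite links back from the last stage, then run the
    // stages deepest-prerequisite first. Cycles are not detected here.
    static void run(PipelineConnector last, SketchDocument doc) {
        List<PipelineConnector> order = new ArrayList<>();
        for (PipelineConnector c = last; c != null; c = c.prerequisite()) {
            order.add(0, c);
        }
        for (PipelineConnector c : order) {
            c.process(doc);
        }
    }
}

class PipelineDemo {
    // Builds a two-stage chain and returns the resulting "title" field.
    static String demo() {
        SketchDocument doc = new SketchDocument();
        doc.setField("body", "Hello");

        // Stand-in for an extraction stage such as a Tika-backed connector.
        PipelineConnector extract =
            new SimpleStage(null, d -> d.setField("title", d.getField("body")));
        // A second stage that depends on the first having run.
        PipelineConnector upper =
            new SimpleStage(extract, d -> d.setField("title", d.getField("title").toUpperCase()));

        PipelineChain.run(upper, doc);
        return doc.getField("title");
    }
}
```

In this sketch an output connector would only need to hold a reference to
the last stage of its chain; the prerequisite links determine the rest of
the order.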
Thoughts?
> Amazon CloudSearch output connector
> -----------------------------------
>
> Key: CONNECTORS-916
> URL: https://issues.apache.org/jira/browse/CONNECTORS-916
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Amazon CloudSearch output connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Takumi Yoshida
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: 0507.diff, 0520.diff, 0520_2.diff, 1.patch, 2.diff,
> 3.diff, AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java,
> exception_handling.diff, exception_handling_2.diff, licenselist.txt
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl a web site and feed HTML
> pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - No code has been written for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
> Takumi Yoshida