[ https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009618#comment-14009618 ]

Karl Wright commented on CONNECTORS-916:
----------------------------------------

Another interesting point...

Other people have proposed adding pipeline capabilities to the MCF framework in
the past.  We've so far resisted that by pointing out that most indexes (Solr,
Elasticsearch) already have document-processing pipelines built in.  Amazon
CloudSearch is the first index that actually seems to violate that premise.

It may be worth developing a "pipeline connector" concept in the MCF framework.
Such a connector would accept a RepositoryDocument object and modify it.  Each
output connector could specify a prerequisite pipeline connector, which could in
turn have its own prerequisite, so that connectors can be chained.  I could
imagine a "Tika" pipeline connector, configured through a standard
connector-style UI, that extracts metadata to be sent to an index output
connector such as Amazon CloudSearch.  I raise this now because such an
architecture would automatically provide configurable pool size limits, among
other benefits (such as reusability).
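A minimal sketch of the proposed chaining and pool limits follows. It is not the MCF API. `Doc` is a hypothetical stand-in for RepositoryDocument. `PipelineConnector`, `PooledPipelineConnector`, `Pipelines.runChain`, `TitleExtractor` and `TagStripper` are also hypothetical names. The "extractor" does a crude string search for `<title>` rather than calling Tika. Each connector names its prerequisite. The chain runs deepest-prerequisite first, and a per-connector `Semaphore` caps how many documents are transformed concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for RepositoryDocument: a body plus multi-valued metadata.
final class Doc {
    String body;
    final Map<String, List<String>> metadata = new HashMap<>();
    Doc(String body) { this.body = body; }
}

// Hypothetical pipeline-connector contract: accept a document and modify it in place.
interface PipelineConnector {
    void process(Doc doc);
    // The connector that must run before this one, or null if this one is first.
    PipelineConnector prerequisite();
}

// Base class whose Semaphore limits concurrent transforms (the "pool size limit").
abstract class PooledPipelineConnector implements PipelineConnector {
    private final Semaphore pool;
    private final PipelineConnector prerequisite;

    PooledPipelineConnector(int maxConcurrent, PipelineConnector prerequisite) {
        this.pool = new Semaphore(maxConcurrent);
        this.prerequisite = prerequisite;
    }

    @Override public PipelineConnector prerequisite() { return prerequisite; }

    @Override public final void process(Doc doc) {
        pool.acquireUninterruptibly(); // block until a pool slot is free
        try {
            transform(doc);
        } finally {
            pool.release();
        }
    }

    protected abstract void transform(Doc doc);
}

final class Pipelines {
    // Walks the prerequisite links back to the first connector, then runs them in order.
    static void runChain(PipelineConnector last, Doc doc) {
        List<PipelineConnector> chain = new ArrayList<>();
        for (PipelineConnector c = last; c != null; c = c.prerequisite()) {
            chain.add(0, c);
        }
        for (PipelineConnector c : chain) {
            c.process(doc);
        }
    }
}

// Hypothetical Tika-like step: copies the HTML <title> text into metadata.
final class TitleExtractor extends PooledPipelineConnector {
    TitleExtractor(int maxConcurrent, PipelineConnector prerequisite) { super(maxConcurrent, prerequisite); }
    @Override protected void transform(Doc doc) {
        int start = doc.body.indexOf("<title>");
        int end = doc.body.indexOf("</title>");
        if (start >= 0 && end > start) {
            doc.metadata.put("title", List.of(doc.body.substring(start + "<title>".length(), end)));
        }
    }
}

// Hypothetical follow-on step: strips markup, so it must run after TitleExtractor.
final class TagStripper extends PooledPipelineConnector {
    TagStripper(int maxConcurrent, PipelineConnector prerequisite) { super(maxConcurrent, prerequisite); }
    @Override protected void transform(Doc doc) {
        doc.body = doc.body.replaceAll("<[^>]*>", "").trim();
    }
}
```

For example, `new TagStripper(2, new TitleExtractor(2, null))` is a two-step chain. An output connector would hand it each document through `Pipelines.runChain` before indexing. Reversing the prerequisite order would strip the `<title>` tag before it could be extracted, which is why each step declares its dependency explicitly.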

Thoughts?

> Amazon CloudSearch output connector
> -----------------------------------
>
>                 Key: CONNECTORS-916
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-916
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: 0507.diff, 0520.diff, 0520_2.diff, 1.patch, 2.diff, 
> 3.diff, AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java, 
> exception_handling.diff, exception_handling_2.diff, licenselist.txt
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. With it, you can crawl a web site and feed
> HTML pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - I have not written any code for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
>  Takumi Yoshida



--
This message was sent by Atlassian JIRA
(v6.2#6252)
