[ 
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010441#comment-14010441
 ] 

Takumi Yoshida commented on CONNECTORS-916:
-------------------------------------------

bq. I also think we should further discuss whether separating the Tika part of 
this connector into its own pipeline connector would be something we should do. 
It is not clear to me how such a pipeline connector would be configured, but I 
think having Tika available to more than one output connector would be a good 
idea, do you agree?

Yes, I agree. It sounds great to extract text in MCF and feed it to other 
repositories, like MongoDB.
(Well, but I have no idea how to implement it...)

Do you know the Document Pipeline in FAST ESP? It was powerful document 
processing software. It had many processing templates, such as copying a field, 
extracting tree-structured field data, downloading a file, and extracting text 
from binary data. The processing was separated into very small steps (each one 
is called a 'stage'), so you could easily add, remove, or modify a stage. You 
could also write your own stage in Python...
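
To make the stage idea concrete, here is a minimal sketch in Python. It is not 
FAST ESP's actual API and not ManifoldCF code; every name in it (copy_field, 
strip_whitespace, run_pipeline, the dict-based document) is a hypothetical one 
chosen only for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a document is a plain dict of field name -> string value.
Document = Dict[str, str]
# Assumption: a stage is any function that takes a document and returns one.
Stage = Callable[[Document], Document]


def copy_field(src: str, dst: str) -> Stage:
    """Build a stage that copies the value of one field into another."""
    def stage(doc: Document) -> Document:
        out = dict(doc)  # work on a copy so the input is left untouched
        if src in out:
            out[dst] = out[src]
        return out
    return stage


def strip_whitespace(field: str) -> Stage:
    """Build a stage that collapses runs of whitespace in one field."""
    def stage(doc: Document) -> Document:
        out = dict(doc)
        if field in out:
            out[field] = " ".join(out[field].split())
        return out
    return stage


def run_pipeline(stages: List[Stage], doc: Document) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


# The pipeline is just an ordered list of stages.
pipeline = [copy_field("title", "display_title"), strip_whitespace("body")]
result = run_pipeline(pipeline, {"title": "Hello", "body": "  some   text \n here "})
print(result)
```

Because the pipeline is only a list, adding, removing, or reordering a stage 
means editing that list, and a text-extraction step could in principle be one 
more stage placed before any output.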


> Amazon CloudSearch output connector
> -----------------------------------
>
>                 Key: CONNECTORS-916
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-916
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: 0507.diff, 0520.diff, 0520_2.diff, 0527.diff, 1.patch, 
> 2.diff, 3.diff, AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java, 
> exception_handling.diff, exception_handling_2.diff, licenselist.txt
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl a web site and feed HTML 
> pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - No code has been written for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
>  Takumi Yoshida



--
This message was sent by Atlassian JIRA
(v6.2#6252)
