[ https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975138#comment-13975138 ]
Takumi Yoshida commented on CONNECTORS-916:
-------------------------------------------
Hi Karl,
This is because of how CloudSearch works. Let me explain.
- CloudSearch drops any document that contains a field not defined in the
CloudSearch schema.
- To feed binary files, the output connector extracts their content using Tika,
because CloudSearch does not extract binary files on the server side (unlike
Solr).
- So, without a mapping page, users would need to define every field that Tika
extracts in the CloudSearch schema. Otherwise the documents cannot be fed.
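
To illustrate the point about mapping, here is a minimal sketch in Python (not the connector's actual Java code). The function names, the schema, the mapping table, and the sample Tika output are all hypothetical and only show the idea: fields without a mapping are left out, so the resulting document contains only fields that the CloudSearch schema defines.

```python
# Hypothetical sketch: function names, schema, and mapping are illustrative
# assumptions, not taken from the ManifoldCF connector.

def map_fields(extracted, mapping):
    """Rename extracted fields via the mapping and skip unmapped ones."""
    mapped = {}
    for source_name, value in extracted.items():
        target_name = mapping.get(source_name)
        if target_name is None:
            # No mapping: leave the field out instead of sending an
            # undefined field to CloudSearch.
            continue
        mapped[target_name] = value
    return mapped


def undefined_fields(document, schema_fields):
    """Return the sorted field names that the schema does not define."""
    return sorted(name for name in document if name not in schema_fields)


# Assumed CloudSearch schema and mapping, for illustration only.
SCHEMA_FIELDS = {"title", "content_type", "body"}
FIELD_MAPPING = {
    "dc:title": "title",
    "Content-Type": "content_type",
    "content": "body",
}

# Example of the kind of metadata an extractor might return for a PDF.
tika_output = {
    "dc:title": "Annual report",
    "Content-Type": "application/pdf",
    "content": "Revenue grew in the last quarter.",
    "pdf:PDFVersion": "1.4",
    "xmpTPg:NPages": "12",
}

document = map_fields(tika_output, FIELD_MAPPING)
```

Without the mapping, every key in `tika_output` would have to exist in the schema. With it, `undefined_fields(document, SCHEMA_FIELDS)` is empty, so the document can be fed.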
Of course, I still need to write some tests and documentation. Which would you
prefer I do first? I will follow the community's way.
> Amazon CloudSearch output connector
> -----------------------------------
>
> Key: CONNECTORS-916
> URL: https://issues.apache.org/jira/browse/CONNECTORS-916
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Amazon CloudSearch output connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Takumi Yoshida
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: 1.patch, 2.diff, 3.diff, exception_handling.diff,
> exception_handling_2.diff
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl a web site and feed HTML pages
> to Amazon CloudSearch.
> However, it is not complete yet, for the following reasons:
> - There is no code for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
> Takumi Yoshida