[
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991259#comment-13991259
]
Karl Wright commented on CONNECTORS-916:
----------------------------------------
Hi Takumi,
About the document status tracking function -- let's consider how it would
probably work before deciding whether it could fail in a bad way.
I assume that this is how it would work:
(1) Every document that is output, and its metadata, is written as multiple
files to a disk directory. This operation will not fail unless disk space is
exhausted.
(2) Placeholders for "deleted" documents are written too.
(3) Periodically (say, every 1000 documents, and also at
notifyOfJobCompletion()), the entire disk structure is sent to Amazon as one
packed file (?), or in whatever form is cheapest.
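To make steps (1) and (2) concrete, here is a minimal sketch of the disk spool.
It is not the connector's actual code: the class name DocumentSpool, the file
suffixes, and the Base64 file naming are all assumptions made for illustration.
Each method returns true when enough documents have accumulated that step (3)
should run.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical disk spool: one body file and one metadata file per document. */
class DocumentSpool {
    private final Path dir;
    private final int flushThreshold;
    private int pending = 0;

    DocumentSpool(Path dir, int flushThreshold) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.flushThreshold = flushThreshold;
    }

    /** Document ids are usually URIs, so encode them into filesystem-safe names. */
    private static String fileNameFor(String documentId) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(documentId.getBytes(StandardCharsets.UTF_8));
    }

    /** Step (1): write the document and its metadata as separate files. */
    boolean addDocument(String documentId, String body, String metadata) throws IOException {
        String name = fileNameFor(documentId);
        Files.write(dir.resolve(name + ".body"), body.getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve(name + ".meta"), metadata.getBytes(StandardCharsets.UTF_8));
        return ++pending >= flushThreshold;
    }

    /** Step (2): an empty placeholder file records that the document was deleted. */
    boolean removeDocument(String documentId) throws IOException {
        Files.write(dir.resolve(fileNameFor(documentId) + ".deleted"), new byte[0]);
        return ++pending >= flushThreshold;
    }

    /** Number of spooled operations not yet sent by step (3). */
    int pendingCount() {
        return pending;
    }
}
```

The only failure these two methods can report is an IOException from the local
disk, which matches the expectation that step (1) fails only when disk space
runs out.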
Operation (3) *can* fail, but when it does it is important to understand *how*
it fails, and *why* it fails.
If it fails because of a temporary communication problem, it should just
throw ServiceInterruption. ManifoldCF will then retry the entire operation
until it succeeds.
If it fails because Amazon doesn't like the contents of one or more documents
in the package, then ManifoldCF has no good way of recovering from this. So
the key thing is making sure that only transient communication issues can
cause operation (3) to fail. Is it possible to guarantee this? If so, I think
this proposal will work well.
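One way to approach that guarantee, sketched below under stated assumptions, is
to check every document before it is ever spooled, so that by the time
operation (3) runs the only remaining failure is a communication error.
RetryLaterException is a stand-in for ManifoldCF's ServiceInterruption so the
sketch compiles on its own. PackageUploader, PackageFlusher, the retry delay,
and the size limit are hypothetical names and placeholder values, not Amazon's
or ManifoldCF's.

```java
import java.io.IOException;
import java.util.List;

/**
 * Stand-in for ManifoldCF's ServiceInterruption, defined here only so the
 * sketch compiles without the ManifoldCF jars. Carries the next retry time.
 */
class RetryLaterException extends Exception {
    final long retryTimeMillis;

    RetryLaterException(String message, Throwable cause, long retryTimeMillis) {
        super(message, cause);
        this.retryTimeMillis = retryTimeMillis;
    }
}

/** Hypothetical transport hook; a real one would send the package to Amazon. */
interface PackageUploader {
    void upload(List<String> documents) throws IOException;
}

class PackageFlusher {
    static final long RETRY_DELAY_MS = 5L * 60L * 1000L; // assumed retry interval
    static final int MAX_DOCUMENT_CHARS = 1_000_000;     // placeholder limit, not Amazon's

    /** Content check meant to run in step (1), before a document is spooled. */
    static boolean isAcceptable(String document) {
        return document != null && !document.isEmpty()
                && document.length() <= MAX_DOCUMENT_CHARS;
    }

    /** Operation (3): send the package; only transient errors are expected here. */
    static void flush(List<String> documents, PackageUploader uploader, long now)
            throws RetryLaterException {
        for (String document : documents) {
            if (!isAcceptable(document)) {
                // A bad document here means step (1) skipped its check.
                throw new IllegalStateException("unvalidated document reached the package");
            }
        }
        try {
            uploader.upload(documents);
        } catch (IOException e) {
            // Treated as a temporary communication problem: ask for a later retry.
            throw new RetryLaterException("upload failed: " + e.getMessage(), e,
                    now + RETRY_DELAY_MS);
        }
    }
}
```

This only works if the local check is at least as strict as whatever Amazon
enforces. If Amazon can still reject content that passed the check, the
guarantee does not hold.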
> Amazon CloudSearch output connector
> -----------------------------------
>
> Key: CONNECTORS-916
> URL: https://issues.apache.org/jira/browse/CONNECTORS-916
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Amazon CloudSearch output connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Takumi Yoshida
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: 0507.diff, 1.patch, 2.diff, 3.diff,
> exception_handling.diff, exception_handling_2.diff
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl a web site and feed HTML
> pages to Amazon CloudSearch.
> However, it is not complete, for the following reasons:
> - no code has been written for the configuration page.
> - the only supported file type is HTML.
> Thank you for your time,
> Takumi Yoshida