[ 
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991259#comment-13991259
 ] 

Karl Wright commented on CONNECTORS-916:
----------------------------------------

Hi Takumi,

About the document status tracking function -- let's consider how it would 
probably work before deciding whether it could fail in a bad way.

I assume that this is how it would work:

(1) Every document that is output, and its metadata, is written as multiple 
files to a disk directory.  This operation will not fail unless disk space is 
exhausted.
(2) Placeholders for "deleted" documents are written too.
(3) Periodically (say, every 1000 documents, and also at 
notifyOfJobCompletion()), the entire disk structure is sent to Amazon as one 
packed file (?), or in whatever form is cheapest.
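
Steps (1) through (3) could be sketched roughly as follows.  This is only an 
illustration under assumptions: DocumentSpool, BatchSender, the file suffixes, 
and the flush threshold are all hypothetical names chosen for the example, not 
ManifoldCF or Amazon APIs, and the real connector may package the data quite 
differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical local spool; names and file layout are illustrative only. */
class DocumentSpool {
  /** Hypothetical callback that ships the spooled files to the search service. */
  interface BatchSender {
    void send(List<Path> files) throws IOException;
  }

  private final Path dir;
  private final int flushThreshold;
  private final BatchSender sender;
  private int pending = 0;    // operations spooled since the last successful flush
  private long sequence = 0;  // keeps file names unique and ordered

  DocumentSpool(Path dir, int flushThreshold, BatchSender sender) {
    this.dir = dir;
    this.flushThreshold = flushThreshold;
    this.sender = sender;
  }

  /** Step (1): write the document content and its metadata as separate files. */
  void addDocument(String id, String content, String metadata) throws IOException {
    String base = nextBase();
    write(base + ".content", content);
    // The id is stored inside the file because it may not be a valid file name.
    write(base + ".meta", id + "\n" + metadata);
    countAndMaybeFlush();
  }

  /** Step (2): write a placeholder recording that the document was deleted. */
  void deleteDocument(String id) throws IOException {
    write(nextBase() + ".delete", id);
    countAndMaybeFlush();
  }

  /** Step (3): send everything on disk; files are removed only after success. */
  void flush() throws IOException {
    List<Path> files = new ArrayList<>();
    try (Stream<Path> listing = Files.list(dir)) {
      listing.sorted().forEach(files::add);
    }
    if (!files.isEmpty()) {
      sender.send(files);  // if this throws, the files stay on disk for a retry
      for (Path f : files) {
        Files.delete(f);
      }
    }
    pending = 0;
  }

  private void countAndMaybeFlush() throws IOException {
    pending++;
    if (pending >= flushThreshold) {
      flush();
    }
  }

  private String nextBase() {
    return String.format("%010d", sequence++);
  }

  private void write(String name, String text) throws IOException {
    Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
  }
}
```

In this sketch the connector would also call flush() from 
notifyOfJobCompletion().  Because files are deleted only after send() returns 
normally, a failed send leaves the whole batch on disk to be sent again.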

Operation (3) *can* fail, but when it does it is important to understand *how* 
it fails, and *why* it fails.

If it fails because of some temporary communication problem, it should just 
throw ServiceInterruption.  ManifoldCF will then retry the entire operation 
until it succeeds.

If it fails because Amazon doesn't like the contents of one or more documents 
in the package, then ManifoldCF has no good way of recovering from this.  So 
the key thing is making sure that only transient communication issues can be a 
problem with operation (3).  Is it possible to guarantee this?  If so I think 
this proposal will work well.
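
To make the two failure modes concrete, here is a minimal sketch of how the 
send step might separate them.  Everything in it is an assumption made for 
illustration: TransientUploadException is a simplified stand-in for 
ManifoldCF's ServiceInterruption, Uploader is a hypothetical interface, and the 
mapping of I/O errors and 5xx statuses to "transient" and other non-2xx 
statuses to "rejected" is a guess about how the service reports problems, not 
something confirmed here.

```java
import java.io.IOException;

/** Illustrative classification of batch upload failures; all names are hypothetical. */
class UploadFailures {
  /** Simplified stand-in for ServiceInterruption: the caller should retry later. */
  static class TransientUploadException extends Exception {
    final long retryAtMillis;

    TransientUploadException(String message, long retryAtMillis) {
      super(message);
      this.retryAtMillis = retryAtMillis;
    }
  }

  /** The service rejected the batch contents; retrying the same batch cannot help. */
  static class PermanentUploadException extends Exception {
    PermanentUploadException(String message) {
      super(message);
    }
  }

  /** Hypothetical uploader: returns an HTTP-style status or throws on I/O trouble. */
  interface Uploader {
    int upload(byte[] batch) throws IOException;
  }

  static void sendBatch(Uploader uploader, byte[] batch, long nowMillis, long retryDelayMillis)
      throws TransientUploadException, PermanentUploadException {
    int status;
    try {
      status = uploader.upload(batch);
    } catch (IOException e) {
      // Assumed transient: connection reset, timeout, and similar problems.
      throw new TransientUploadException(
          "I/O error: " + e.getMessage(), nowMillis + retryDelayMillis);
    }
    if (status >= 200 && status < 300) {
      return;  // accepted
    }
    if (status >= 500) {
      // Assumed transient: the service itself is having trouble.
      throw new TransientUploadException(
          "Service error " + status, nowMillis + retryDelayMillis);
    }
    // Anything else is treated as a rejection of the batch contents.
    throw new PermanentUploadException("Batch rejected with status " + status);
  }
}
```

The second exception is the case with no good recovery, so the design goal 
would be to check documents before they enter the batch, leaving only the 
first kind of failure possible at send time.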


> Amazon CloudSearch output connector
> -----------------------------------
>
>                 Key: CONNECTORS-916
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-916
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: 0507.diff, 1.patch, 2.diff, 3.diff, 
> exception_handling.diff, exception_handling_2.diff
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl a web site and feed HTML 
> pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - there is no code yet for the configuration page.
> - the only supported file type is HTML.
> Thank you for your time,
>  Takumi Yoshida



--
This message was sent by Atlassian JIRA
(v6.2#6252)
