[jira] [Commented] (CONNECTORS-1162) Apache Kafka Output Connector

Karl Wright (JIRA) Sat, 15 Aug 2015 02:50:07 -0700

    [ https://issues.apache.org/jira/browse/CONNECTORS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698205#comment-14698205 ]


Karl Wright commented on CONNECTORS-1162:
-----------------------------------------

Hi [~tugbadogan],

I've reviewed the code thoroughly.  While I can't read Rafa's mind, I do have a 
few things we should work towards in the last week.

(1) You have an e.printStackTrace() in your exception handling.  That can't be 
in the final version.
(2) The exception handling in general looks weak.  Your code should not just 
reject documents when there is an exception.  It should try to determine 
roughly what happened.  Specifically, there are three possible responses:
- REJECT documents that Kafka cannot ever accept, due to characteristics of the 
document itself
- throw ServiceInterruption exceptions when there is some temporary issue with 
connectivity, and there is a chance that the operation will succeed if retried 
later
- throw ManifoldCFException when there is a persistent issue, e.g. 
configuration, that prevents the connection from working properly
(3) Remove the repository connection entirely from the tree, since it is not 
going to be of any use going forward
(4) Ideally, we should have an integration test for the output connector.  In 
this case this would involve setting up a temporary local instance of Kafka, 
and running a test file system crawl against it.  I don't know whether this is 
feasible but it is something that should be considered.
(5) Documentation: I will need a set of usable screen shots for the 
documentation, one for each connector tab.  These must be in .PNG format and 
should be full-screen.  I can crop them but try to keep other windows out of 
them.  I will also need a short description of any Kafka configuration 
specifics that are necessary, especially if there isn't an integration test to 
look at.
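
To illustrate item (2), here is a minimal sketch of sorting a failure into the 
three responses.  Everything in it is an assumption made for illustration: the 
FailureClassifier class, the Action enum, and the choice of standard Java 
exception types as stand-ins are hypothetical and are not ManifoldCF or Kafka 
client APIs.  The real connector would need to inspect the exceptions the 
Kafka producer actually throws.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of ManifoldCF or Kafka. */
final class FailureClassifier {

  /** The three possible responses described in item (2). */
  enum Action { REJECT_DOCUMENT, RETRY_LATER, FAIL_CONNECTION }

  private FailureClassifier() {}

  static Action classify(Throwable error) {
    // Futures report failures wrapped in ExecutionException, so unwrap first.
    Throwable cause = error;
    while (cause instanceof ExecutionException && cause.getCause() != null) {
      cause = cause.getCause();
    }
    // Assumption: a problem with the document itself (e.g. it is too large)
    // shows up as IllegalArgumentException.  Retrying will never help.
    if (cause instanceof IllegalArgumentException) {
      return Action.REJECT_DOCUMENT;
    }
    // Assumption: I/O errors and timeouts are temporary connectivity issues,
    // so the operation may succeed if retried later.
    if (cause instanceof IOException || cause instanceof TimeoutException) {
      return Action.RETRY_LATER;
    }
    // Anything else is treated as persistent, e.g. bad configuration.
    return Action.FAIL_CONNECTION;
  }
}
```

In the connector, REJECT_DOCUMENT would correspond to rejecting the document, 
RETRY_LATER to throwing ServiceInterruption, and FAIL_CONNECTION to throwing 
ManifoldCFException.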

Thanks, and I hope you have a good remainder of your summer!


> Apache Kafka Output Connector
> -----------------------------
>
>                 Key: CONNECTORS-1162
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1162
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Karl Wright
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 2.3
>
>         Attachments: 1.JPG, 2.JPG
>
>
> Kafka is a distributed, partitioned, replicated commit log service. It 
> provides the functionality of a messaging system, but with a unique design. A 
> single Kafka broker can handle hundreds of megabytes of reads and writes per 
> second from thousands of clients.
> Apache Kafka is being used for a number of use cases. One of them is to use 
> Kafka as a feeding system for streaming BigData processes, in both Apache 
> Spark and Hadoop environments. A Kafka output connector could be used for 
> streaming or dispatching crawled documents or metadata and putting them in a 
> BigData processing pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
