[ 
https://issues.apache.org/jira/browse/CONNECTORS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595490#comment-14595490
 ] 

Karl Wright commented on CONNECTORS-1162:
-----------------------------------------

Offhand, you do seem to have a callback to synchronize on:

{code}
send(ProducerRecord<K,V> record, Callback callback) 
{code}

So the question is: what information comes back in the callback?  And does 
case (3) apply?  That question matters because ManifoldCF has a fixed number 
of worker threads; if they are all waiting on a queue in Kafka while Kafka is 
waiting for more documents, you have a deadlock.  So we need to know that, 
although you will likely find out if you just try it. ;-)
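
To make the synchronization idea concrete, here is a rough sketch of how a worker thread might block on that callback.  In the Kafka Java client, Callback.onCompletion receives a RecordMetadata and an Exception; everything else below is an assumption for illustration only.  SendCallback and FakeProducer are hypothetical stand-ins so the example runs without a broker, and the bounded wait is just one possible way to keep a worker from blocking forever if the acknowledgement never arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class CallbackWaitSketch {

    /** Hypothetical stand-in for Kafka's Callback (the real one gets RecordMetadata). */
    interface SendCallback {
        void onCompletion(String metadata, Exception exception);
    }

    /** Hypothetical stand-in for a producer that acknowledges on a background thread. */
    static class FakeProducer implements AutoCloseable {
        private final ExecutorService io = Executors.newSingleThreadExecutor();

        void send(String record, SendCallback callback) {
            // Acknowledge asynchronously, as an asynchronous producer would.
            io.submit(() -> callback.onCompletion("acked:" + record, null));
        }

        @Override
        public void close() {
            io.shutdown();
        }
    }

    /** Blocks the calling thread until the callback fires or the timeout elapses. */
    static String sendAndWait(FakeProducer producer, String record, long timeoutMs)
            throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();

        producer.send(record, (metadata, exception) -> {
            result.set(metadata);
            failure.set(exception);
            done.countDown();
        });

        // Bounded wait: a worker that waits without a limit is how the
        // deadlock described above could arise.
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no acknowledgement within " + timeoutMs + " ms");
        }
        if (failure.get() != null) {
            throw failure.get();
        }
        return result.get();
    }

    public static void main(String[] args) throws Exception {
        try (FakeProducer producer = new FakeProducer()) {
            System.out.println(sendAndWait(producer, "doc-1", 1000));
        }
    }
}
```

With a real producer the latch would be counted down inside the Kafka Callback instead, and a timeout would probably need to be reported back to the framework as a retryable failure rather than thrown as-is.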



> Apache Kafka Output Connector
> -----------------------------
>
>                 Key: CONNECTORS-1162
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1162
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Karl Wright
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 1.10, ManifoldCF 2.2
>
>         Attachments: 1.JPG, 2.JPG
>
>
> Kafka is a distributed, partitioned, replicated commit log service. It 
> provides the functionality of a messaging system, but with a unique design. A 
> single Kafka broker can handle hundreds of megabytes of reads and writes per 
> second from thousands of clients.
> Apache Kafka is used for a number of use cases. One of them is as a feeding 
> system for streaming big data processes, in both Apache Spark and Hadoop 
> environments. A Kafka output connector could be used to stream or dispatch 
> crawled documents or metadata into a big data processing pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
