Other benefits? Is there a performance cost? Would we co-locate both our source and KafkaSource in the same JVM?
On Thursday, May 5, 2016, Jiang Weihua <[email protected]> wrote:

> I would say it is a good shortcut for current usage. However, we
> definitely need our own sources and sinks in the long term.
>
> Sent from my iPhone
>
> On May 6, 2016, at 06:49, Manu Zhang <[email protected]> wrote:
>
> Hi Kam and others,
>
> Do you think it makes sense to utilize kafka-connect
> <http://docs.confluent.io/2.0.0/connect/connectors.html> for source/sink?
> The topology would be like
> source ~> KafkaSource ~> DAG ~> KafkaSink ~> sink.
> One benefit is we always get at-least-once delivery provided by the
> current KafkaSource.
> Kafka provides HDFS and JDBC connectors out of the box, and other
> connectors are being contributed by the community
> <https://github.com/search?p=1&q=kafka-connect&type=Repositories&utf8=%E2%9C%93>.
>
> On Thu, May 5, 2016 at 11:35 PM Kam Kasravi <[email protected]> wrote:
>
> Hi Karol
>
> Good feedback. I'm not sure if GEARPUMP-116 would allow easy integration
> of Redis, JMS, AMQP from the Beam and akka-stream perspectives.
> Huafeng, Manu?
>
> On Wed, May 4, 2016 at 10:34 AM, Karol Brejna <[email protected]> wrote:
>
> We have a series of JIRA tickets regarding Gearpump sinks/sources:
>
> https://issues.apache.org/jira/browse/GEARPUMP-116 - Compatibility
>   layer/adapter for Apache Storm
> https://issues.apache.org/jira/browse/GEARPUMP-115 - Create MQTT
>   source/sink
> https://issues.apache.org/jira/browse/GEARPUMP-106 - Gearpump Redis
>   Integration
> https://issues.apache.org/jira/browse/GEARPUMP-105 - Provide
>   non-persistent Sink Task so that examples like word count can
>   materialize Sum results within the Client
> https://issues.apache.org/jira/browse/GEARPUMP-100 - Source task that
>   emits messages per a schedule (interval or otherwise) should be
>   provided
> https://issues.apache.org/jira/browse/GEARPUMP-95 - Add parquet
>   datasource and datasink connectors
> https://issues.apache.org/jira/browse/GEARPUMP-91 - Apache Cassandra
>   Integration
>
> We also had a ticket for 'Add a HDFS Sink with security'
> (https://github.com/gearpump/gearpump/issues/1547). I am not sure what
> the outcome of that one was.
>
> Most of them concern the medium (MQTT, Redis, Cassandra, ...). Others
> are about the source mechanics (scheduled/repetitive source).
>
> I'd like to discuss the order in which we plan to implement them.
>
> In my opinion, Redis and MQTT (GEARPUMP-106, GEARPUMP-115) seem the most
> important to have.
> Redis is well known and widely used. MQTT is a de facto standard in IoT
> communications.
>
> Then I would like to have the HDFS sink (if we haven't merged it
> already).
>
> A non-persistent datasink could be very useful for example/demo
> purposes. (Imagine we have a capped collection that the application can
> send messages to, a kind of application console. In the dashboard there
> could be a section that presents the latest 'console' messages. This way
> a user could "watch" the application's progress, especially if he/she
> doesn't have access to the backend, as often happens in YARN mode. But
> this is a topic for a dedicated discussion, I think.)
>
> On the other hand, if we start working on GEARPUMP-116, we'd probably
> quickly have Redis, JMS, AMQP sources (adapted from Storm).
>
> Please let me know what you think.
>
> Karol
