Hi Chris, Thanks. Typically the Kafka cluster will be a separate set of nodes, and so will the Kafka Connect workers, which connect to the PLC devices, databases, or whatever the source is, and push the data to Kafka.
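To make the worker/connector split above concrete, a standalone-mode connector config for this could look roughly like the following sketch. The `connector.class` name comes from the feature branch linked later in this thread; the `plc4x.*` property names are hypothetical placeholders, since the connector's actual config surface hasn't been defined yet:

```properties
# connect-plc4x-source.properties -- hypothetical sketch, not a final config.
name=plc4x-source
# Connector class from the feature branch referenced below in this thread.
connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
tasks.max=1
# Placeholder keys: the real property names are still to be decided.
plc4x.connection-string=s7://192.168.0.1/0/0
plc4x.topic=plc-data
```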
I will start off with this information and extend your feature branch. Will keep asking questions along the way. Sagar. On Wed, Aug 29, 2018 at 7:51 PM Christofer Dutz <christofer.d...@c-ware.de> wrote: > Hi Sagar, > > Great that we seem to be on the same page now ;-) > > Regarding the "Kafka Connecting" ... what I meant is that the > Kafka-Connect-PLC4X-Instance connects ... I was assuming the driver to be > running on a Kafka Node, but that's just due to my limited knowledge of > everything ;-) > > Well the code for actively reading stuff from a PLC should already be in > my example implementation. It should work out of the box this way ... As I > have seen several Mock Drivers implemented in PLC4X, I am currently > thinking of implementing one that you should be able to just import and use > ... however I'm currently working hard on refactoring the API completely, > so I would postpone that to after these changes are in there. But rest > assured ... I would handle the refactoring so you could just assume that it > works. > > Alternatively I could have an account for our IoT VPN created for you. Then > you could log in to our VPN and talk to some real PLCs ... > > I think I wanted to create an account for Julian, but my guy responsible > for creating them was on holidays ... will re-check this. > > Chris > > > > Am 29.08.18, 16:11 schrieb "Sagar" <sagarmeansoc...@gmail.com>: > > Hi Chris, > > That's perfectly fine :) > > So, the way I understand this now is, we will have a bunch of worker > nodes (in Kafka Connect terminology, a worker is a JVM process which > runs a > set of connectors/tasks to poll a source and push data to Kafka). > > So, vis-a-vis a JDBC connection, we will have a connection URL which > will > let us connect to these PLC devices and poll (poll in the sense you meant it > above), and then push data to Kafka. If this looks fine, then can you > give > me some documentation to refer to and also how I can start testing > these? 
> > And just one thing I wanted to clarify when you say Kafka nodes > connecting > to devices. That's something which doesn't happen. Kafka doesn't > connect to > any device. I think you just mean it in a more abstract way, right? > > @Julian, > > Thanks, I was going through the link you sent. So, you're saying via > scraping, we can push events to Kafka? Is that already happening and we > should look to move this functionality out? > > Thanks! > Sagar. > > > On Wed, Aug 29, 2018 at 11:05 AM Julian Feinauer < > j.feina...@pragmaticminds.de> wrote: > > > Hey Sagar, > > > > hey Chris, > > > > > > > > I want to join your discussion for part b3 as this is something we > usually > > require. > > > > We use a module we call the plc-scraper for that task (the term > scraping > > in that context is borrowed from Prometheus where it is used > extensively in > > this context [1]). > > > > Generally speaking the scraper takes a config containing > > addresses and scrape rates, then runs as a daemon and pushes the > scrape > > results downstream (usually Kafka but we also use other "Queues"). > > > > > > > > As we are currently rewriting this scraper I already considered > donating > > it to plc4x in the form of an example or perhaps even as a standalone > > module. > > > > > > > > So I agree with Chris that this should not be part of the PlcDriver > Level > > but rather on another layer "on top", and I would be more interested > in the > > specification of a "line protocol" which describes how messages are > > serialized for Kafka (or other sources). > > > > Can we come up with a common "schema" which fits many use cases? 
> > > > > > > > Our messages contain the following information: > > > > - timestamp > > > > - source > > > > - values > > > > - additional tags > > > > > > > > Best > > > > Julian > > > > > > > > [1] https://prometheus.io/docs/prometheus/latest/getting_started/ > > > > > > > > Am 28.08.18, 22:53 schrieb "Christofer Dutz" < > christofer.d...@c-ware.de>: > > > > > > > > Hi Sagar, > > > > > > > > sorry for not responding ... your mail must have skipped my eye > ... > > sorry for that. > > > > > > > > a) PLC4X works exactly the same way ... it consists of > plc4x-core, > > which only contains the DriverManager, and plc4x-api which contains > the API > > classes. So with these two jars you can build a full PLC4X > application. > > > > In order to connect to a PLC you need to add the jar containing > the > > required driver to the classpath. > > > > > > > > b) Well if using something other than the connection url as > partition > key, it is possible that multiple kafka connect > nodes > > would connect to the same PLC. In that case it could be a problem to > > control the order. I guess using the timestamp when receiving a > response > > (probably generated by the KC plc4x driver) could be a valid > approach. > > > > > > > > b2) Regarding the infinite loop ... I think we won't need such a > > mechanism. If we think of a set of fields from a PLC, we can think > of a PLC > > as a one-row database table. Producing diffs should be a lot simpler > that > > way. > > > > > > > > b3) Regarding push events ... PLC4X has a subscription mode next > to > > the polling. So it would be possible to also define PLC4X > datasources that > > actively push events to kafka ... I have to admit that this would be > the > mode I would prefer most. But as not all protocols and PLCs support > this > > mode, I think it would be safest to use polling and to add push > support > > after that. 
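Stepping out of the quoted thread for a moment: Julian's four message parts above (timestamp, source, values, additional tags) could serialize to JSON roughly as below. Field names and values are only illustrative; the actual "line protocol" is exactly what this thread is trying to pin down:

```json
{
  "timestamp": "2018-08-29T16:11:00Z",
  "source": "s7://192.168.0.1/0/0",
  "values": {
    "motor-current": 12.7,
    "conveyor-running": true
  },
  "tags": {
    "plant": "line-3",
    "scrape-rate-ms": "1000"
  }
}
```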
> > > > > > > > Regarding different languages: Currently we are concentrating > mainly > > on the Java implementation as it's the biggest challenge to > understand and > > implement the protocols. Porting them to other languages (especially > C and > > C++) shouldn't be as hard as implementing the first version. But > that's > currently a base left uncovered as we don't have the resources to > implement all > of them at once. > > > > > > > > And there are no stupid questions :-) > > > > > > > > Hope I could answer all of yours. If not, just ask and I'll > probably > > not miss that one ;-) > > > > > > > > Chris > > > > > > > > > > > > Am 23.08.18, 19:52 schrieb "Sagar" <sagarmeansoc...@gmail.com>: > > > > > > > > Hi Christofer, > > > > > > > > Thanks for the detailed responses. I would like to ask a > couple > > more > > > > questions (which may be borderline naive or stupid :D ). > > > > > > > > First thing that I would like to know - ignore my lack of > knowledge > > on PLCs - > > > > from what I understand, PLCs are small devices > > used to > > > > execute program instructions. These would have very small > memory > > footprints > > > > as well, I believe? Also, when you say the Siemens one can > handle 20 > > > > connections, would it be from different devices connecting > to it? > > The > > > > reason I ask these questions is this -> > > > > > > > > a) The way the kafka-connect framework is executed is by > > installing the > > > > whole framework with all the relevant jars needed on the > > classpath. So, if > > > > you talk about the JDBC connector for K-Connect, it would > need the > > mysql > > > > driver jar (for example) and other jars needed to support the > > framework. If > > > > we, say, choose to use Avro, then we would need more jars to > support > > that. > > > > Would we be able to install all that? 
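On question (a) about getting all the jars installed: since Kafka 0.11 a Connect worker can load each connector together with its dependencies (PLC4X driver jars, Avro converters, etc.) from an isolated plugin directory, so installation amounts to dropping a folder of jars rather than editing the global classpath. A worker-config excerpt (the paths are examples only):

```properties
# connect-distributed.properties (excerpt)
# Each subdirectory of a plugin.path entry is loaded in its own classloader,
# so the PLC4X connector's jars cannot clash with other plugins' jars.
plugin.path=/usr/share/java,/opt/kafka-connect-plugins
```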
> > > > > > > > b) Also, if multiple devices do connect to it, then won't we > have > > events > > > > arriving out of order from them? Does the ordering matter > amongst > > events > > > > that are being pushed? > > > > > > > > Regarding the infinite loop question, the reason JDBC > connector > > uses that > > > > is that it creates tasks for a given table and fires queries > to > > find > > > > deltas. So, if the polling frequency is 2 seconds, and it > last ran > > on > > > > 12.00.00 then it would run at 12.00.02 to figure out what > changed > > in that > > > > time frame. So, the way PlcReaders read() runs, would it keep > > returning > > > > newer data? > > > > > > > > We can skip over the rest of the parts, but looking at parts > a and > > b above, > > > > would it make sense to have something like a kafka-connect > > framework for > > > > pushing data to Kafka? Also, from the github link, the > drivers are > > to be > > > > supported in 3 languages as well. How would that play out? > > > > > > > > Again- apologies if the questions seem stupid. > > > > > > > > Thanks! > > > > Sagar. > > > > > > > > On Wed, Aug 22, 2018 at 10:39 PM Christofer Dutz < > > christofer.d...@c-ware.de> > > > > wrote: > > > > > > > > > Hi Sagar, > > > > > > > > > > great that you managed to have a look ... I'll try to > answer your > > > > > questions. > > > > > (I like to answer them postfix as whenever emails are sort > of > > answered > > > > > in-line, they are extremely hard to read and follow on > mobile > > email clients > > > > > __ ) > > > > > > > > > > First of all I created the original plugin via the > archetype for > > > > > kafka-connect plugins. The next thing I did, was to have a > look > > at the code > > > > > of the JDBC Kafka Connect plugin (as you might have > guessed) as > > I thought > > > > > that it would have similar structure as we do. > Unfortunately I > > think the > > > > > JDBC plugin is far more complex than the plc4x connector > will > > have to be. 
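On the delta question above: per Chris's earlier point (b2), a PLC behaves like a one-row table, so the JDBC connector's timestamp bookkeeping collapses to remembering the last value per field and emitting only what changed between polls. A dependency-free sketch of just that idea (no PLC4X or Connect APIs involved):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "one-row table" diffing idea: on each poll, compare the
// freshly read field values against the previous snapshot and return only
// the fields whose values changed since the last poll.
public class FieldDiff {
    private final Map<String, Object> lastSnapshot = new HashMap<>();

    public Map<String, Object> diff(Map<String, ?> current) {
        Map<String, Object> changed = new HashMap<>();
        for (Map.Entry<String, ?> e : current.entrySet()) {
            if (!e.getValue().equals(lastSnapshot.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        lastSnapshot.putAll(current); // remember this poll for the next one
        return changed;
    }
}
```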
I > > > > > sort of picked some of the things I liked with the > archetype and > > some I > > > > > liked with the jdbc ... if there was a third, even cooler > option > > ... I will > > > > > definitely have missed that. So if you think there is a > thing > > worth > > > > > changing ... you can change anything you like. > > > > > > > > > > 1) > > > > > The code of the jdbc plugin showed such a while(true) loop, > > however I > > > > > think this was because the jdbc query could return a lot > of rows > > and hereby > > > > > Kafka events. In our case we have one request and get one > > response. The > > > > > code in my example directly calls "get()" on the request > and is > > hereby > > > > > blocking. I don't know if this is good, but from reading > the > > jdbc example, > > > > > this should be blocking too ... > > > > > So the PlcReaders read() method returns a completable > future ... > > this > > > > > could be completed asynchronously and the callback could > fire > > the kafka > > > > > events, but I didn't know if this was ok with kafka. If it > is > > possible, > > > > > please have a look at this example code: > > > > > > > > https://github.com/apache/incubator-plc4x/blob/master/plc4j/protocols/s7/src/test/java/org/apache/plc4x/java/s7/S7PlcReaderSample.java > > > > > It demonstrates with comments the different usage types. > > > > > > > > > > While at it ... is there also an option for a Kafka > connector > > that is able > > > > > to push data? So if an incoming event arrives, this is > > automatically pushed > > > > > without a fixed polling interval? > > > > > > > > > > 2) > > > > > I have absolutely no idea as I am not quite familiar with > the > > concepts > > > > > inside kafka. What I do know is that probably the > partition-key > > should be > > > > > based upon the connection url. The problem is, that with > kafka I > > could have > > > > > 1000 nodes connecting to one PLC. 
While Kafka wouldn't have > > problems with > > > > > that, the PLCs have very limited resources. So as far as I > > decoded the > > > > > responses of my Siemens S7 1200 it can handle up to 20 > > connections (usually > a control-system already consuming 2-3 of them) ... I > think it > > would be > > > > > ideal, if on one Kafka node (or partition) there would be > one > > PlcConnection > > > > > ... this connection should then be shared among all > requests to > > a PLC with > > > > > a shared connection url (I hope I'm not writing nonsense). So if > > a > > > > > workerTask is responsible for managing all requests to one > > partition, then > > > > > I'd say it should be 1 ... otherwise the number could be > bigger. > > > > > > > > > > If it makes things easier, I'm absolutely fine with using > those > > > > > ConnectorUtils > > > > > > > > > > Regarding the connector offsets ... are you referring to > that > > counter > > > > > Kafka uses to let the clients know the sequence of events > and > > which they > > > > > use to sort of say: "Hi, I have number 237367 of topic > 'ABC', > > please > > > > > continue" ... is that what you are referring to? If it is, > well > > ... I have > > > > > to admit ... I don't know ... ok ... if it isn't then > probably > > also ;-) > > > > > How do other plugins do this? > > > > > > > > > > 3) > > > > > Well I guess both options would be cool ... JSON is > definitely > > simpler, > > > > > but for high volume transports the binary counterparts > > definitely are worth > > > > > consideration. Currently PLC4X tries to deliver what you > > request, but > > > > > that's actually something we're currently discussing on > > refactoring. But > > > > > for the moment - as shown in the example code I referenced > a few > > lines > > > > > above - you do a TypedRequest and for example ask for an > > Integer, then you > > > > > will receive an array (probably of size 1) of Integers. 
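The blocking-vs-asynchronous question from (1) above is really a plain `CompletableFuture` question: `SourceTask.poll()` is allowed to block, so calling into the future synchronously is legitimate, while the callback style would need some handoff (e.g. an internal queue) back into `poll()`. A stdlib-only sketch, where `readField` merely simulates an asynchronous PLC response and is not the real `PlcReader.read(...)` API (which was still being refactored at the time):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.IntConsumer;

// The two consumption styles for a future-based read API.
public class ReadStyles {
    // Stand-in for PlcReader.read(...): pretends a PLC answered asynchronously.
    static CompletableFuture<Integer> readField(String address) {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    // Blocking style, as in the current connector sketch: acceptable inside
    // SourceTask.poll(), which is expected to block until data is available.
    // join() behaves like get() but without checked exceptions.
    static int readBlocking(String address) {
        return readField(address).join();
    }

    // Asynchronous style: attach a callback; the result would then have to
    // be queued internally and drained by the next poll() invocation.
    static void readAsync(String address, IntConsumer sink) {
        readField(address).thenAccept(sink::accept);
    }
}
```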
> > > > > > > > > > 4) > > > > > Well I agree ... well at least I can't even say that I > make a > > secret about > > > > > where I stole things from ;-) > > > > > > > > > > If I can be of any assistance ... just ask. > > > > > > > > > > Thanks for taking the time. > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > > > Am 22.08.18, 17:55 schrieb "Sagar" < > sagarmeansoc...@gmail.com>: > > > > > > > > > > Hi All, > > > > > > > > > > I was going through the K-Connect stubs created by > Chris in > > the kafka > > > > > feature branch. > > > > > > > > > > Some of the findings I found are here(let me know if > they > > are valid or > > > > > not): > > > > > > > > > > 1) > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L98 > > > > > > > > > > Should this block of code be within an infinite loop > like > > while(true)? > > > > > I am > > > > > not exactly sure of the semantics of the PlcReader > hence > > asking this > > > > > question. > > > > > > > > > > 2) Another question is, what are the maxTasks that we > > envision here? > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/Plc4xSourceConnector.java#L46 > > > > > > > > > > Also, as part of documentation, there's a utility > called > > ConnectorUtils > > > > > which typically should be used to create the > configs(not a > > hard and > > > > > fast > > > > > rule though): > > > > > > > > > > > > > > > > > > https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/util/ConnectorUtils.html > > > > > > > > > > If we go that route, then we also need to specify how > the > > offsets > > > > > would be > > > > > stored in the offsets topic(by using the task name). 
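On the `ConnectorUtils` suggestion: its `groupPartitions(elements, numGroups)` helper just splits a list (here it would be the connection URLs or field lists) into at most `numGroups` contiguous, near-equal chunks, one chunk per task config. The plain-Java rendering below re-implements that behavior rather than calling the Kafka class, so the sketch stays dependency-free:

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors what org.apache.kafka.connect.util.ConnectorUtils.groupPartitions
// does: split `elements` into `numGroups` contiguous groups of near-equal
// size. Each group would then back the config of one source task.
public class PartitionGrouping {
    public static <T> List<List<T>> group(List<T> elements, int numGroups) {
        List<List<T>> groups = new ArrayList<>(numGroups);
        int base = elements.size() / numGroups;
        int extra = elements.size() % numGroups;
        int start = 0;
        for (int g = 0; g < numGroups; g++) {
            int len = base + (g < extra ? 1 : 0); // first `extra` groups get one more
            groups.add(new ArrayList<>(elements.subList(start, start + len)));
            start += len;
        }
        return groups;
    }
}
```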
> So, if > it can be > > > > > figured out as to how the connectors would be set up, then > that'll be > > > > > helpful. > > > > > > > > > > 3) While building the SourceRecord -> > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L109 > > > > > > > > > > , we would also need some DataConverter layer to have them > mapped to > > > > > the > > > > > connect types. Also, which message types would be > supported? > > JSON or > > > > > binary > > > > > protocols like Avro/Protobuf etc. or some other > protocols? > > Those things > > > > > might also need to be factored in. > > > > > > > > > > 4) Lastly, need to remove the JdbcSourceTask from the > catch > > block here > > > > > :) -> > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L67 > > > > > > > > > > Thanks! > > > > > Sagar.
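For point (3) in the thread, the DataConverter layer boils down to a mapping from the Java types PLC4X hands back to Kafka Connect schema types. The sketch below uses plain strings in place of `org.apache.kafka.connect.data.Schema` constants so it carries no Kafka dependency; a real converter would build `Schema`/`Struct` objects, and the JSON-vs-Avro question is actually decided by the worker's configured key/value converters, not by the connector itself:

```java
import java.util.Map;

// Illustrative PLC4X-value -> Connect-schema-type mapping. The strings stand
// in for org.apache.kafka.connect.data.Schema constants (e.g.
// Schema.INT32_SCHEMA), keeping this sketch free of Kafka dependencies.
public class ConnectTypeMapper {
    private static final Map<Class<?>, String> SCHEMA_TYPES = Map.of(
            Boolean.class, "BOOLEAN",
            Byte.class,    "INT8",
            Short.class,   "INT16",
            Integer.class, "INT32",
            Long.class,    "INT64",
            Float.class,   "FLOAT32",
            Double.class,  "FLOAT64",
            String.class,  "STRING");

    public static String schemaTypeFor(Object value) {
        // Anything unrecognized would be shipped as raw bytes.
        return SCHEMA_TYPES.getOrDefault(value.getClass(), "BYTES");
    }
}
```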