Hi Christofer,

Looking at the other e-mail you sent about the work that has happened, it looks like a lot of great progress has been made.
I just got caught up with some other things, so I never got started after our discussions here :( Wanted to understand: once you have some bandwidth, what are the next steps with the k-connect integration? Can I sync up with someone and start looking at some of the pieces?

Thanks!
Sagar.

On Thu, Aug 30, 2018 at 12:47 AM Christofer Dutz <christofer.d...@c-ware.de> wrote:

> Hi Sagar,
>
> thanks for the info ... this way I learn more and more :-)
>
> Looking forward to answering the questions as they come.
>
> Chris
>
> Am 29.08.18, 19:28 schrieb "Sagar" <sagarmeansoc...@gmail.com>:
>
> Hi Chris,
>
> Thanks. Typically the Kafka cluster will be a separate set of nodes, and so would the k-connect workers, which will connect to the PLC devices or databases or whatever the source is and push to Kafka.
>
> I will start off with this information and extend your feature branch. I'll keep asking questions along the way.
>
> Sagar.
>
> On Wed, Aug 29, 2018 at 7:51 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:
>
> > Hi Sagar,
> >
> > Great that we seem to be on the same page now ;-)
> >
> > Regarding the "Kafka Connecting" ... what I meant is that the Kafka-Connect-PLC4X instance connects ... I was assuming the driver to be running on a Kafka node, but that's just due to my limited knowledge of everything ;-)
> >
> > Well, the code for actively reading stuff from a PLC should already be in my example implementation. It should work out of the box this way ... As I have seen several mock drivers implemented in PLC4X, I am currently thinking of implementing one that you should be able to just import and use ... however, I'm currently working hard on refactoring the API completely, so I would postpone that until after these changes are in there. But rest assured ... I would handle the refactoring so you could just assume that it works.
> >
> > Alternatively, I could have an account for our IoT VPN created for you.
> Then > > you could log-in to our VPN and talk to some real PLCs ... > > > > I think I wanted to create an account for Julian, but my guy > responsible > > for creating them was on holidays ... will re-check this. > > > > Chris > > > > > > > > Am 29.08.18, 16:11 schrieb "Sagar" <sagarmeansoc...@gmail.com>: > > > > Hi Chris, > > > > That's perfectly fine :) > > > > So, the way I understand this now is, we will have a bunch of > worker > > nodes(in kafka connect terminology, a worker is a JVM process > which > > runs a > > set of connectors/tasks to poll a source and push data to Kafka). > > > > So, vis-a-vis a JDBC connection, we will have a connection URL > which > > will > > let us connect to these PLC devices poll(poll in the sense you > meant it > > above), and then push data to Kafka. If this looks fine, then > can you > > give > > me some documentation to refer to and also how can I start > testing > > these? > > > > And just one thing I wanted to clarify when you say Kafka nodes > > connecting > > to devices. That's something which doesn't happen. Kafka doesn't > > connect to > > any device. I think you just mean it in a more abstract way > right? > > > > @Julian, > > > > Thanks, I was going through the link you sent. So, you're saying > via > > scraping, we can push events to Kafka? Is that already happening > and we > > should look to move this functionality out? > > > > Thanks! > > Sagar. > > > > > > On Wed, Aug 29, 2018 at 11:05 AM Julian Feinauer < > > j.feina...@pragmaticminds.de> wrote: > > > > > Hey Sagar, > > > > > > hey Chris, > > > > > > > > > > > > I want to join your discussion for part b3 as this is > something we > > usually > > > require. > > > > > > We use a module we call the plc-scraper for that task (the term > > scraping > > > in that content is borrowed from Prometheus where it is used > > extensively in > > > this context [1]). 
> > > Generally speaking, the scraper takes a config containing addresses and scrape rates, runs as a daemon, and pushes the scrape results downstream (usually Kafka, but we also use other "queues").
> > >
> > > As we are currently rewriting this scraper, I have already considered donating it to plc4x in the form of an example or perhaps even as a standalone module.
> > >
> > > So I agree with Chris that this should not be part of the PlcDriver level but rather live on another layer "on top", and I would be more interested in the specification of a "line protocol" which describes how messages are serialized for Kafka (or other sinks).
> > >
> > > Can we come up with a common "schema" which fits many use cases?
> > >
> > > Our messages contain the following information:
> > >
> > > - timestamp
> > > - source
> > > - values
> > > - additional tags
> > >
> > > Best
> > > Julian
> > >
> > > [1] https://prometheus.io/docs/prometheus/latest/getting_started/
> > >
> > > Am 28.08.18, 22:53 schrieb "Christofer Dutz" <christofer.d...@c-ware.de>:
> > >
> > > Hi Sagar,
> > >
> > > sorry for not responding ... your mail must have skipped my eye ... sorry for that.
> > >
> > > a) PLC4X works exactly the same way ... it consists of plc4x-core, which only contains the DriverManager, and plc4x-api, which contains the API classes. So with these two jars you can build a full PLC4X application. In order to connect to a PLC you need to add the jar containing the required driver to the classpath.
> > >
> > > b) Well, if something other than the connection url is used as partition key, it is possible that multiple kafka connect nodes would connect to the same PLC.
In that case it could be a > problem to > > > control the order. I guess using the timestamp when receiving a > > response > > > (Probably generated by the KC plc4x driver) could be a valid > > approach. > > > > > > > > > > > > b2) Regarding the infinite loop ... I think we won't need > such a > > > mechanism. If we think of a set of fields from a PLC, we can > think > > of a PLC > > > as a one-row database table. Producing diffs should be a lot > simpler > > that > > > way. > > > > > > > > > > > > b3) Regarding push events ... PLC4X has a subscription > mode next > > to > > > the polling. So it would be possible to also define PLC4X > > datasources that > > > actively push events to kafka ... I have to admit that this > would be > > the > > > mode I would prefer most. But as not all protocols and PLCs > support > > this > > > mode, I think it would be safest to use polling and to add push > > support > > > after that. > > > > > > > > > > > > Regarding different languages: Currently we are > concentrating > > mainly > > > on the Java implementation as it's the biggest challenge to > > understand and > > > implement the protocols. Porting them to other languages > (especially > > C and > > > C++) shouldn't be as hard as implementing the first version. > But > > that's > > > currently a base uncovered as we don't have the resources to > > implement all > > > of them at once. > > > > > > > > > > > > And there are no stupid questions :-) > > > > > > > > > > > > Hope I could answer all of yours. If not, just ask and I'll > > probably > > > not miss that one ;-) > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > Am 23.08.18, 19:52 schrieb "Sagar" < > sagarmeansoc...@gmail.com>: > > > > > > > > > > > > Hi Chirstofer, > > > > > > > > > > > > Thanks for the detailed responses. I would like to ask > a > > couple of > > > more > > > > > > questions(which may be borderline naive or stupid :D ). 
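Julian's four message fields from earlier in the thread (timestamp, source, values, additional tags) could serialize to Kafka as a JSON line protocol along these lines; every field name and value below is purely illustrative, not an agreed format:

```json
{
  "timestamp": "2018-08-29T16:11:00.000Z",
  "source": "s7://192.168.0.1/1/1",
  "values": {
    "motor-current": 12.4,
    "temperature": 48
  },
  "tags": {
    "plant": "line-3",
    "scrape-rate-ms": "1000"
  }
}
```

A flat, self-describing envelope like this keeps the payload independent of any particular driver, though for high volumes a binary schema such as Avro or Protobuf (raised later in the thread) may be more appropriate.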
> > > > > > > > > > > > First thing that I would like to know- ignore my lack > of > > knowledge > > > on PLCs- > > > > > > but from what I understand are devices which are small > > devices > > > used to > > > > > > execute program instructions. These would have very > small > > memory > > > footprints > > > > > > as well I believe? Also, when you say the Siemens one > can > > handle 20 > > > > > > connections, would it be from different devices > connecting > > to it? > > > The > > > > > > reason I ask these questions are these -> > > > > > > > > > > > > a) The way the kafka-connect framework is executed is > by > > > installing the > > > > > > whole framework with all the relevant jars needed on > the > > > classpath. So, if > > > > > > you talk about the JDBC connector for K-Connect, it > would > > need the > > > mysql > > > > > > driver jar(for example) and other jars needed to > support the > > > framework. If > > > > > > we say choose to use avro, then we would need more > jars to > > support > > > that. > > > > > > Would we be able to install all that? > > > > > > > > > > > > b) Also, if multiple devices do connect to it, then > won't we > > have > > > events > > > > > > arriving out of order from them? Does the ordering > matter > > amongst > > > events > > > > > > that are being pushed? > > > > > > > > > > > > Regarding the infinite loop question, the reason JDBC > > connector > > > uses that > > > > > > is that it creates tasks for a given table and fires > queries > > to > > > find > > > > > > deltas. So, if the polling frequency is 2 seconds, and > it > > last ran > > > on > > > > > > 12.00.00 then it would run at 12.00.02 to figure out > what > > changed > > > in that > > > > > > time frame. So, the way PlcReaders read() runs, would > it keep > > > returning > > > > > > newer data? 
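Chris's earlier point (b2) treats a PLC as a one-row database table, which suggests an answer to the delta question above: each read() simply returns the current field values, and the task itself can diff against the last snapshot to emit only what changed. A minimal JDK-only sketch of that idea; `FieldDiffer` and the field names are illustrative, not part of PLC4X or Kafka Connect:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "one-row table" diff idea: keep the last polled snapshot of
// PLC field values and report only fields whose value changed since the
// previous poll. Fields that disappear entirely are ignored for simplicity.
public class FieldDiffer {
    private final Map<String, Object> lastSnapshot = new HashMap<>();

    /** Returns only the fields that differ from the previous poll. */
    public Map<String, Object> diff(Map<String, Object> current) {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!e.getValue().equals(lastSnapshot.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        lastSnapshot.clear();
        lastSnapshot.putAll(current);
        return changed;
    }

    public static void main(String[] args) {
        FieldDiffer differ = new FieldDiffer();
        // First poll: everything is "new", so both fields are reported.
        System.out.println(differ.diff(Map.of("motor-current", 12, "temperature", 48)).size()); // 2
        // Second poll: only temperature changed, so only it is reported.
        System.out.println(differ.diff(Map.of("motor-current", 12, "temperature", 51))); // {temperature=51}
    }
}
```

Unlike the JDBC connector's timestamp-window queries, this needs no notion of "rows since 12.00.00"; the poll interval only controls how often the snapshot is refreshed.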
> > > > > > > > > > > > We can skip over the rest of the parts, but looking at > parts > > a and > > > b above, > > > > > > would it make sense to have something like a > kafka-connect > > > framework for > > > > > > pushing data to Kafka? Also, from the github link, the > > drivers are > > > to be > > > > > > supported in 3 languages as well. How would that play > out? > > > > > > > > > > > > Again- apologies if the questions seem stupid. > > > > > > > > > > > > Thanks! > > > > > > Sagar. > > > > > > > > > > > > On Wed, Aug 22, 2018 at 10:39 PM Christofer Dutz < > > > christofer.d...@c-ware.de> > > > > > > wrote: > > > > > > > > > > > > > Hi Sagar, > > > > > > > > > > > > > > great that you managed to have a look ... I'll try to > > answer your > > > > > > > questions. > > > > > > > (I like to answer them postfix as whenever emails > are sort > > of > > > answered > > > > > > > in-line, they are extremely hard to read and follow > on > > mobile > > > email clients > > > > > > > __ ) > > > > > > > > > > > > > > First of all I created the original plugin via the > > archetype for > > > > > > > kafka-connect plugins. The next thing I did, was to > have a > > look > > > at the code > > > > > > > of the JDBC Kafka Connect plugin (as you might have > > guessed) as > > > I thought > > > > > > > that it would have similar structure as we do. > > Unfortunately I > > > think the > > > > > > > JDBC plugin is far more complex than the plc4x > connector > > will > > > have to be. I > > > > > > > sort of picked some of the things I liked with the > > archetype and > > > some I > > > > > > > liked with the jdbc ... if there was a third, even > cooler > > option > > > ... I will > > > > > > > definitely have missed that. So if you think there > is a > > thing > > > worth > > > > > > > changing ... you can change anything you like. 
> > > > > > > > > > > > > > 1) > > > > > > > The code of the jdbc plugin showed such a > while(true) loop, > > > however I > > > > > > > think this was because the jdbc query could return a > lot > > of rows > > > and hereby > > > > > > > Kafka events. In our case we have one request and > get one > > > response. The > > > > > > > code in my example directly calls "get()" on the > request > > and is > > > hereby > > > > > > > blocking. I don't know if this is good, but from > reading > > the > > > jdbc example, > > > > > > > this should be blocking too ... > > > > > > > So the PlcReaders read() method returns a completable > > future ... > > > this > > > > > > > could be completed asynchronously and the callback > could > > fire > > > the kafka > > > > > > > events, but I didn't know if this was ok with kafka. > If it > > is > > > possible, > > > > > > > please have a look at this example code: > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/master/plc4j/protocols/s7/src/test/java/org/apache/plc4x/java/s7/S7PlcReaderSample.java > > > > > > > It demonstrates with comments the different usage > types. > > > > > > > > > > > > > > While at it ... is there also an option for a Kafka > > connector > > > that is able > > > > > > > to push data? So if an incoming event arrives, this > is > > > automatically pushed > > > > > > > without a fixed polling interval? > > > > > > > > > > > > > > 2) > > > > > > > I have absolutely no idea as I am not quite familiar > with > > the > > > concepts > > > > > > > inside kafka. What I do know is that probably the > > partition-key > > > should be > > > > > > > based upon the connection url. The problem is, that > with > > kafka I > > > could have > > > > > > > 1000 nodes connecting to one PLC. While Kafka > wouldn't have > > > problems with > > > > > > > that, the PLCs have very limited resources. 
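One common way to keep many Connect tasks from exhausting a PLC's handful of connections is to share a single connection per connection URL within each worker process. A JDK-only sketch of such a cache; the `Connection` interface here is a stand-in, not the real PLC4X `PlcConnection`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: all tasks in one worker share a single connection per PLC, so a
// device that only supports ~20 connections sees one client per worker
// rather than one per task. "Connection" is an illustrative stand-in.
public class ConnectionCache {
    public interface Connection {
        String url();
    }

    private final Map<String, Connection> byUrl = new ConcurrentHashMap<>();

    /** Returns the shared connection for this URL, opening it on first use. */
    public Connection getOrOpen(String connectionUrl) {
        return byUrl.computeIfAbsent(connectionUrl, url -> () -> url);
    }

    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        Connection a = cache.getOrOpen("s7://192.168.0.1/1/1");
        Connection b = cache.getOrOpen("s7://192.168.0.1/1/1");
        // Same URL -> same shared connection instance.
        System.out.println(a == b); // true
    }
}
```

A real version would also need reference counting or idle timeouts to close connections when no task uses them any more.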
So as > far as I > > > decoded the > > > > > > > responses of my Siemens S7 1200 it can handle up to > 20 > > > connections (Usually > > > > > > > a control-system already consuming 2-3 of them) ... I > > think it > > > would be > > > > > > > ideal, if on one Kafka node (or partition) there > would be > > one > > > PlcConnection > > > > > > > ... this connection should then be shared among all > > requests to > > > a PLC with > > > > > > > a shared connection url (I hope I'm not writing > nonsense). > > So if > > > a > > > > > > > workerTask is responsible for managing all request > to one > > > partition, then > > > > > > > I'd say it should be 1 ... otherwise the number > could be > > bigger. > > > > > > > > > > > > > > If it makes things easier, I'm absolutely fine with > using > > those > > > > > > > ConnectorUtils > > > > > > > > > > > > > > Regarding the connector offsets ... are you > referring to > > that > > > counter > > > > > > > Kafka uses to let the clients know the sequence of > events > > and > > > which they > > > > > > > use to sort of say: "Hi, I have number 237367 of > topic > > 'ABC', > > > plese > > > > > > > continue" ... is that what you are referring to? If > it is, > > well > > > ... I have > > > > > > > to admit ... I don't know ... ok ... if it isn't then > > probably > > > also ;-) > > > > > > > How do other plugins do this? > > > > > > > > > > > > > > 3) > > > > > > > Well I guess both options would be cool ... JSON is > > definitely > > > simpler, > > > > > > > but for high volume transports the binary > counterparts > > > definitely are worth > > > > > > > consideration. Currently PLC4X tries to deliver what > you > > > request, but > > > > > > > that's actually something we're currently discussing > on > > > refactoring. 
But > > > > > > > for the moment - as shown in the example code I > referenced > > a few > > > lines > > > > > > > above - you do a TypedRequest and for example ask > for an > > > Integer, then you > > > > > > > will receive an array (probably of size 1) of > Integers. > > > > > > > > > > > > > > 4) > > > > > > > Well I agree ... well at least I can't even say that > I > > make a > > > secret about > > > > > > > where I stole things from ;-) > > > > > > > > > > > > > > If I can be of any assistance ... just ask. > > > > > > > > > > > > > > Thanks for taking the time. > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am 22.08.18, 17:55 schrieb "Sagar" < > > sagarmeansoc...@gmail.com>: > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > I was going through the K-Connect stubs created > by > > Chris in > > > the kafka > > > > > > > feature branch. > > > > > > > > > > > > > > Some of the findings I found are here(let me > know if > > they > > > are valid or > > > > > > > not): > > > > > > > > > > > > > > 1) > > > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L98 > > > > > > > > > > > > > > Should this block of code be within an infinite > loop > > like > > > while(true)? > > > > > > > I am > > > > > > > not exactly sure of the semantics of the > PlcReader > > hence > > > asking this > > > > > > > question. > > > > > > > > > > > > > > 2) Another question is, what are the maxTasks > that we > > > envision here? 
> > > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/Plc4xSourceConnector.java#L46 > > > > > > > > > > > > > > Also, as part of documentation, there's a utility > > called > > > ConnectorUtils > > > > > > > which typically should be used to create the > > configs(not a > > > hard and > > > > > > > fast > > > > > > > rule though): > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/util/ConnectorUtils.html > > > > > > > > > > > > > > If we go that route, then we also need to > specify how > > the > > > offsets > > > > > > > would be > > > > > > > stored in the offsets topic(by using the task > name). > > So, if > > > it can be > > > > > > > figured out as to how would the connectors be > setup, > > then > > > that'll be > > > > > > > helpful. > > > > > > > > > > > > > > 3) While building the SourceRecord -> > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L109 > > > > > > > > > > > > > > , we would also need some DataConverter layer to > have > > them > > > mapped to > > > > > > > the > > > > > > > connect types. Also, which message types would be > > supported? > > > Json or > > > > > > > binary > > > > > > > protocols like Avro/protobuf etc or some other > > protocols? > > > Those things > > > > > > > might also need to be factored in. 
> > > > > > > > > > > > > > 4) Lastly, need to remove the JdbcSourceTask > from the > > catch > > > block here > > > > > > > :) -> > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L67 > > > > > > > > > > > > > > Thanks! > > > > > > > Sagar. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
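For the maxTasks question in the thread: Kafka Connect's `ConnectorUtils.groupPartitions(elements, numGroups)` splits a list (here, PLC connection URLs) into contiguous, as-evenly-sized-as-possible groups, one per task. A JDK-only reimplementation of that behavior (to my understanding of the utility), showing what a `taskConfigs(maxTasks)` implementation could hand each `Plc4xSourceTask`; the URLs are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch mirroring ConnectorUtils.groupPartitions: split the elements into
// numGroups contiguous groups, the first groups taking one extra element
// when the sizes don't divide evenly.
public class TaskGrouping {
    public static <T> List<List<T>> groupPartitions(List<T> elements, int numGroups) {
        List<List<T>> result = new ArrayList<>();
        int baseSize = elements.size() / numGroups;
        int remainder = elements.size() % numGroups;
        int index = 0;
        for (int group = 0; group < numGroups; group++) {
            int size = baseSize + (group < remainder ? 1 : 0);
            result.add(new ArrayList<>(elements.subList(index, index + size)));
            index += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("s7://plc-1", "s7://plc-2", "s7://plc-3");
        // Each group would become the config for one source task.
        System.out.println(groupPartitions(urls, 2)); // [[s7://plc-1, s7://plc-2], [s7://plc-3]]
    }
}
```

Grouping by connection URL also lines up with the earlier point that the URL is the natural partition key, so one task owns all reads against one PLC.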