Hi Chris, great that you will work on the connector.
I am not deeply technical, but if you need guidance from Kafka Connect experts, I can connect you to a Confluent colleague who can help with best practices for building the connector. For example, we have implemented a wildcard option in our MQTT connector to map MQTT topics to Kafka topics in a more flexible way (e.g. 1000s of cars on different MQTT topics can be routed into one Kafka topic). This might also be interesting for this connector, as you expect to connect to various PLCs. This guide might also help:
https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
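
To make the routing idea more concrete: even independent of connector-specific wildcard support, Kafka Connect ships with the RegexRouter single message transform, which can collapse many source topics into one Kafka topic. A minimal sketch of a connector config using it (the connector class and its properties are placeholders; only the transform part is meant literally):

{
  "name": "plc-source",
  "config": {
    "connector.class": "<your source connector class>",
    "transforms": "toOneTopic",
    "transforms.toOneTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.toOneTopic.regex": "car-(.*)",
    "transforms.toOneTopic.replacement": "all-cars"
  }
}

Records originally routed to topics like car-0001 ... car-9999 would then all end up in the single topic "all-cars".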
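
A few thoughts on specific points in your mail below. On the current "query" parameter: if I read it correctly, the flat format is roughly like this (the separator between connection string and address is my guess):

query=s7://1.2.3.4/1/1#%I0.2:BOOL,s7://10.20.30.40/1/1#%Q1:BYTE

and I agree that this stops being manageable as soon as more than a handful of devices and fields are involved.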
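
On the message layout you propose below (the requested data plus source-url, source-name, job-name and a timestamp): as a sketch, a single Kafka message value could then look something like this (all names and values are illustrative only):

{
  "sourceName": "machineA",
  "sourceUrl": "s7://1.2.3.4/1/1",
  "jobName": "s7-dashboard",
  "timestamp": 1564584000000,
  "values": {
    "inputPressure": 42,
    "outputPressure": 7,
    "temperature": 19
  }
}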
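
And on partitioning via the optional source "name": inside a Kafka Connect source task this usually comes down to using that name as the record key, because the default partitioner hashes the key. A minimal Java sketch, assuming hypothetical class and method names rather than the actual PLC4X code:

import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

/** Illustrative only -- not the actual PLC4X connector code. */
class RecordFactory {
    SourceRecord toRecord(String topic, String sourceName, String jobName,
                          Map<String, Object> payload) {
        // Connect tracks offsets per "source partition"; one per source/job pair fits here.
        Map<String, ?> sourcePartition = Map.of("source", sourceName, "job", jobName);
        // Scraping is stateless polling, so there is no offset to resume from.
        Map<String, ?> sourceOffset = Map.of();
        return new SourceRecord(
                sourcePartition, sourceOffset, topic,
                Schema.STRING_SCHEMA, sourceName, // key = source name -> stable partition per source
                null, payload);                   // schemaless value, for brevity
    }
}

That way all records from one source land in the same partition, which preserves per-device ordering.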

On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:

> Hi all,
>
> I am currently planning on cleaning up the Kafka Connect adapter a little,
> as this was implemented as part of a proof of concept and is still in a
> state I wouldn't use in production ;-)
> But a lot has happened since then, and I'm planning on making it a really
> usable tool in the next few days.
>
> A lot has changed since we created the integration module in Q3 2018, and
> I would like to refactor it to use the Scraper for the heavy lifting.
>
> Currently a user has to provide a parameter "query" which contains a
> comma-separated list of connection strings with an appended address. This
> is purely unmanageable.
>
> I would like to make it configurable via a JSON or YAML file.
>
> I think it would make sense to define groups of fields that are collected
> on one device at an equal rate. So it's pretty similar to the scraper
> example; however, I would like to not specify the source in the job, but
> the other way around.
> When specifying the "sources" I would also define which jobs should run
> on a given connection.
> As the connector was initially showcased in a scenario where data had to
> be collected from a big number of PLCs with equal specs,
> I think this is probably the most important use case, and in this case it
> is also probably more common to add new devices to collect standard data
> from than the other way around.
>
> We should also provide the means to set, per connection, to which Kafka
> topic the data should be sent.
> We could provide the means to set a default and make the per-connection
> setting optional, however.
> When posting to a topic we also need to provide means for partitioning,
> so I would give sources an optional "name".
> Each message would not only contain the data requested, but also the
> source-url, the source-name and the job-name, with a timestamp.
>
> So I guess it would look something like this:
>
> # ----------------------------------------------------------------------------
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements. See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership. The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License. You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied. See the License for the
> # specific language governing permissions and limitations
> # under the License.
> # ----------------------------------------------------------------------------
> ---
> # Defaults used throughout all collections
> defaults:
>   # If not specified, all data goes to this topic (optional)
>   default-topic: some/default
>
> # Defines connections to PLCs
> sources:
>   # Connection to an S7 device
>   - name: machineA
>     # PLC4X connection URL
>     url: s7://1.2.3.4/1/1
>     jobs:
>       # Just references the job "s7-dashboard". All data will be
>       # published to the default topic.
>       - name: s7-dashboard
>       # References the job "s7-heartbeat", however it configures the
>       # output to go to the topic "heartbeat".
>       - name: s7-heartbeat
>         topic: heartbeat
>
>   # Connection to a second S7 device
>   - name: machineB
>     url: s7://10.20.30.40/1/1
>     # Sets the default topic for this connection. All job data will go
>     # to "heartbeat".
>     topic: heartbeat
>     jobs:
>       - s7-heartbeat
>
>   # Connection to a Beckhoff device
>   - name: machineC
>     url: ads://1.2.3.4.5.6
>     topic: heartbeat
>     jobs:
>       - ads-heartbeat
>
> # Defines what should be collected how often
> jobs:
>   # Defines a job to collect a set of fields on S7 devices every 500ms
>   - name: s7-dashboard
>     scrapeRate: 500
>     fields:
>       # The key will be used in the Kafka message to identify this field;
>       # the value here contains the PLC4X address
>       inputPressure: "%DB.DB1.4:INT"
>       outputPressure: "%Q1:BYTE"
>       temperature: "%I3:INT"
>
>   # Defines a second job to collect a set of fields on S7 devices every
>   # 1000ms
>   - name: s7-heartbeat
>     scrapeRate: 1000
>     fields:
>       active: "%I0.2:BOOL"
>
>   # Defines a third job that collects data on Beckhoff devices
>   - name: ads-heartbeat
>     scrapeRate: 1000
>     fields:
>       active: Main.running
>
> I think it should be self-explanatory with my comments inline.
>
> What do you think?
>
> Chris