Hi all,

I am currently planning on cleaning up the Kafka Connect adapter a little, as it was implemented as part of a proof of concept and is still in a state I wouldn't use in production ;-) But a lot has happened since then, and I'm planning on making it a really usable tool in the next few days.
A lot has changed since we created the integration module in Q3 2018, and I would like to refactor it to use the Scraper for the heavy lifting. Currently a user has to provide a parameter "query" which contains a comma-separated list of connection strings with appended addresses. This is simply unmanageable. I would like to make it configurable via a JSON or YAML file.

I think it would make sense to define groups of fields that are collected on one device at the same rate. So it's pretty similar to the scraper example; however, I would like to not specify the source in the job, but the other way around: when specifying the "sources", I would also declare which jobs should run on a given connection. As the connector was initially showcased in a scenario where data had to be collected from a big number of PLCs with equal specs, I think this is probably the most important use-case, and in this scenario it is also probably more common to add new devices on which standard data is collected than the other way around.

We should also provide the means to set, per connection, to which Kafka topic the data should be sent. We could provide a default and make the per-connection setting optional. When posting to a topic we also need a means of partitioning, so I would give each source an optional "name". Each message would then carry not only the requested data, but also the source-url, source-name and job-name, together with a timestamp.

So I guess it would look something like this:

# ----------------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# ----------------------------------------------------------------------------
---
# Defaults used throughout all collections
defaults:
  # If not specified, all data goes to this topic (optional)
  default-topic: some/default

# Defines connections to PLCs
sources:
  # Connection to an S7 device
  - name: machineA
    # PLC4X connection URL
    url: s7://1.2.3.4/1/1
    jobs:
      # Just references the job "s7-dashboard". All data will be published to the default topic
      - name: s7-dashboard
      # References the job "s7-heartbeat", but configures the output to go to the topic "heartbeat"
      - name: s7-heartbeat
        topic: heartbeat
  # Connection to a second S7 device
  - name: machineB
    url: s7://10.20.30.40/1/1
    # Sets the default topic for this connection. All job data will go to "heartbeat"
    topic: heartbeat
    jobs:
      - s7-heartbeat
  # Connection to a Beckhoff device
  - name: machineC
    url: ads://1.2.3.4.5.6
    topic: heartbeat
    jobs:
      - ads-heartbeat

# Defines what should be collected and how often
jobs:
  # Defines a job to collect a set of fields on S7 devices every 500 ms
  - name: s7-dashboard
    scrapeRate: 500
    fields:
      # The key will be used in the Kafka message to identify this field,
      # the value contains the PLC4X address
      inputPressure: '%DB.DB1.4:INT'
      outputPressure: '%Q1:BYTE'
      temperature: '%I3:INT'
  # Defines a second job to collect a set of fields on S7 devices every 1000 ms
  - name: s7-heartbeat
    scrapeRate: 1000
    fields:
      active: '%I0.2:BOOL'
  # Defines a third job that collects data on Beckhoff devices
  - name: ads-heartbeat
    scrapeRate: 1000
    fields:
      active: Main.running

(Note: I quoted the S7 addresses, as plain YAML scalars may not start with '%'.)

I think it should be self-explanatory with my comments inline.

What do you think?

Chris
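For illustration, here is a rough sketch in plain Java of the two behaviors described above: the topic precedence (job-level topic over source-level topic over the global default-topic) and the message envelope carrying source-url, source-name, job-name and a timestamp alongside the scraped fields. All names are assumptions, nothing here is final connector code:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only; not the actual connector implementation.
public class ScrapeMessageSketch {

    // Topic precedence: job-level topic > source-level topic > global default-topic.
    static String resolveTopic(Optional<String> jobTopic,
                               Optional<String> sourceTopic,
                               String defaultTopic) {
        return jobTopic.orElse(sourceTopic.orElse(defaultTopic));
    }

    // Every message carries the scraped fields plus the source-url, source-name,
    // job-name and a timestamp, as proposed above.
    static Map<String, Object> buildEnvelope(String sourceUrl, String sourceName,
                                             String jobName, Map<String, Object> fields) {
        Map<String, Object> msg = new LinkedHashMap<>();
        msg.put("source-url", sourceUrl);
        msg.put("source-name", sourceName);
        msg.put("job-name", jobName);
        msg.put("timestamp", Instant.now().toString());
        msg.put("fields", fields);
        return msg;
    }

    public static void main(String[] args) {
        // machineA running s7-heartbeat overrides the topic to "heartbeat" ...
        System.out.println(resolveTopic(Optional.of("heartbeat"),
                Optional.empty(), "some/default"));
        // ... while s7-dashboard falls through to the global default.
        System.out.println(resolveTopic(Optional.empty(),
                Optional.empty(), "some/default"));

        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("active", true);
        System.out.println(buildEnvelope("s7://1.2.3.4/1/1", "machineA",
                "s7-heartbeat", fields));
    }
}
```

The optional source "name" doubles as a stable partitioning key, so all messages from one device land in the same partition and keep their order.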