Hi all,

I am currently planning on cleaning up the Kafka Connect adapter a little, as it 
was implemented as part of a proof of concept and is still in a state I 
wouldn’t use in production ;-)
But a lot has happened since then and I’m planning on making it a really usable 
tool in the next few days.

A lot has changed since we created the integration module in Q3 2018, and I would 
like to refactor it to use the Scraper for the heavy lifting.

Currently a user has to provide a parameter “query”, which contains a 
comma-separated list of connection strings with the address appended. This is 
simply unmanageable.
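Just to illustrate (the exact syntax of the current PoC may differ, and the "#" 
separator here is only for illustration), today everything boils down to a single 
property along the lines of:

  query=s7://1.2.3.4/1/1#%I0.2:BOOL,s7://10.20.30.40/1/1#%DB.DB1.4:INT,...

which obviously doesn't scale once you have more than a handful of devices and fields.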

I would like to make it configurable via a JSON or YAML file.

I think it would make sense to define groups of fields that are collected on 
one device at the same rate. This is pretty similar to the Scraper example, 
except that I would not specify the source in the job, but the other way around: 
when specifying the “sources”, I would also declare which jobs should run on a 
given connection.
As the connector was initially showcased in a scenario where data had to be 
collected from a large number of PLCs with identical specs, I think this is 
probably the most important use case. In that scenario it is also probably more 
common to add new devices on which standard data is collected than the other 
way around.

We should also provide the means to set, per connection, to which Kafka topic 
the data should be sent. We could make this optional and allow setting a default.
When posting to a topic we also need a means for partitioning, so I would give 
sources an optional “name”.
Each message would then contain not only the requested data, but also the 
source-url, source-name and job-name, along with a timestamp.
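Translated to the Kafka Connect API, a collected result could then roughly be 
turned into a record like in the sketch below (class, schema and field names are 
purely illustrative, not the final implementation):

import java.util.Collections;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

// Illustrative sketch only: builds one record for the "s7-heartbeat" job on a source.
public class RecordSketch {

    // Envelope schema: the requested data plus source-url, source-name, job-name and timestamp.
    private static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
        .field("sourceUrl", Schema.STRING_SCHEMA)
        .field("sourceName", Schema.STRING_SCHEMA)
        .field("jobName", Schema.STRING_SCHEMA)
        .field("timestamp", Schema.INT64_SCHEMA)
        .field("active", Schema.BOOLEAN_SCHEMA)   // the field collected by the job
        .build();

    public static SourceRecord buildRecord(String topic, String sourceUrl,
                                           String sourceName, String jobName, boolean active) {
        long now = System.currentTimeMillis();
        Struct value = new Struct(VALUE_SCHEMA)
            .put("sourceUrl", sourceUrl)
            .put("sourceName", sourceName)
            .put("jobName", jobName)
            .put("timestamp", now)
            .put("active", active);
        Map<String, ?> sourcePartition = Collections.singletonMap("sourceName", sourceName);
        Map<String, ?> sourceOffset = Collections.singletonMap("timestamp", now);
        // Using the source name as the record key gives us stable partitioning per source.
        return new SourceRecord(sourcePartition, sourceOffset, topic, null,
            Schema.STRING_SCHEMA, sourceName, VALUE_SCHEMA, value);
    }
}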

So I guess the configuration file would look something like this:

# ----------------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
# ----------------------------------------------------------------------------
---
# Defaults used throughout all collections
defaults:
  # If a source or job doesn't specify a topic, its data goes to this topic (optional)
  default-topic: some/default

# Defines connections to PLCs
sources:
  # Connection to an S7 device
  - name: machineA
    # PLC4X connection URL
    url: s7://1.2.3.4/1/1
    jobs:
      # Just references the job "s7-dashboard". All data will be published
      # to the default topic.
      - name: s7-dashboard
      # References the job "s7-heartbeat", but configures the output to go
      # to the topic "heartbeat".
      - name: s7-heartbeat
        topic: heartbeat

  # Connection to a second S7 device
  - name: machineB
    url: s7://10.20.30.40/1/1
    # Sets the default topic for this connection. All job data will go
    # to "heartbeat".
    topic: heartbeat
    jobs:
      - s7-heartbeat

  # Connection to a Beckhoff device
  - name: machineC
    url: ads://1.2.3.4.5.6
    topic: heartbeat
    jobs:
      - ads-heartbeat

# Defines what should be collected and how often
jobs:
  # Defines a job to collect a set of fields on s7 devices every 500ms
  - name: s7-dashboard
    scrapeRate: 500
    fields:
      # The key will be used in the Kafka message to identify this field,
      # the value here contains the PLC4X address.
      inputPressure: "%DB.DB1.4:INT"
      outputPressure: "%Q1:BYTE"
      temperature: "%I3:INT"

  # Defines a second job to collect a set of fields on s7 devices every 1000ms
  - name: s7-heartbeat
    scrapeRate: 1000
    fields:
      active: "%I0.2:BOOL"

  # Defines a third job that collects data on Beckhoff devices
  - name: ads-heartbeat
    scrapeRate: 1000
    fields:
      active: Main.running

I think it should be self-explanatory with my comments inline.

What do you think?

Chris
