Hi Chris, great that you will work on the connector.
I am not deeply technical, but if you need guidance from Kafka Connect experts, I can connect you to a Confluent colleague who can help with best practices for building the connector. For example, we have implemented a wildcard option in our MQTT connector to map MQTT topics to Kafka topics in a more flexible way (e.g. 1000s of cars on different MQTT topics can be routed into one Kafka topic). This might also be interesting for this connector, as you expect to connect to various PLCs. This guide might also help:
https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
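
To make the routing idea more concrete: even independent of connector-specific wildcard support, Kafka Connect ships with the RegexRouter single message transform, which can collapse many source topics into one Kafka topic. A minimal sketch of a connector config using it (the connector class and its properties are placeholders; only the transform part is meant literally):

{
  "name": "plc-source",
  "config": {
    "connector.class": "<your source connector class>",
    "transforms": "toOneTopic",
    "transforms.toOneTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.toOneTopic.regex": "car-(.*)",
    "transforms.toOneTopic.replacement": "all-cars"
  }
}

Records originally routed to topics like car-0001 ... car-9999 would then all end up in the single topic "all-cars".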
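
A few thoughts on specific points in your mail below. On the current "query" parameter: if I read it correctly, the flat format is roughly like this (the separator between connection string and address is my guess):

query=s7://1.2.3.4/1/1#%I0.2:BOOL,s7://10.20.30.40/1/1#%Q1:BYTE

and I agree that this stops being manageable as soon as more than a handful of devices and fields are involved.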
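
On the message layout you propose below (the requested data plus source-url, source-name, job-name and a timestamp): as a sketch, a single Kafka message value could then look something like this (all names and values are illustrative only):

{
  "sourceName": "machineA",
  "sourceUrl": "s7://1.2.3.4/1/1",
  "jobName": "s7-dashboard",
  "timestamp": 1564584000000,
  "values": {
    "inputPressure": 42,
    "outputPressure": 7,
    "temperature": 19
  }
}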
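
And on partitioning via the optional source "name": inside a Kafka Connect source task this usually comes down to using that name as the record key, because the default partitioner hashes the key. A minimal Java sketch, assuming hypothetical class and method names rather than the actual PLC4X code:

import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

/** Illustrative only -- not the actual PLC4X connector code. */
class RecordFactory {
    SourceRecord toRecord(String topic, String sourceName, String jobName,
                          Map<String, Object> payload) {
        // Connect tracks offsets per "source partition"; one per source/job pair fits here.
        Map<String, ?> sourcePartition = Map.of("source", sourceName, "job", jobName);
        // Scraping is stateless polling, so there is no offset to resume from.
        Map<String, ?> sourceOffset = Map.of();
        return new SourceRecord(
                sourcePartition, sourceOffset, topic,
                Schema.STRING_SCHEMA, sourceName, // key = source name -> stable partition per source
                null, payload);                   // schemaless value, for brevity
    }
}

That way all records from one source land in the same partition, which preserves per-device ordering.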

On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:

> Hi all,
>
> I am currently planning on cleaning up the Kafka Connect adapter a little,
> as this was implemented as part of a proof of concept and is still in a
> state I wouldn't use in production ;-)
> But a lot has happened since then, and I'm planning on making it a really
> usable tool in the next few days.
>
> A lot has changed since we created the integration module in Q3 2018, and
> I would like to refactor it to use the Scraper for the heavy lifting.
>
> Currently a user has to provide a parameter "query" which contains a
> comma-separated list of connection strings with an appended address. This
> is purely unmanageable.
>
> I would like to make it configurable via a JSON or YAML file.
>
> I think it would make sense to define groups of fields that are collected
> on one device at an equal rate. So it's pretty similar to the scraper
> example; however, I would like to not specify the source in the job, but
> the other way around.
> When specifying the "sources" I would also define which jobs should run
> on a given connection.
> As the connector was initially showcased in a scenario where data had to
> be collected from a big number of PLCs with equal specs,
> I think this is probably the most important use case, and in this case it
> is also probably more common to add new devices to collect standard data
> from than the other way around.
>
> We should also provide the means to set, per connection, to which Kafka
> topic the data should be sent.
> We could provide the means to set a default and make the per-connection
> setting optional, however.
> When posting to a topic we also need to provide means for partitioning,
> so I would give sources an optional "name".
> Each message would not only contain the data requested, but also the
> source-url, the source-name and the job-name, with a timestamp.
>
> So I guess it would look something like this:
>
> # ----------------------------------------------------------------------------
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements. See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership. The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License. You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied. See the License for the
> # specific language governing permissions and limitations
> # under the License.
> # ----------------------------------------------------------------------------
> ---
> # Defaults used throughout all collections
> defaults:
>   # If not specified, all data goes to this topic (optional)
>   default-topic: some/default
>
> # Defines connections to PLCs
> sources:
>   # Connection to an S7 device
>   - name: machineA
>     # PLC4X connection URL
>     url: s7://1.2.3.4/1/1
>     jobs:
>       # Just references the job "s7-dashboard". All data will be
>       # published to the default topic.
>       - name: s7-dashboard
>       # References the job "s7-heartbeat", however it configures the
>       # output to go to the topic "heartbeat".
>       - name: s7-heartbeat
>         topic: heartbeat
>
>   # Connection to a second S7 device
>   - name: machineB
>     url: s7://10.20.30.40/1/1
>     # Sets the default topic for this connection. All job data will go
>     # to "heartbeat".
>     topic: heartbeat
>     jobs:
>       - s7-heartbeat
>
>   # Connection to a Beckhoff device
>   - name: machineC
>     url: ads://1.2.3.4.5.6
>     topic: heartbeat
>     jobs:
>       - ads-heartbeat
>
> # Defines what should be collected how often
> jobs:
>   # Defines a job to collect a set of fields on S7 devices every 500ms
>   - name: s7-dashboard
>     scrapeRate: 500
>     fields:
>       # The key will be used in the Kafka message to identify this field;
>       # the value here contains the PLC4X address
>       inputPressure: "%DB.DB1.4:INT"
>       outputPressure: "%Q1:BYTE"
>       temperature: "%I3:INT"
>
>   # Defines a second job to collect a set of fields on S7 devices every
>   # 1000ms
>   - name: s7-heartbeat
>     scrapeRate: 1000
>     fields:
>       active: "%I0.2:BOOL"
>
>   # Defines a third job that collects data on Beckhoff devices
>   - name: ads-heartbeat
>     scrapeRate: 1000
>     fields:
>       active: Main.running
>
> I think it should be self-explanatory with my comments inline.
>
> What do you think?
>
> Chris