Hi all,

so while I was working on the Kafka Sink, I started wondering whether such a
thing is a good idea to have at all.
A Kafka system can provide a vast stream of data that would have to be
routed to PLCs.
On the one hand, I haven't seen a single use case of someone writing to a PLC
with PLC4X, and I doubt such a scenario is a good fit for a Kafka setup:
a PLC can only consume a small amount of data before it chokes and stops working.

So in a first round I'll concentrate on implementing a robust Scraper-based 
PLC4X Kafka Connect Source, but drop the Sink entirely.

If we find out that a Sink makes sense we'll be able to add it in a later 
version.

What do you think?

Chris


On 02.08.19, 11:24, "Julian Feinauer" <j.feina...@pragmaticminds.de> wrote:

    Hi,
    
    I think that's the best way then.
    
    I agree with your suggestion.
    
    Julian
    
    Sent from my mobile phone
    
    
    -------- Original Message --------
    Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    From: Christofer Dutz
    To: dev@plc4x.apache.org
    Cc:
    
    Hi all,
    
    OK, so it seems that Kafka Connect only supports configuration via simple,
    unstructured maps.
    So if we want a hierarchical configuration like the proposed one, we'll
    have to do it log4j.properties-style:
    
    name=plc-source-test
    connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
    
    defaults.default-topic=some/default
    
    sources.machineA.url=s7://1.2.3.4/1/1
    sources.machineA.jobs.s7-dashboard.enabled=true
    sources.machineA.jobs.s7-heartbeat.enabled=true
    sources.machineA.jobs.s7-heartbeat.topic=heartbeat
    
    sources.machineB.url=s7://10.20.30.40/1/1
    sources.machineB.topic=heartbeat
    sources.machineB.jobs.s7-heartbeat.enabled=true
    
    sources.machineC.url=ads://1.2.3.4.5.6
    sources.machineC.topic=heartbeat
    sources.machineC.jobs.ads-heartbeat.enabled=true
    
    jobs.s7-dashboard.scrapeRate=500
    jobs.s7-dashboard.fields.inputPressure=%DB.DB1.4:INT
    jobs.s7-dashboard.fields.outputPressure=%Q1:BYTE
    jobs.s7-dashboard.fields.temperature=%I3:INT
    
    jobs.s7-heartbeat.scrapeRate=1000
    jobs.s7-heartbeat.fields.active=%I0.2:BOOL
    
    jobs.ads-heartbeat.scrapeRate=1000
    jobs.ads-heartbeat.fields.active=Main.running
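    
    As a rough sketch of how the connector could split such flat keys back into
    a hierarchy (the class and method names here are purely illustrative, not
    the actual plugin code):
    
        import java.util.HashMap;
        import java.util.Map;
    
        // Illustrative helper: groups flat "sources.<name>.<rest>"-style keys
        // by their name segment, so "sources.machineA.url" ends up in the
        // sub-map for "machineA" under the key "url".
        public class HierarchicalConfig {
    
            public static Map<String, Map<String, String>> groupByPrefix(
                    Map<String, String> flatConfig, String prefix) {
                Map<String, Map<String, String>> grouped = new HashMap<>();
                String prefixWithDot = prefix + ".";
                for (Map.Entry<String, String> entry : flatConfig.entrySet()) {
                    String key = entry.getKey();
                    if (!key.startsWith(prefixWithDot)) {
                        continue;
                    }
                    String rest = key.substring(prefixWithDot.length());
                    int firstDot = rest.indexOf('.');
                    if (firstDot < 0) {
                        continue; // no nested key below the name segment
                    }
                    String name = rest.substring(0, firstDot);    // e.g. "machineA"
                    String subKey = rest.substring(firstDot + 1); // e.g. "jobs.s7-heartbeat.topic"
                    grouped.computeIfAbsent(name, n -> new HashMap<>())
                           .put(subKey, entry.getValue());
                }
                return grouped;
            }
        }
    
    Applying groupByPrefix(props, "sources") to the example above would yield
    one sub-map per machine, and the same helper could be applied again to the
    "jobs.*" keys inside each sub-map.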
    
    
    Chris
    
    
    
    On 01.08.19, 15:46, "Christofer Dutz" <christofer.d...@c-ware.de> wrote:
    
        Hi Kai,
    
        that document is exactly the one I'm currently using.
        What I'm currently working on is updating the current plugin so that it no
        longer schedules and handles the connection stuff manually, but uses the
        scraper component of PLC4X instead.
        Also, the current configuration is not production-ready, and I'll be
        working on making it more easily usable.
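        
        Just to sketch the scraper idea (the scraper callback wiring is
        hypothetical here and only shown in comments, as a placeholder; the
        SourceTask part is the standard Kafka Connect API): the source task
        could buffer scrape results in a queue and hand them to Kafka Connect
        from poll():
        
            import java.util.ArrayList;
            import java.util.List;
            import java.util.Map;
            import java.util.concurrent.BlockingQueue;
            import java.util.concurrent.LinkedBlockingQueue;
            import java.util.concurrent.TimeUnit;
        
            import org.apache.kafka.connect.source.SourceRecord;
            import org.apache.kafka.connect.source.SourceTask;
        
            public class Plc4xScraperSourceTask extends SourceTask {
        
                // Results pushed by the scraper thread, drained by Kafka Connect.
                private final BlockingQueue<SourceRecord> buffer = new LinkedBlockingQueue<>();
        
                @Override
                public void start(Map<String, String> props) {
                    // Hypothetical: start the scraper and register a callback that
                    // converts each scrape result into a SourceRecord, e.g.:
                    // scraper.onResult((job, source, fields) ->
                    //     buffer.add(toSourceRecord(job, source, fields)));
                }
        
                @Override
                public List<SourceRecord> poll() throws InterruptedException {
                    SourceRecord first = buffer.poll(1, TimeUnit.SECONDS);
                    if (first == null) {
                        return null; // nothing was scraped in this interval
                    }
                    List<SourceRecord> records = new ArrayList<>();
                    records.add(first);
                    buffer.drainTo(records);
                    return records;
                }
        
                @Override
                public void stop() {
                    // Hypothetical: shut the scraper down here.
                }
        
                @Override
                public String version() {
                    return "0.0.1-SNAPSHOT";
                }
            }
        
        This would decouple the scrape rates from Kafka Connect's poll loop,
        which is exactly what letting the scraper do the scheduling should buy us.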
    
        But it definitely won't hurt to have a Kafka pro take a look at what we
        did and propose improvements. After all, we want the thing to be
        rock-solid :-)
    
        Chris
    
    
    
        On 31.07.19, 17:03, "Kai Wähner" <megachu...@gmail.com> wrote:
    
            Hi Chris,
    
            great that you will work on the connector.
    
            I am not deeply technical, but if you need guidance from Kafka Connect
            experts, I can connect you to a Confluent colleague who can help with
            best practices for building the connector.
    
            For example, we have implemented a wildcard option in our MQTT Connector
            to map MQTT topics to Kafka topics in a more flexible way (e.g. thousands
            of cars from different MQTT topics can be routed into one Kafka topic).
            This might also be interesting for this connector, as you expect to
            connect to various PLCs.
    
            This guide might also help:
            
            https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
    
    
    
            On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:
    
            > Hi all,
            >
            > I am currently planning on cleaning up the Kafka Connect adapter a
            > little, as it was implemented as part of a proof of concept and is
            > still in a state I wouldn't use in production ;-)
            > But a lot has happened since then, and I'm planning on making it a
            > really usable tool in the next few days.
            >
            > A lot has changed since we created the integration module in Q3 2018,
            > and I would like to refactor it to use the Scraper for the heavy
            > lifting.
            >
            > Currently, a user has to provide a parameter "query" which contains a
            > comma-separated list of connection strings with appended addresses.
            > This is simply unmanageable.
            >
            > I would like to make it configurable via a JSON or YAML file.
            >
            > I think it would make sense to define groups of fields that are
            > collected on one device at an equal rate. So it's pretty similar to
            > the scraper example; however, I would like to not specify the source
            > in the job, but the other way around.
            > When specifying the "sources", I would also declare which jobs should
            > run on a given connection.
            > As the connector was initially showcased in a scenario where data had
            > to be collected from a big number of PLCs with equal specs,
            > I think this is probably the most important use case, and in this
            > scenario it is also more common to add new devices on which to collect
            > standard data than the other way around.
            >
            > We should also provide the means to set, per connection, which
            > Kafka topic the data should be sent to.
            > We could provide a default, however, and make the per-connection
            > topic optional.
            > When posting to a topic we also need a means of partitioning, so I
            > would give each source an optional "name".
            > Each message would contain not only the requested data, but also the
            > source-url, the source-name and the job-name, together with a timestamp.
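            >
            > As a rough sketch (the schema layout and the example field are purely
            > illustrative; this only uses the standard Kafka Connect data API),
            > such a message could be built like this, keyed by the source name so
            > that partitioning works:
            >
            > import java.util.Collections;
            >
            > import org.apache.kafka.connect.data.Schema;
            > import org.apache.kafka.connect.data.SchemaBuilder;
            > import org.apache.kafka.connect.data.Struct;
            > import org.apache.kafka.connect.source.SourceRecord;
            >
            > public class EnvelopeSketch {
            >
            >     // Envelope schema: the proposed metadata plus one example job field.
            >     private static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
            >         .name("org.apache.plc4x.kafka.ScrapeResult")
            >         .field("source-url", Schema.STRING_SCHEMA)
            >         .field("source-name", Schema.STRING_SCHEMA)
            >         .field("job-name", Schema.STRING_SCHEMA)
            >         .field("timestamp", Schema.INT64_SCHEMA)
            >         .field("active", Schema.BOOLEAN_SCHEMA) // example job field
            >         .build();
            >
            >     public static SourceRecord toRecord(String topic, boolean active) {
            >         Struct value = new Struct(VALUE_SCHEMA)
            >             .put("source-url", "s7://10.20.30.40/1/1")
            >             .put("source-name", "machineB")
            >             .put("job-name", "s7-heartbeat")
            >             .put("timestamp", System.currentTimeMillis())
            >             .put("active", active);
            >         // Keying by the source name means all messages from one PLC
            >         // land in the same partition.
            >         return new SourceRecord(
            >             Collections.singletonMap("source", "machineB"),             // source partition
            >             Collections.singletonMap("ts", System.currentTimeMillis()), // source offset
            >             topic,
            >             Schema.STRING_SCHEMA, "machineB",
            >             VALUE_SCHEMA, value);
            >     }
            > }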
            >
            > So I guess the configuration would look something like this:
            >
            > #
            > ----------------------------------------------------------------------------
            > # Licensed to the Apache Software Foundation (ASF) under one
            > # or more contributor license agreements.  See the NOTICE file
            > # distributed with this work for additional information
            > # regarding copyright ownership.  The ASF licenses this file
            > # to you under the Apache License, Version 2.0 (the
            > # "License"); you may not use this file except in compliance
            > # with the License.  You may obtain a copy of the License at
            > #
            > #    http://www.apache.org/licenses/LICENSE-2.0
            > #
            > # Unless required by applicable law or agreed to in writing,
            > # software distributed under the License is distributed on an
            > # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
            > # KIND, either express or implied.  See the License for the
            > # specific language governing permissions and limitations
            > # under the License.
            > #
            > ----------------------------------------------------------------------------
            > ---
            > # Defaults used throughout all collections
            > defaults:
            >   # If no other topic is specified, all data goes to this topic (optional)
            >   default-topic: some/default
            >
            > # Defines connections to PLCs
            > sources:
            >   # Connection to an S7 device
            >   - name: machineA
            >     # PLC4X connection URL
            >     url: s7://1.2.3.4/1/1
            >     jobs:
            >       # Just references the job "s7-dashboard". All data will be
            >       # published to the default topic
            >       - name: s7-dashboard
            >       # References the job "s7-heartbeat"; however, it configures the
            >       # output to go to the topic "heartbeat"
            >       - name: s7-heartbeat
            >         topic: heartbeat
            >
            >   # Connection to a second S7 device
            >   - name: machineB
            >     url: s7://10.20.30.40/1/1
            >     # Sets the default topic for this connection. All job data will
            >     # go to "heartbeat"
            >     topic: heartbeat
            >     jobs:
            >       - s7-heartbeat
            >
            >   # Connection to a Beckhoff device
            >   - name: machineC
            >     url: ads://1.2.3.4.5.6
            >     topic: heartbeat
            >     jobs:
            >       - ads-heartbeat
            >
            > # Defines what should be collected and how often
            > jobs:
            >   # Defines a job to collect a set of fields on S7 devices every 500ms
            >   - name: s7-dashboard
            >     scrapeRate: 500
            >     fields:
            >       # The key will be used in the Kafka message to identify this
            >       # field; the value contains the PLC4X address
            >       inputPressure: "%DB.DB1.4:INT"
            >       outputPressure: "%Q1:BYTE"
            >       temperature: "%I3:INT"
            >
            >   # Defines a second job to collect a set of fields on S7 devices
            >   # every 1000ms
            >   - name: s7-heartbeat
            >     scrapeRate: 1000
            >     fields:
            >       active: "%I0.2:BOOL"
            >
            >   # Defines a third job that collects data on Beckhoff devices
            >   - name: ads-heartbeat
            >     scrapeRate: 1000
            >     fields:
            >       active: Main.running
            >
            > I think it should be self-explanatory with my comments inline.
            >
            > What do you think?
            >
            > Chris
            >
            >
    
    
    
    
    
