Hi all,
I would like to bring up a topic that I want to address in the next few
days.
We currently have multiple scraper-based integration modules. All of these need
configuration.
However, the configuration strategies differ slightly, and I think it would be
good to unify them.
Even though I know there will be slight differences for every integration, I
think the general concept should be the same.
The Logstash integration follows a strategy which is directly linked to the
scraper's internal structure:
## logstash pipeline config - input
input {
  ## use plc4x plugin (logstash-input-plc4x)
  plc4x {
    ## define sources (opc-ua examples)
    sources => {
      source1 => "opcua:tcp://opcua-server:4840/"
      source2 => "opcua:tcp://opcua-server1:4840/"
    }
    ## define jobs
    jobs => {
      job1 => {
        # pull rate in milliseconds
        rate => 1000
        # sources queried by job1
        sources => ["source1"]
        # defined queries [logstash_internal_fieldname => "IIoT query"]
        queries => {
          PreStage => "ns=2;i=3"
          MidStage => "ns=2;i=4"
          PostStage => "ns=2;i=5"
          Motor => "ns=2;i=6"
          ConvoyerBeltTimestamp => "ns=2;i=7"
          RobotArmTimestamp => "ns=2;i=8"
        }
      }
    }
  }
}
For the Kafka Connect adapter I took a slightly different route:
name=plc-0
connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
default-topic=machineData
tasks.max=2
sources=machineA
sources.machineA.connectionString=s7://10.10.64.20
sources.machineA.jobReferences=s7-dashboard,s7-heartbeat
sources.machineA.jobReferences.s7-heartbeat.topic=heartbeat
jobs=s7-dashboard,s7-heartbeat
jobs.s7-dashboard.interval=1000
jobs.s7-dashboard.fields=running,conveyorEntry,load,unload,transferLeft,transferRight,conveyorLeft,conveyorRight,numLargeBoxes,numSmallBoxes
jobs.s7-dashboard.fields.running=%DB3.DB31.0:BOOL
jobs.s7-dashboard.fields.conveyorEntry=%Q0.0:BOOL
jobs.s7-dashboard.fields.load=%Q0.1:BOOL
jobs.s7-dashboard.fields.unload=%Q0.2:BOOL
jobs.s7-dashboard.fields.transferLeft=%Q0.3:BOOL
jobs.s7-dashboard.fields.transferRight=%Q0.4:BOOL
jobs.s7-dashboard.fields.conveyorLeft=%Q0.5:BOOL
jobs.s7-dashboard.fields.conveyorRight=%Q0.6:BOOL
jobs.s7-dashboard.fields.numLargeBoxes=%DB3.DBW32:INT
jobs.s7-dashboard.fields.numSmallBoxes=%DB3.DBW34:INT
jobs.s7-heartbeat.interval=500
jobs.s7-heartbeat.fields=active
jobs.s7-heartbeat.fields.active=%DB3.DB31.0:BOOL
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
The main difference is that in Logstash the "sources" are somewhat "dumb" and
are referenced from the jobs.
In my case the sources carry more information, as each source references the
jobs it should be used with.
The reason I did this was that adding a new PLC into the picture then only
requires defining the source and listing the jobs whose data we want to
collect from it.
I guess the other philosophy is that the PLCs are sort of static and the jobs
are the part that changes frequently.
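Just to make the comparison concrete, here is a rough, untested sketch of how the
Kafka Connect example could look if we inverted it to follow the Logstash
philosophy (dumb sources, jobs referencing them). The keys jobs.<name>.sources
and jobs.<name>.topic do not exist in the current adapter; they are purely
illustrative:

name=plc-0
connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
default-topic=machineData
tasks.max=2
# sources only carry connection information, nothing else
sources=machineA
sources.machineA.connectionString=s7://10.10.64.20
# jobs reference the sources they should run against (hypothetical keys)
jobs=s7-dashboard,s7-heartbeat
jobs.s7-dashboard.interval=1000
jobs.s7-dashboard.sources=machineA
jobs.s7-dashboard.fields=running
jobs.s7-dashboard.fields.running=%DB3.DB31.0:BOOL
jobs.s7-heartbeat.interval=500
jobs.s7-heartbeat.sources=machineA
jobs.s7-heartbeat.topic=heartbeat
jobs.s7-heartbeat.fields=active
jobs.s7-heartbeat.fields.active=%DB3.DB31.0:BOOL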
I personally prefer the option where the source specifies which jobs should run
on it, but I guess Julian and his team prefer the other approach (as they built
the scraper that way) …
So I would like to hear some general feedback on which way we should be going
and then I’ll make sure all integrations use a similar strategy.
Chris