Re: Nifi integration record oriented processor for reading

Iñigo Angulo Mon, 05 Jul 2021 03:38:35 -0700

Hi Otto,

thank you for the review of the PR and your comments.


We have answered them, and will review the sections you suggested. I would like 
to know if there are any things you want us to priorize. Maybe you have some 
deadlines for the next release or something, and we could try to get things 
done by this time. If you miss some task we could do here before doing the 
merge, please let us know.

iñigo

----------------------------------------- 
Iñigo Angulo 

ZYLK.net :: consultoría.openSource 
telf.: 747412337 
Ribera de Axpe, 11 
Edificio A, modulo 201-203 
48950 Erandio (Bizkaia) 
-----------------------------------------

----- Mensaje original -----
De: "ottobackwards" <[email protected]>
Para: "dev" <[email protected]>
Enviados: Jueves, 24 de Junio 2021 14:40:12
Asunto: Re: Nifi integration record oriented processor for reading

Great,
I’m doing the review, so you should mark it ready :)



From: Iñigo Angulo <[email protected]> <[email protected]>
Reply: [email protected] <[email protected]> <[email protected]>
Date: June 24, 2021 at 04:52:19
To: dev <[email protected]> <[email protected]>
Subject:  Re: Nifi integration record oriented processor for reading

Hi Otto,

I have opened the new pull request, from a dedicated branch in our fork,
and a commit per contributor, the team members that have been working on
this.

Please take a look, and let me know if you want me to mark it as ready to
review.

iñigo

----------------------------------------- br/>Iñigo Anngulo br/> <
ZYLK.net :: consultoría.openSource br/>telf.: 7474123337 br/>Ribera de
Axpe, 11 br/>Edificio A, modulo 201-203 br/>48950 Erandio (Bizkaia)
br/>----------------------------------------- <

----- Mensaje original -----
De: "ottobackwards" <[email protected]>
Para: "dev" <[email protected]>
Enviados: Jueves, 17 de Junio 2021 16:38:13
Asunto: Re: Nifi integration record oriented processor for reading

That sounds great.

> On Jun 17, 2021, at 10:14, Iñigo Angulo <[email protected]> wrote:
> br/>> Hi Otto, <
> br/>> So we did this workaround, we re-forked the reepo and added the
changes of the nifi-record feature on a dedicated branch. I closed the
previous PR, will upload the new one on the following days (we are doing
some last test just in case we forgot to copy something...)
> br/>> I will keep you informed. <
> br/>> regards, <
> iñigo
> br/>> ----------------------------------------- br/>> Iñigo Angulo br/>>
br/>> ZYLK.net :: consultoría.openSource br/>> telf.: 747412337 br/>>
Ribera de Axpe, 11 br/>> Edificio A, modulo 201-203 br/>> 48950 Erandio
(Bizkaia) br/>> ----------------------------------------- <
> br/>> ----- Mensaje original ----- <
> De: "ottobackwards" <[email protected]>
> Para: "dev" <[email protected]>
> Enviados: Jueves, 10 de Junio 2021 16:14:54
> Asunto: Re: Nifi integration record oriented processor for reading
> br/>> Yes, that sounds great. <
> Thank you
> br/>>> On Jun 10, 2021, at 03:43, Iñigo Anggulo <[email protected]>
wrote:
>> br/>>> Hi Otto, <
>> br/>>> I have tried to do the rebase of our ccommits, but I am having
difficulties at it... br/>>> br/>>> The issue is: we forked the repo on
september 2020, and started making tests and commits to our fork (on
'develop' branch). Now I am trying to do a rebase (using git rebase -i ID)
specifying the ID of our first commit. But when the file is open for
interactive mode, it gets ~900 commits on the develop branch (belonging to
members of the PLC4X community). I think this happens because before
opening the PullRequest, I did a 'merge upstream' with the actual PLC4X
repo, to get the updates of the code. So, I understand that in the
interactive mode file, I have to leave all commits to 'pick' (by default),
and change our commits of the nifi feature to 'squash' except from the
first one (which also remains as 'pick'). However, when I tried this, many
conflicts appear, almost one per each commit (comunity members' commits)...
>> I may be doing something wrong (never did a rebase before...) and I
prefered just to ask, as I dont want to break or cause any conflict to the
repo code.. If you see anything Im missing please let me know.
>> br/>>> As a workaround, I was thinking we coulld close the PR, re-do the
fork of the PLC4X repo, and add the changes to the code on a dedicated
'feature-nifi-record' branch. Maybe this could make things clearer...
>> What do you think?
>> br/>>> thank you, <
>> iñigo
>> br/>>> ------------------------------------------ br/>>> Iñigo Angulo
br/>>> br/>>> ZYLK.net :: consultoría.openSource br/>>> telf.: 747412337
br/>>> Ribera de Axpe, 11 br/>>> Edificio A, modulo 201-203 br/>>> 48950
Erandio (Bizkaia) br/>>> ----------------------------------------- <
>> br/>>> ----- Mensaje original ----- <
>> De: "ottobackwards" <[email protected]>
>> Para: "dev" <[email protected]>
>> Enviados: Jueves, 27 de Mayo 2021 16:10:19
>> Asunto: Re: Nifi integration record oriented processor for reading
>> br/>>> Awesome. <
>> br/>>> If you can, can I ask you to: <
>> br/>>> 1. Mark the PR as ready to review in github
>> 2. rebase or squash it to a single commit and force push to your branch
to clean it up
>> br/>>> br/>>> br/>>>> On May 27, 2021, at 06:45, Iñigo Angulo
<[email protected]> wrote:
>>> br/>>>> Hi Otto, Chris <
>>> br/>>>> we have finally commited the uupdates on the Nifi Processor to
the Pull Request. The changes we have done are the following:
>>> - deducing avro datatypes from PlcResponse. Here we may check the
method (org.apache.plc4x.nifi.util.Plc4xCommon.createSchema()), in order to
see if it is the best way to do it.
>>> - we have added in the plc4j/pom.xml an "ignoredDependency" for
org.apache.nifi:nifi-standard-nar, as it is used on runtime and was rising
errors during compilation.
>>> - we have changed onScheduled method in Plc4xSourceRecordProcessor
(comparing to BaseProcessor), as we have included the posibility to have an
input connection into the processor, and indicate the target addressMap
through flowfile attributes. The addressMap is now created in the onTrigger
method.
>>> - we have tested the performance with S7 and Modbus protocols (using a
Siemens S7-1200 and Schneider M221). We will upload an updated nifi
template for both protocols, but regarding this, do you have any testing
environment to simulate PLCs? If that is the case, we could prepare the
Processors configuration to match these ones (connection Strings and
addressMaps).
>>> br/>>>> Please take a look at the code,, any suggestion will be very
welcome.
>>> br/>>>> iñigo <
>>> br/>>>> ------------------------------------------ br/>>>> Iñigo Angulo
br/>>>> br/>>>> ZYLK.net :: consultoría.openSource br/>>>> telf.: 747412337
br/>>>> Ribera de Axpe, 11 br/>>>> Edificio A, modulo 201-203 br/>>>> 48950
Erandio (Bizkaia) br/>>>> ----------------------------------------- <
>>> br/>>>> ----- Mensaje original ----- <
>>> De: "Iñigo Angulo" <[email protected]>
>>> Para: "dev" <[email protected]>
>>> Enviados: Miércoles, 12 de Mayo 2021 14:42:22
>>> Asunto: Re: Nifi integration record oriented processor for reading
>>> br/>>>> Hi Otto, Chris, <
>>> br/>>>> we have been working on the prrocessor to include the logic of
getting the values for variables from the PLC response. We tested it with
the S7-1200 and seems to work fine, however we would like to make some
further tests before commiting it.
>>> br/>>>> Regarding the actual method whiich takes the datatype from the
response object, we did it in the following way:
>>> br/>>>> //PlcReadResponse readResponse
>>> Map<String, ? extends PlcValue> responseDataStructure =
readResponse.getAsPlcValue().getStruct();
>>> for (Map.Entry<String, ? extends PlcValue> entry :
responseDataStructure.entrySet()) {
>>> br/>>>> if (entry.getValue() instanceoof PlcINT) { br/>>>> 
>>> builder.name(fielldName).type().unionOf().nullBuilder().endNull().and().intType().endUnion().noDefault();

>>> }else if (entry.getValue() instanceof PlcREAL) {
>>> builder.name(fieldName).type().unionOf().nullBuilder().endNull().and().doubleType().endUnion().noDefault();

>>> } ... and so on for the rest of the classes on the package
(org.apache.plc4x.java.spi.values.*)
>>> br/>>>> //the builder object is used tto build avro schema, with
desired datatypes (for example intType())
>>> }
>>> br/>>>> br/>>>> Is this the solution you had in mind?? If you think
there is a better way to access PlcValues, please let us know and we will
update it.
>>> br/>>>> We will upload the code soon soo that you can take a deeper
look.
>>> br/>>>> thank you!!
>>> iñigo
>>> br/>>>> ------------------------------------------ br/>>>> Iñigo Angulo
br/>>>> br/>>>> ZYLK.net :: consultoría.openSource br/>>>> telf.: 747412337
br/>>>> Ribera de Axpe, 11 br/>>>> Edificio A, modulo 201-203 br/>>>> 48950
Erandio (Bizkaia) br/>>>> ----------------------------------------- <
>>> br/>>>> ----- Mensaje original ----- <
>>> De: "ottobackwards" <[email protected]>
>>> Para: "dev" <[email protected]>
>>> Enviados: Viernes, 30 de Abril 2021 14:50:11
>>> Asunto: Re: Nifi integration record oriented processor for reading
>>> br/>>>> Sounds like a good plan!!
>>> br/>>>>> On Apr 30, 2021, at 05:26, Iñigo Angulo
<[email protected]> wrote:
>>>> br/>>>>> Hi Otto, Chris, <
>>>> br/>>>>> we have been reviewingg the comments on the pull request, and
started to think about the approach of extracting values from response
directly. During next week, we will work on this and make some updates to
the code with the suggestions you made. We keep you informed
>>>> br/>>>>> thank you, <
>>>> br/>>>>> iñigo <
>>>> br/>>>>> ------------------------------------------ br/>>>>> Iñigo
Angulo br/>>>>> br/>>>>> ZYLK.net :: consultoría.openSource br/>>>>> telf.:
747412337 br/>>>>> Ribera de Axpe, 11 br/>>>>> Edificio A, modulo 201-203
br/>>>>> 48950 Erandio (Bizkaia) br/>>>>>
----------------------------------------- <
>>>> br/>>>>> ----- Mensaje originall -----
>>>> De: "Christofer Dutz" <[email protected]>
>>>> Para: "dev" <[email protected]>
>>>> Enviados: Viernes, 23 de Abril 2021 18:10:35
>>>> Asunto: AW: AW: Nifi integration record oriented processor for reading
>>>> br/>>>>> Hi Inigo, <
>>>> br/>>>>> especially if you havee a look at the KNX protocol. This
doesn't define the usual IEC datatypes we tried to use for all normal PLC
drivers.
>>>> Here we have hundreds of datatypes that don't match any other
protocol. I think the PLCValue approach would be the simplest.
>>>> The one thing you have to keep in mind, is that you should check a
PLCValue, if it's a list (Array type) or a Structure (Which sort of relates
to komplex types with a more sophisticated structure).
>>>> br/>>>>> Chris <
>>>> br/>>>>> br/>>>>> -----Ursprüngliche Nachricht----- <
>>>> Von: Iñigo Angulo <[email protected]> br/>>>>> Gesendet:
FFreitag, 23. April 2021 15:34
>>>> An: dev <[email protected]>
>>>> Betreff: Re: AW: Nifi integration record oriented processor for
reading
>>>> br/>>>>> Hi Otto, Chris, <
>>>> br/>>>>> Yes, I think the approoach you propose will be best. By now,
we are generating the schema ourselves. We have a record writer who is in
charge of reading PLC values. Schema is defined previously to reading the
values. We build this schema getting the protocol from the
'connectionString' (S7, Modbus) and the specified variable type from the
'PLC resource address String' containing the list of variable to read. From
this we deduce the expected Avro datatype when reading, for instance, a
word in S7 or a coil in Modbus. br/>&ggt;>>> br/>>>>> However, as you
mentioned, the oother approach will be much clearer and useful. Ideally,
getting the actual datatype from PLCValue when getting the response.
Regarding this, we tried to keep the previously described 'mapping'
separated from the rest of the code, so that hopefully it can be easily
replaced..
>>>> br/>>>>> We have done the pull rrequest, hope you can take a look at
the code and let us know what you think. We will fill the ICLA document
too.
>>>> br/>>>>> thank you <
>>>> iñigo
>>>> br/>>>>> br/>>>>> br/>>>>> ----------------------------------------- <
>>>> Iñigo Angulo br/>>>>> br/>>>>> ZYLK.net :: consultoría.openSource <
>>>> telf.: 747412337
>>>> Ribera de Axpe, 11
>>>> Edificio A, modulo 201-203
>>>> 48950 Erandio (Bizkaia)
>>>> -----------------------------------------
>>>> br/>>>>> ----- Mensaje original -----
>>>> De: "Christofer Dutz" <[email protected]>
>>>> Para: "dev" <[email protected]>
>>>> Enviados: Jueves, 22 de Abril 2021 17:12:49
>>>> Asunto: AW: Nifi integration record oriented processor for reading
>>>> br/>>>>> Hi all, <
>>>> br/>>>>> Well, you get PlcValuees from the response that wrap the
different datatypes. So generally you shouldn't care about the detail type.
>>>> br/>>>>> However, you can call ggetObject() which returns the core
value the plc-value has ... so if it's the PLCValue for a Short, getObject
will return a short value.
>>>> br/>>>>> Does that help??
>>>> br/>>>>> Chris <
>>>> br/>>>>> br/>>>>> -----Ursprüngliche Nachricht----- <
>>>> Von: Otto Fowler <[email protected]>
>>>> Gesendet: Donnerstag, 22. April 2021 15:21
>>>> An: [email protected]
>>>> Betreff: Re: Nifi integration record oriented processor for reading
>>>> br/>>>>> So, you are generating the schema yourself, such that
downstream if they inherit schema they will just get what you generate? And
you are trying to do that by the connection string? If so, a different way
I could imagine doing would be to get the ’types’ of the data from the
responses themselves. This would be more generic. The flow I could imagine
( in OnTrigger ):
>>>> br/>>>>> DO READ <
>>>> IF NOT HAS SCHEMA
>>>> GENERATE SCHEMA FROM RESPONSE AND CACHE IN ATOMIC WRITE WITH SCHEMA
>>>> br/>>>>> Maybe Chris can speak tto how to get the types from the
responses.
>>>> br/>>>>> br/>>>>>> On Apr 22, 2021, at 05:48, Iñigo Anguulo
<[email protected]> wrote:
>>>>> br/>>>>>> Hi Chris, Otto,,
>>>>> br/>>>>>> Regarding the RRecord Processor concept, i will try to give
an overview. In Nifi, information packages are called Flowfiles, and these
are the actual units of information that are exchanged between Procesors,
all along the dataflow. Flowfiles have two sections where we can manage
data: Attributes and Content. In the "traditional" Nifi approach, you work
with both sections, extracting information from the Content to the
Attributes and viceversa to perform operations. This approach could have
one limitation when you are processing batch data (lines from a CSV file
for instance), where you need to split each of the lines into different
Flowfiles. Thus, a 1000 line CSV file leads to 1000 Flowfiles to process,
each of them containing a single record.
>>>>> br/>>>>>> On later versioons of the product, they introduced the
Record oriented approach. This approach allows you to manage multiple
records on a single Flowfile's Content, as long as these records have all
the same schema. This means that the operations defined by the Processors
are applied simultaneously to the whole content at once. Following with the
previous example, a 1000 line CSV file could produce a single Flowfile with
a content of 1000 records. br/>>>>>> br/>>>>>> To do this, Nifi uses Avro,
to serialize thee Flowfile's Content. Then, the Record Oriented Processors
use Writers and Readers to present this information in the desired format
(such as Avro, Json, CSV, etc). Basically, with the record oriented
approach, Nifi introduced multiple new Processors, and also included the
Record version of many of the "old" ones. Using this Record approach, Nifi
perfomance enhances notably, specially when working with large structured
information.
>>>>> br/>>>>>> The work we didd was creating a Record Oriented Processor,
based on the previously existing one Plc4xSourceProcessor, to read values
from the devices. We have also included a README on the
plc4x/plc4j/integrations/apache-nifi module explaining the Processor
configuration and giving an example. Moreover, we put a nifi template with
a dataflow for testing these processors, if useful.
>>>>> br/>>>>>> Otto, regardingg the idea behind this new Processor, that
is right. We added the writer capability to the existing
PLC4XSourceProcessor, so that it formats the output to the desired
configuration in a record manner. At the actual implementation, we did this
"protocol adaptation" from the sintax of the particular properties on
Processor's configuration. For example, from connection string
's7://IP:PORT', we extract the S7 idenifier and map variable datatypes to
the actual Avro datatypes for build the record output schema. However, here
we dont have vast experience with PLC4X libraries, and for sure there will
be better ways for doing this.
>>>>> Also about the Base Processor, we were thinking that maybe the best
approach could be to have this Base Processor, and then implement readers
for particular protocols as Controller Services. But here also, it could be
very helpful to have your opinion.
>>>>> br/>>>>>> Lastly, regardiing the pull request, do you have any
documentation on br/>>>&ggt;>> how to do this? I mean, maybe you have
defined some naming br/>>>>>> conventions, or expected structure to
faciliitate later work. At the br/>>>>>> present, we have a fork of the
project where we have been working on br/>>>>&ggt;> these Nifi changes. We
updated the content of our fork (fetch/merge
>>>>> upstream) about 2 weeks ago, and commited our changes to the
'develop' br/>>>>>> branch. Do we bettter create a new branch with our
commits? how do you br/>>>&>>> prefer to receive the code? (we are not very
experts on git, just in br/>>>>>> case we could cause some problemss...)
>>>>> br/>>>>>> thank you in addvance
>>>>> br/>>>>>> iñigo <
>>>>> br/>>>>>> br/>>>>>> ----------------------------------------- <
>>>>> Iñigo Angulo
>>>>> br/>>>>>> ZYLK.net :: connsultoría.openSource
>>>>> telf.: 747412337
>>>>> Ribera de Axpe, 11
>>>>> Edificio A, modulo 201-203
>>>>> 48950 Erandio (Bizkaia)
>>>>> -----------------------------------------
>>>>> br/>>>>>> ----- Mensaje ooriginal -----
>>>>> De: "Christofer Dutz" <[email protected]>
>>>>> Para: "dev" <[email protected]>
>>>>> Enviados: Miércoles, 21 de Abril 2021 20:01:15
>>>>> Asunto: AW: Nifi integration record oriented processor for reading
>>>>> br/>>>>>> The more I thinnk of it,
>>>>> br/>>>>>> Perhaps we shouuld also think of potentially providing some
information on supported configuration options.
>>>>> Wouldn't it be cool if the driver could say: "I generally have these
options and they have these datatypes and mean this"
>>>>> Additionally, the transports could too say: "I generally have these
options and they have these datatypes and mean this"
>>>>> br/>>>>>> I would be our StreamPipes friends would love something
like that? Right?
>>>>> br/>>>>>> Chris <
>>>>> br/>>>>>> br/>>>>>> -----Ursprüngliche Nachricht----- <
>>>>> Von: Otto Fowler <[email protected]>
>>>>> Gesendet: Mittwoch, 21. April 2021 17:46
>>>>> An: [email protected]
>>>>> Betreff: Re: Nifi integration record oriented processor for reading
>>>>> br/>>>>>> Hi Inigo, <
>>>>> br/>>>>>> I’m a coommitter on Apache Nifi as well as PLC4X, I would
be happy to review your processor.
>>>>> If I understand what you are saying correctly, you have a single
processor which supports record writing output?
>>>>> br/>>>>>> plc4x -> reccords
>>>>> br/>>>>>> And that you haave, for configuration purposes for that
processor created support on a per protocol basis for configuration and
validation?
>>>>> br/>>>>>> If there is perr protocol configuration / validation etc,
it may be better to have a base processor, and derived processors per
protocol to handle those differences.
>>>>> br/>>>>>> I look forward to seeing the code.
>>>>> br/>>>>>> br/>>>>>>> On Apr 21, 2021, at 04:05, Iñigo Angulo
<[email protected]> wrote:
>>>>>> br/>>>>>>> Hi all,,
>>>>>> br/>>>>>>> I am wrriting as we have been working on the Apache Nifi
integration part of the project. We have created a Record oriented
processor for reading PLC data. It is based on the previous existing
SourceProcessor, but works with records, using a Nifi Writer (such as Avro,
Json, and so on) to write data on flowfiles content. br/>>>>>>> br/>>>>>>>
We updated the code on our fork with thee actual PLC4X git repo about 2
weeks ago, and tested it reading values with S7 from a S7-1200 CPU from
Nifi. Also, one of our customers has recently started to use it for
validation. br/>>>>>>> br/>>>>>>> Currently, it works with S7 and Modbus
oover TCP. This is because we had to write some classes to map
connectionString and variableList properties (sintax) of the processor to
the actual protocol, to be able to build then avro schema for output
flowfile, taking into account variable datatypes, etc. We only did this for
S7 and Modbus. I am sure that there is a better way to do this, so at this
point you maybe could take a look to find the best solution and avoid
needing to do this mapping. br/>>>>>>> br/>>>>>>> If you find this useful,
we could do a ppull request to the main PLC4x repo. Let us know what you
think. br/>>>>>>>> br/>>>>>>> best regards,
>>>>>> iñigo
>>>>>> br/>>>>>>> ------------------------------------------
>>>>>> Iñigo Angulo
>>>>>> br/>>>>>>> ZYLK.neet :: consultoría.openSource
>>>>>> telf.: 747412337
>>>>>> Ribera de Axpe, 11
>>>>>> Edificio A, modulo 201-203
>>>>>> 48950 Erandio (Bizkaia)
>>>>>> -----------------------------------------

Re: Nifi integration record oriented processor for reading

Reply via email to