Re: Nifi integration record oriented processor for reading

Iñigo Angulo Thu, 10 Jun 2021 00:44:18 -0700

Hi Otto,

I have tried to do the rebase of our commits, but I am having difficulties at 
it...


The issue is: we forked the repo on september 2020, and started making tests 
and commits to our fork (on 'develop' branch). Now I am trying to do a rebase 
(using git rebase -i ID) specifying the ID of our first commit. But when the 
file is open for interactive mode, it gets ~900 commits on the develop branch 
(belonging to members of the PLC4X community). I think this happens because 
before opening the PullRequest, I did a 'merge upstream' with the actual PLC4X 
repo, to get the updates of the code. So, I understand that in the interactive 
mode file, I have to leave all commits to 'pick' (by default), and change our 
commits of the nifi feature to 'squash' except from the first one (which also 
remains as 'pick'). However, when I tried this, many conflicts appear, almost 
one per each commit (comunity members' commits)...
I may be doing something wrong (never did a rebase before...) and I prefered 
just to ask, as I dont want to break or cause any conflict to the repo code.. 
If you see anything Im missing please let me know.

As a workaround, I was thinking we could close the PR, re-do the fork of the 
PLC4X repo, and add the changes to the code on a dedicated 
'feature-nifi-record' branch. Maybe this could make things clearer...
What do you think?

thank you,
iñigo

----------------------------------------- 
Iñigo Angulo 

ZYLK.net :: consultoría.openSource 
telf.: 747412337 
Ribera de Axpe, 11 
Edificio A, modulo 201-203 
48950 Erandio (Bizkaia) 
-----------------------------------------

----- Mensaje original -----
De: "ottobackwards" <[email protected]>
Para: "dev" <[email protected]>
Enviados: Jueves, 27 de Mayo 2021 16:10:19
Asunto: Re: Nifi integration record oriented processor for reading

Awesome.

If you can, can I ask you to:

1.  Mark the PR as ready to review in github
2. rebase or squash it to a single commit and force push to your branch to 
clean it up



> On May 27, 2021, at 06:45, Iñigo Angulo <[email protected]> wrote:
> 
> Hi Otto, Chris
> 
> we have finally commited the updates on the Nifi Processor to the Pull 
> Request. The changes we have done are the following:
> - deducing avro datatypes from PlcResponse. Here we may check the method 
> (org.apache.plc4x.nifi.util.Plc4xCommon.createSchema()), in order to see if 
> it is the best way to do it.
> - we have added in the  plc4j/pom.xml an "ignoredDependency" for 
> org.apache.nifi:nifi-standard-nar, as it is used on runtime and was rising 
> errors during compilation.
> - we have changed onScheduled method in Plc4xSourceRecordProcessor (comparing 
> to BaseProcessor), as we have included the posibility to have an input 
> connection into the processor, and indicate the target addressMap through 
> flowfile attributes. The addressMap is now created in the onTrigger method.
> - we have tested the performance with S7 and Modbus protocols (using a 
> Siemens S7-1200 and Schneider M221). We will upload an updated nifi template 
> for both protocols, but regarding this, do you have any testing environment 
> to simulate PLCs? If that is the case, we could prepare the Processors 
> configuration to match these ones (connection Strings and addressMaps).
> 
> Please take a look at the code, any suggestion will be very welcome.
> 
> iñigo
> 
> ----------------------------------------- 
> Iñigo Angulo 
> 
> ZYLK.net :: consultoría.openSource 
> telf.: 747412337 
> Ribera de Axpe, 11 
> Edificio A, modulo 201-203 
> 48950 Erandio (Bizkaia) 
> -----------------------------------------
> 
> ----- Mensaje original -----
> De: "Iñigo Angulo" <[email protected]>
> Para: "dev" <[email protected]>
> Enviados: Miércoles, 12 de Mayo 2021 14:42:22
> Asunto: Re: Nifi integration record oriented processor for reading
> 
> Hi Otto, Chris,
> 
> we have been working on the processor to include the logic of getting the 
> values for variables from the PLC response. We tested it with the S7-1200 and 
> seems to work fine, however we would like to make some further tests before 
> commiting it.
> 
> Regarding the actual method which takes the datatype from the response 
> object, we did it in the following way:
> 
> //PlcReadResponse readResponse
> Map<String, ? extends PlcValue> responseDataStructure = 
> readResponse.getAsPlcValue().getStruct();
> for (Map.Entry<String, ? extends PlcValue> entry : 
> responseDataStructure.entrySet()) {
> 
>    if (entry.getValue() instanceof PlcINT) {                                  
>     
>        
> builder.name(fieldName).type().unionOf().nullBuilder().endNull().and().intType().endUnion().noDefault();
>    }else if (entry.getValue() instanceof PlcREAL) {
>        
> builder.name(fieldName).type().unionOf().nullBuilder().endNull().and().doubleType().endUnion().noDefault();
>    } ... and so on for the rest of the classes on the package 
> (org.apache.plc4x.java.spi.values.*)
> 
>    //the builder object is used to build avro schema, with desired datatypes 
> (for example intType())
> }
> 
> 
> Is this the solution you had in mind? If you think there is a better way to 
> access PlcValues, please let us know and we will update it.
> 
> We will upload the code soon so that you can take a deeper look.
> 
> thank you!
> iñigo
>       
> ----------------------------------------- 
> Iñigo Angulo 
> 
> ZYLK.net :: consultoría.openSource 
> telf.: 747412337 
> Ribera de Axpe, 11 
> Edificio A, modulo 201-203 
> 48950 Erandio (Bizkaia) 
> -----------------------------------------
> 
> ----- Mensaje original -----
> De: "ottobackwards" <[email protected]>
> Para: "dev" <[email protected]>
> Enviados: Viernes, 30 de Abril 2021 14:50:11
> Asunto: Re: Nifi integration record oriented processor for reading
> 
> Sounds like a good plan!
> 
>> On Apr 30, 2021, at 05:26, Iñigo Angulo <[email protected]> wrote:
>> 
>> Hi Otto, Chris,
>> 
>> we have been reviewing the comments on the pull request, and started to 
>> think about the approach of extracting values from response directly. During 
>> next week, we will work on this and make some updates to the code with the 
>> suggestions you made. We keep you informed
>> 
>> thank you,
>> 
>> iñigo
>> 
>> ----------------------------------------- 
>> Iñigo Angulo 
>> 
>> ZYLK.net :: consultoría.openSource 
>> telf.: 747412337 
>> Ribera de Axpe, 11 
>> Edificio A, modulo 201-203 
>> 48950 Erandio (Bizkaia) 
>> -----------------------------------------
>> 
>> ----- Mensaje original -----
>> De: "Christofer Dutz" <[email protected]>
>> Para: "dev" <[email protected]>
>> Enviados: Viernes, 23 de Abril 2021 18:10:35
>> Asunto: AW: AW: Nifi integration record oriented processor for reading
>> 
>> Hi Inigo,
>> 
>> especially if you have a look at the KNX protocol. This doesn't define the 
>> usual IEC datatypes we tried to use for all normal PLC drivers.
>> Here we have hundreds of datatypes that don't match any other protocol. I 
>> think the PLCValue approach would be the simplest.
>> The one thing you have to keep in mind, is that you should check a PLCValue, 
>> if it's a list (Array type) or a Structure (Which sort of relates to komplex 
>> types with a more sophisticated structure).
>> 
>> Chris
>> 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: Iñigo Angulo <[email protected]> 
>> Gesendet: Freitag, 23. April 2021 15:34
>> An: dev <[email protected]>
>> Betreff: Re: AW: Nifi integration record oriented processor for reading
>> 
>> Hi Otto, Chris,
>> 
>> Yes, I think the approach you propose will be best. By now, we are 
>> generating the schema ourselves. We have a record writer who is in charge of 
>> reading PLC values. Schema is defined previously to reading the values. We 
>> build this schema getting the protocol from the 'connectionString' (S7, 
>> Modbus) and the specified variable type from the 'PLC resource address 
>> String' containing the list of variable to read. From this we deduce the 
>> expected Avro datatype when reading, for instance, a word in S7 or a coil in 
>> Modbus. 
>> 
>> However, as you mentioned, the other approach will be much clearer and 
>> useful. Ideally, getting the actual datatype from PLCValue when getting the 
>> response. Regarding this, we tried to keep the previously described 
>> 'mapping' separated from the rest of the code, so that hopefully it can be 
>> easily replaced..
>> 
>> We have done the pull request, hope you can take a look at the code and let 
>> us know what you think. We will fill the ICLA document too.
>> 
>> thank you
>> iñigo
>> 
>> 
>> 
>> -----------------------------------------
>> Iñigo Angulo 
>> 
>> ZYLK.net :: consultoría.openSource
>> telf.: 747412337
>> Ribera de Axpe, 11
>> Edificio A, modulo 201-203
>> 48950 Erandio (Bizkaia)
>> -----------------------------------------
>> 
>> ----- Mensaje original -----
>> De: "Christofer Dutz" <[email protected]>
>> Para: "dev" <[email protected]>
>> Enviados: Jueves, 22 de Abril 2021 17:12:49
>> Asunto: AW: Nifi integration record oriented processor for reading
>> 
>> Hi all,
>> 
>> Well, you get PlcValues from the response that wrap the different datatypes. 
>> So generally you shouldn't care about the detail type.
>> 
>> However, you can call getObject() which returns the core value the plc-value 
>> has ... so if it's the PLCValue for a Short, getObject will return a short 
>> value.
>> 
>> Does that help?
>> 
>> Chris
>> 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: Otto Fowler <[email protected]>
>> Gesendet: Donnerstag, 22. April 2021 15:21
>> An: [email protected]
>> Betreff: Re: Nifi integration record oriented processor for reading
>> 
>> So, you are generating the schema yourself, such that downstream if they 
>> inherit schema they will just get what you generate?  And you are trying to 
>> do that by the connection string?  If so, a different way I could imagine 
>> doing would be to get the ’types’ of the data from the responses themselves. 
>>  This would be more generic.  The flow I could imagine ( in OnTrigger ):
>> 
>> DO READ
>> IF NOT HAS SCHEMA
>>      GENERATE SCHEMA FROM RESPONSE AND CACHE IN ATOMIC WRITE WITH SCHEMA
>> 
>> Maybe Chris can speak to how to get the types from the responses.
>> 
>> 
>>> On Apr 22, 2021, at 05:48, Iñigo Angulo <[email protected]> wrote:
>>> 
>>> Hi Chris, Otto,
>>> 
>>> Regarding the Record Processor concept, i will try to give an overview. In 
>>> Nifi, information packages are called Flowfiles, and these are the actual 
>>> units of information that are exchanged between Procesors, all along the 
>>> dataflow. Flowfiles have two sections where we can manage data: Attributes 
>>> and Content. In the "traditional" Nifi approach, you work with both 
>>> sections, extracting information from the Content to the Attributes and 
>>> viceversa to perform operations. This approach could have one limitation 
>>> when you are processing batch data (lines from a CSV file for instance), 
>>> where you need to split each of the lines into different Flowfiles. Thus, a 
>>> 1000 line CSV file leads to 1000 Flowfiles to process, each of them 
>>> containing a single record.
>>> 
>>> On later versions of the product, they introduced the Record oriented 
>>> approach. This approach allows you to manage multiple records on a single 
>>> Flowfile's Content, as long as these records have all the same schema. This 
>>> means that the operations defined by the Processors are applied 
>>> simultaneously to the whole content at once. Following with the previous 
>>> example, a 1000 line CSV file could produce a single Flowfile with a 
>>> content of 1000 records. 
>>> 
>>> To do this, Nifi uses Avro, to serialize the Flowfile's Content. Then, the 
>>> Record Oriented Processors use Writers and Readers to present this 
>>> information in the desired format (such as Avro, Json, CSV, etc). 
>>> Basically, with the record oriented approach, Nifi introduced multiple new 
>>> Processors, and also included the Record version of many of the "old" ones. 
>>> Using this Record approach, Nifi perfomance enhances notably, specially 
>>> when working with large structured information.
>>> 
>>> The work we did was creating a Record Oriented Processor, based on the 
>>> previously existing one Plc4xSourceProcessor, to read values from the 
>>> devices. We have also included a README on the 
>>> plc4x/plc4j/integrations/apache-nifi module explaining the Processor 
>>> configuration and giving an example. Moreover, we put a nifi template with 
>>> a dataflow for testing these processors, if useful.
>>> 
>>> Otto, regarding the idea behind this new Processor, that is right. We added 
>>> the writer capability to the existing PLC4XSourceProcessor, so that it 
>>> formats the output to the desired configuration in a record manner. At the 
>>> actual implementation, we did this "protocol adaptation" from the sintax of 
>>> the particular properties on Processor's configuration. For example, from 
>>> connection string 's7://IP:PORT', we extract the S7 idenifier and map 
>>> variable datatypes to the actual Avro datatypes for build the record output 
>>> schema. However, here we dont have vast experience with PLC4X libraries, 
>>> and for sure there will be better ways for doing this.
>>> Also about the Base Processor, we were thinking that maybe the best 
>>> approach could be to have this Base Processor, and then implement readers 
>>> for particular protocols as Controller Services. But here also, it could be 
>>> very helpful to have your opinion.
>>> 
>>> Lastly, regarding the pull request, do you have any documentation on 
>>> how to do this? I mean, maybe you have defined some naming 
>>> conventions, or expected structure to facilitate later work. At the 
>>> present, we have a fork of the project where we have been working on 
>>> these Nifi changes. We updated the content of our fork (fetch/merge
>>> upstream) about 2 weeks ago, and commited our changes to the 'develop' 
>>> branch. Do we better create a new branch with our commits? how do you 
>>> prefer to receive the code? (we are not very experts on git, just in 
>>> case we could cause some problems...)
>>> 
>>> thank you in advance
>>> 
>>> iñigo
>>> 
>>> 
>>> -----------------------------------------
>>> Iñigo Angulo
>>> 
>>> ZYLK.net :: consultoría.openSource
>>> telf.: 747412337
>>> Ribera de Axpe, 11
>>> Edificio A, modulo 201-203
>>> 48950 Erandio (Bizkaia)
>>> -----------------------------------------
>>> 
>>> ----- Mensaje original -----
>>> De: "Christofer Dutz" <[email protected]>
>>> Para: "dev" <[email protected]>
>>> Enviados: Miércoles, 21 de Abril 2021 20:01:15
>>> Asunto: AW: Nifi integration record oriented processor for reading
>>> 
>>> The more I think of it,
>>> 
>>> Perhaps we should also think of potentially providing some information on 
>>> supported configuration options.
>>> Wouldn't it be cool if the driver could say: "I generally have these 
>>> options and they have these datatypes and mean this"
>>> Additionally, the transports could too say: "I generally have these options 
>>> and they have these datatypes and mean this"
>>> 
>>> I would be our StreamPipes friends would love something like that? Right?
>>> 
>>> Chris
>>> 
>>> 
>>> -----Ursprüngliche Nachricht-----
>>> Von: Otto Fowler <[email protected]>
>>> Gesendet: Mittwoch, 21. April 2021 17:46
>>> An: [email protected]
>>> Betreff: Re: Nifi integration record oriented processor for reading
>>> 
>>> Hi Inigo,
>>> 
>>> I’m a committer on Apache Nifi as well as PLC4X, I would be happy to review 
>>> your processor.
>>> If I understand what you are saying correctly, you have a single processor 
>>> which supports record writing output?
>>> 
>>> plc4x -> records
>>> 
>>> And that you have, for configuration purposes for that processor created 
>>> support on a per protocol basis for configuration and validation?
>>> 
>>> If there is per protocol configuration / validation etc, it may be better 
>>> to have a base processor, and derived processors per protocol to handle 
>>> those differences.
>>> 
>>> I look forward to seeing the code.
>>> 
>>> 
>>>> On Apr 21, 2021, at 04:05, Iñigo Angulo <[email protected]> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I am writing as we have been working on the Apache Nifi integration part 
>>>> of the project. We have created a Record oriented processor for reading 
>>>> PLC data. It is based on the previous existing SourceProcessor, but works 
>>>> with records, using a Nifi Writer (such as Avro, Json, and so on) to write 
>>>> data on flowfiles content. 
>>>> 
>>>> We updated the code on our fork with the actual PLC4X git repo about 2 
>>>> weeks ago, and tested it reading values with S7 from a S7-1200 CPU from 
>>>> Nifi. Also, one of our customers has recently started to use it for 
>>>> validation. 
>>>> 
>>>> Currently, it works with S7 and Modbus over TCP. This is because we had to 
>>>> write some classes to map connectionString and variableList properties 
>>>> (sintax) of the processor to the actual protocol, to be able to build then 
>>>> avro schema for output flowfile, taking into account variable datatypes, 
>>>> etc. We only did this for S7 and Modbus. I am sure that there is a better 
>>>> way to do this, so at this point you maybe could take a look to find the 
>>>> best solution and avoid needing to do this mapping. 
>>>> 
>>>> If you find this useful, we could do a pull request to the main PLC4x 
>>>> repo. Let us know what you think. 
>>>> 
>>>> best regards,
>>>> iñigo
>>>> 
>>>> -----------------------------------------
>>>> Iñigo Angulo
>>>> 
>>>> ZYLK.net :: consultoría.openSource
>>>> telf.: 747412337
>>>> Ribera de Axpe, 11
>>>> Edificio A, modulo 201-203
>>>> 48950 Erandio (Bizkaia)
>>>> -----------------------------------------

Re: Nifi integration record oriented processor for reading

Reply via email to