Hi Chris, Otto,

Regarding the Record Processor concept, I will try to give an overview. In 
Nifi, information packages are called Flowfiles, and these are the actual units 
of information exchanged between Processors along the dataflow. Flowfiles have 
two sections where we can manage data: Attributes and Content. In the 
"traditional" Nifi approach, you work with both sections, extracting 
information from the Content into the Attributes and vice versa to perform 
operations. This approach has a limitation when you are processing batch data 
(lines from a CSV file, for instance), where you need to split each of the 
lines into a different Flowfile. Thus, a 1000 line CSV file leads to 1000 
Flowfiles to process, each of them containing a single record.
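
As a rough illustration, the traditional approach inside a Processor's 
onTrigger looks more or less like this (a simplified sketch, not code from the 
integration; the attribute name and the content written are made up):

    // Inside onTrigger(ProcessContext context, ProcessSession session) of a
    // hypothetical processor; REL_SUCCESS is the processor's success relationship
    // and StandardCharsets comes from java.nio.charset.
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // Metadata is kept in the Attributes...
    flowFile = session.putAttribute(flowFile, "csv.line.count", "1");
    // ...while the payload itself lives in the Content, handled as raw bytes.
    flowFile = session.write(flowFile, (in, out) ->
        out.write("a,single,csv,line\n".getBytes(StandardCharsets.UTF_8)));
    session.transfer(flowFile, REL_SUCCESS);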

In later versions of the product, they introduced the Record oriented approach. 
This approach allows you to manage multiple records in a single Flowfile's 
Content, as long as all the records share the same schema. This means that the 
operations defined by the Processors are applied to the whole content at once. 
Following the previous example, a 1000 line CSV file can produce a single 
Flowfile whose content holds 1000 records. 
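
In code, a record oriented processor iterates over all the records contained in 
a single Flowfile, something along these lines (only a sketch based on the Nifi 
record API; exact method signatures may vary slightly between Nifi versions, 
and the field name is just an example):

    // Inside onTrigger; readerFactory is a RecordReaderFactory obtained from a
    // "Record Reader" property (a Controller Service). Exception handling omitted.
    try (InputStream in = session.read(flowFile);
         RecordReader reader = readerFactory.createRecordReader(flowFile, in, getLogger())) {
        Record record;
        while ((record = reader.nextRecord()) != null) {
            // All records share the same schema, so the same logic applies to each one.
            Object value = record.getValue("temperature"); // example field name
        }
    }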

To do this, Nifi uses Avro to serialize the Flowfile's Content. The Record 
oriented Processors then use Writers and Readers to present this information 
in the desired format (such as Avro, JSON, CSV, etc.). Basically, with the 
record oriented approach, Nifi introduced multiple new Processors, and also 
included a Record version of many of the "old" ones. Using this Record 
approach, Nifi performance improves notably, especially when working with 
large amounts of structured information.
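
For reference, the schema that ties all the records together can be built 
programmatically with the Avro API. For the CSV example above it could look 
roughly like this (the record and field names are invented for illustration):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;

    public class CsvLineSchema {
        // One schema describing every record in the Flowfile's Content.
        public static final Schema SCHEMA = SchemaBuilder.record("CsvLine")
            .fields()
            .requiredString("sensorId")
            .requiredDouble("value")
            .requiredLong("timestamp")
            .endRecord();
    }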

The work we did was creating a Record oriented Processor, based on the 
previously existing Plc4xSourceProcessor, to read values from the devices. 
We have also included a README in the plc4x/plc4j/integrations/apache-nifi 
module explaining the Processor configuration and giving an example. Moreover, 
we added a Nifi template with a dataflow for testing these processors, in case 
it is useful.

Otto, regarding the idea behind this new Processor, that is right. We added 
the writer capability to the existing Plc4xSourceProcessor, so that it formats 
the output to the desired configuration in a record manner. In the current 
implementation, we do this "protocol adaptation" from the syntax of the 
particular properties in the Processor's configuration. For example, from the 
connection string 's7://IP:PORT' we extract the S7 identifier and map variable 
datatypes to the corresponding Avro datatypes to build the record output 
schema. However, we don't have vast experience with the PLC4X libraries, and 
there are surely better ways of doing this.
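
Just to make the idea concrete, the datatype mapping is conceptually something 
like the following (a simplified sketch, not the exact code we wrote; the class 
and method names are illustrative, and the PLC4X accessors may differ slightly 
between versions):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.plc4x.java.api.messages.PlcReadResponse;

    public class PlcAvroSchemaMapper {

        // Derive an Avro schema from the Java types of the values of a PLC read.
        // fieldName is the alias given to the variable in the processor properties.
        public static Schema buildSchema(PlcReadResponse response) {
            SchemaBuilder.FieldAssembler<Schema> fields =
                SchemaBuilder.record("PlcReadRecord").fields();
            for (String fieldName : response.getFieldNames()) {
                Object value = response.getObject(fieldName);
                if (value instanceof Integer || value instanceof Short) {
                    fields = fields.requiredInt(fieldName);
                } else if (value instanceof Long) {
                    fields = fields.requiredLong(fieldName);
                } else if (value instanceof Float) {
                    fields = fields.requiredFloat(fieldName);
                } else if (value instanceof Double) {
                    fields = fields.requiredDouble(fieldName);
                } else if (value instanceof Boolean) {
                    fields = fields.requiredBoolean(fieldName);
                } else {
                    fields = fields.requiredString(fieldName); // fallback
                }
            }
            return fields.endRecord();
        }
    }
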
Also, about the Base Processor, we were thinking that maybe the best approach 
would be to have this Base Processor, and then implement readers for the 
particular protocols as Controller Services. But here, too, it would be very 
helpful to have your opinion.
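
To give an idea of what we mean, the base Processor could depend only on an 
interface like the one below, and each protocol (S7, Modbus, ...) would provide 
its own implementation registered as a Controller Service. This is just a 
sketch of the idea, nothing of it exists yet, and all the names are invented:

    import java.util.Map;
    import org.apache.nifi.controller.ControllerService;

    // Hypothetical per-protocol service: the base Processor asks it to read a
    // set of addresses and gets back name -> value pairs to turn into Records.
    public interface PlcRecordReaderService extends ControllerService {

        Map<String, Object> read(Map<String, String> addressesByName) throws Exception;
    }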

Lastly, regarding the pull request, do you have any documentation on how to do 
this? I mean, maybe you have defined some naming conventions, or an expected 
structure to facilitate later work. At present, we have a fork of the project 
where we have been working on these Nifi changes. We updated the content of 
our fork (fetch/merge upstream) about 2 weeks ago and committed our changes to 
the 'develop' branch. Should we rather create a new branch with our commits? 
How do you prefer to receive the code? (We are not git experts, just in case 
we could cause some problems...)

thank you in advance

iñigo
 

----------------------------------------- 
Iñigo Angulo 

ZYLK.net :: consultoría.openSource 
telf.: 747412337 
Ribera de Axpe, 11 
Edificio A, modulo 201-203 
48950 Erandio (Bizkaia) 
-----------------------------------------

----- Original Message -----
From: "Christofer Dutz" <[email protected]>
To: "dev" <[email protected]>
Sent: Wednesday, April 21, 2021 20:01:15
Subject: AW: Nifi integration record oriented processor for reading

The more I think of it,

Perhaps we should also think of potentially providing some information on 
supported configuration options.
Wouldn't it be cool if the driver could say: "I generally have these options 
and they have these datatypes and mean this"
Additionally, the transports could also say: "I generally have these options and 
they have these datatypes and mean this"

I would bet our StreamPipes friends would love something like that? Right?

Chris


-----Original Message-----
From: Otto Fowler <[email protected]> 
Sent: Wednesday, April 21, 2021 17:46
To: [email protected]
Subject: Re: Nifi integration record oriented processor for reading

Hi Inigo,

I’m a committer on Apache Nifi as well as PLC4X, I would be happy to review 
your processor.
If I understand what you are saying correctly, you have a single processor 
which supports record writing output?

plc4x -> records

And that, for configuration purposes for that processor, you have created 
support on a per protocol basis for configuration and validation?

If there is per protocol configuration / validation etc, it may be better to 
have a base processor, and derived processors per protocol to handle those 
differences.

I look forward to seeing the code.


> On Apr 21, 2021, at 04:05, Iñigo Angulo <[email protected]> wrote:
> 
> Hi all, 
> 
> I am writing as we have been working on the Apache Nifi integration part of 
> the project. We have created a Record oriented processor for reading PLC 
> data. It is based on the previously existing SourceProcessor, but works with 
> records, using a Nifi Writer (such as Avro, JSON, and so on) to write data to 
> the flowfiles' content. 
> 
> We updated the code in our fork from the current PLC4X git repo about 2 weeks 
> ago, and tested it from Nifi reading values over S7 from an S7-1200 CPU. Also, 
> one of our customers has recently started to use it for validation. 
> 
> Currently, it works with S7 and Modbus over TCP. This is because we had to 
> write some classes to map the connectionString and variableList properties 
> (syntax) of the processor to the actual protocol, to be able to then build the 
> Avro schema for the output flowfile, taking into account variable datatypes, etc. 
> We only did this for S7 and Modbus. I am sure that there is a better way to 
> do this, so at this point maybe you could take a look to find the best 
> solution and avoid needing to do this mapping. 
> 
> If you find this useful, we could do a pull request to the main PLC4X repo. 
> Let us know what you think. 
> 
> best regards, 
> iñigo 
> 
> ----------------------------------------- 
> Iñigo Angulo 
> 
> ZYLK.net :: consultoría.openSource 
> telf.: 747412337 
> Ribera de Axpe, 11 
> Edificio A, modulo 201-203 
> 48950 Erandio (Bizkaia) 
> -----------------------------------------
