Hi Chris, Otto,

Regarding the Record Processor concept, I will try to give an overview. In NiFi, information packages are called FlowFiles, and these are the actual units of information exchanged between Processors all along the dataflow. FlowFiles have two sections where we can manage data: Attributes and Content. In the "traditional" NiFi approach, you work with both sections, extracting information from the Content into the Attributes and vice versa to perform operations. This approach has a limitation when you are processing batch data (lines from a CSV file, for instance), because you need to split the lines into separate FlowFiles. Thus, a 1000-line CSV file leads to 1000 FlowFiles to process, each of them containing a single record.
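To make the cost of the split-per-line approach concrete, here is a minimal Python sketch. It is not NiFi code: the `FlowFile` class and the `split_lines` function are hypothetical stand-ins that only model the Attributes/Content structure described above, and the CSV data and the `fragment.index` attribute name are used purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile: attributes plus content."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def split_lines(flowfile):
    """Mimic the traditional approach: one output FlowFile per CSV line."""
    lines = flowfile.content.decode("utf-8").splitlines()
    return [
        FlowFile(
            content=line.encode("utf-8"),
            # Each fragment inherits the parent's attributes and adds its index.
            attributes={**flowfile.attributes, "fragment.index": str(i)},
        )
        for i, line in enumerate(lines)
    ]


# Build a 1000-line CSV payload (made-up data) held in a single FlowFile.
csv_text = "\n".join(f"{i},sensor-{i},{i * 0.5}" for i in range(1000))
original = FlowFile(content=csv_text.encode("utf-8"),
                    attributes={"filename": "data.csv"})

fragments = split_lines(original)
print(len(fragments))  # one FlowFile per line of the input
```

Every downstream step then has to handle each of those fragments individually, which is the overhead being described.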
In later versions of the product, they introduced the Record-oriented approach. This approach allows you to manage multiple records in a single FlowFile's Content, as long as these records all have the same schema. This means that the operations defined by the Processors are applied to the whole content at once. Following the previous example, a 1000-line CSV file could produce a single FlowFile with a content of 1000 records. To do this, NiFi uses Avro to serialize the FlowFile's Content. Then, the Record-oriented Processors use Writers and Readers to present this information in the desired format (such as Avro, JSON, CSV, etc.). Basically, with the Record-oriented approach, NiFi introduced multiple new Processors, and also included Record versions of many of the "old" ones. With the Record approach, NiFi performance improves notably, especially when working with large volumes of structured data.

The work we did was creating a Record-oriented Processor, based on the previously existing Plc4xSourceProcessor, to read values from the devices. We have also included a README in the plc4x/plc4j/integrations/apache-nifi module explaining the Processor configuration and giving an example. Moreover, we added a NiFi template with a dataflow for testing these processors, in case it is useful.

Otto, regarding the idea behind this new Processor, that is right. We added the writer capability to the existing Plc4xSourceProcessor, so that it formats the output to the desired configuration in a record manner. In the current implementation, we did this "protocol adaptation" based on the syntax of the particular properties in the Processor's configuration. For example, from the connection string 's7://IP:PORT', we extract the S7 identifier and map the variable datatypes to the corresponding Avro datatypes to build the record output schema. However, we don't have vast experience with the PLC4X libraries, and for sure there will be better ways of doing this.
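The mapping idea can be sketched roughly as follows, in Python rather than the Java of the actual processor. This is only an illustration under stated assumptions: `PLC_TO_AVRO`, `protocol_of`, `build_avro_schema`, the record name `PlcReadRecord`, the variable names, and the example address are all hypothetical and are not taken from the PLC4X or NiFi code.

```python
import json
from urllib.parse import urlparse

# Hypothetical mapping from PLC datatype names to Avro primitive types.
PLC_TO_AVRO = {
    "BOOL": "boolean",
    "INT": "int",
    "DINT": "int",
    "LINT": "long",
    "REAL": "float",
    "LREAL": "double",
    "STRING": "string",
}


def protocol_of(connection_string):
    """Return the protocol identifier, e.g. 's7' for 's7://IP:PORT'."""
    scheme = urlparse(connection_string).scheme
    if not scheme:
        raise ValueError(f"no protocol in {connection_string!r}")
    return scheme


def build_avro_schema(connection_string, variables):
    """Build an Avro record schema (as a dict) from variable name -> PLC type."""
    fields = []
    for name, plc_type in variables.items():
        avro_type = PLC_TO_AVRO.get(plc_type.upper())
        if avro_type is None:
            raise ValueError(f"unsupported datatype {plc_type!r} for {name!r}")
        fields.append({"name": name, "type": avro_type})
    return {
        "type": "record",
        "name": "PlcReadRecord",  # assumed record name, for illustration only
        "doc": f"Values read over {protocol_of(connection_string)}",
        "fields": fields,
    }


# Example with a made-up address and made-up variable names.
schema = build_avro_schema("s7://10.0.0.1:102",
                           {"motor_on": "BOOL", "speed": "REAL"})
print(json.dumps(schema, indent=2))
```

A per-protocol table like this is exactly the part that has to be repeated for each driver, which is why only S7 and Modbus are covered so far.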
Also, about the Base Processor, we were thinking that maybe the best approach could be to have this Base Processor, and then implement readers for particular protocols as Controller Services. But here, too, it would be very helpful to have your opinion.

Lastly, regarding the pull request, do you have any documentation on how to do this? I mean, maybe you have defined some naming conventions, or an expected structure to facilitate later work. At present, we have a fork of the project where we have been working on these NiFi changes. We updated the content of our fork (fetch/merge upstream) about 2 weeks ago, and committed our changes to the 'develop' branch. Should we create a new branch with our commits? How do you prefer to receive the code? (We are not git experts, so we ask in case we could cause some problems...)

Thank you in advance,
Iñigo

-----------------------------------------
Iñigo Angulo

ZYLK.net :: consultoría.openSource
telf.: 747412337
Ribera de Axpe, 11
Edificio A, modulo 201-203
48950 Erandio (Bizkaia)
-----------------------------------------

----- Original Message -----
From: "Christofer Dutz" <[email protected]>
To: "dev" <[email protected]>
Sent: Wednesday, 21 April 2021 20:01:15
Subject: Re: Nifi integration record oriented processor for reading

The more I think of it,

Perhaps we should also think of potentially providing some information on supported configuration options.
Wouldn't it be cool if the driver could say: "I generally have these options and they have these datatypes and mean this"
Additionally, the transports could also say: "I generally have these options and they have these datatypes and mean this"

I would bet our StreamPipes friends would love something like that, right?

Chris

-----Original Message-----
From: Otto Fowler <[email protected]>
Sent: Wednesday, 21 April 2021 17:46
To: [email protected]
Subject: Re: Nifi integration record oriented processor for reading

Hi Inigo,

I’m a committer on Apache Nifi as well as PLC4X, and I would be happy to review your processor.
If I understand what you are saying correctly, you have a single processor which supports record writing output?

plc4x -> records

And that you have, for configuration purposes for that processor, created support on a per-protocol basis for configuration and validation?

If there is per-protocol configuration / validation etc., it may be better to have a base processor, and derived processors per protocol to handle those differences.

I look forward to seeing the code.

> On Apr 21, 2021, at 04:05, Iñigo Angulo <[email protected]> wrote:
>
> Hi all,
>
> I am writing as we have been working on the Apache Nifi integration part of
> the project. We have created a Record oriented processor for reading PLC
> data. It is based on the previous existing SourceProcessor, but works with
> records, using a Nifi Writer (such as Avro, Json, and so on) to write data on
> flowfiles content.
>
> We updated the code on our fork with the actual PLC4X git repo about 2 weeks
> ago, and tested it reading values with S7 from a S7-1200 CPU from Nifi. Also,
> one of our customers has recently started to use it for validation.
>
> Currently, it works with S7 and Modbus over TCP. This is because we had to
> write some classes to map connectionString and variableList properties
> (syntax) of the processor to the actual protocol, to be able to build then
> avro schema for output flowfile, taking into account variable datatypes, etc.
> We only did this for S7 and Modbus. I am sure that there is a better way to
> do this, so at this point you maybe could take a look to find the best
> solution and avoid needing to do this mapping.
>
> If you find this useful, we could do a pull request to the main PLC4x repo.
> Let us know what you think.
>
> best regards,
> iñigo
>
> -----------------------------------------
> Iñigo Angulo
>
> ZYLK.net :: consultoría.openSource
> telf.: 747412337
> Ribera de Axpe, 11
> Edificio A, modulo 201-203
> 48950 Erandio (Bizkaia)
> -----------------------------------------
