Bryan Bende created NIFI-5324:
---------------------------------

             Summary: Implement syslog record readers
                 Key: NIFI-5324
                 URL: https://issues.apache.org/jira/browse/NIFI-5324
             Project: Apache NiFi
          Issue Type: Improvement
            Reporter: Bryan Bende


Creating this Jira based on discussion with [~ottobackwards] in the NiFi 
HipChat room...

We currently have ListenSyslog with optional parsing when batch size is 1, and 
ParseSyslog which also assumes 1 message per flow file. There is also 
ListenTCPRecord and ListenUDPRecord which can be used with a GrokReader to read 
log messages from the respective network connections.

The common reason for parsing syslog messages is to extract a field from the 
message into an attribute and then use that attribute to make decisions such 
as routing or filtering.

Since the "1 message per flow file" pattern is generally something we try to 
avoid, it would be nice if we could keep batches of syslog messages together in 
a single flow file and then use record processors to process the batches.

For example, if we had a syslog record reader we could then use PartitionRecord 
to divide a flow file of many syslog records into smaller groups based on some 
field in the message. Each group can then be routed somewhere based on the 
group value.
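
As a rough illustration of the partitioning idea (not NiFi code), the sketch 
below groups a batch of already-parsed syslog records by one field. The 
function name partition_records and the field names hostname, severity and 
body are assumptions made up for this example.

```python
from collections import defaultdict


def partition_records(records, field):
    """Group records by the value of one field.

    A simplified stand-in for what PartitionRecord would do with a
    flow file of syslog records; this is not the NiFi implementation.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field)].append(record)
    return dict(groups)


# Hypothetical batch of parsed syslog records (field names are assumptions).
batch = [
    {"hostname": "web01", "severity": 3, "body": "disk failure"},
    {"hostname": "db01", "severity": 6, "body": "checkpoint complete"},
    {"hostname": "web01", "severity": 6, "body": "request served"},
]

# One group per distinct hostname; each group could then be routed separately.
groups = partition_records(batch, "hostname")
```

Each key in groups plays the role of the group value that a downstream 
routing decision would look at.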

Another example would be to use QueryRecord to run a SQL query that selects 
specific syslog messages based on a field in the message.
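
To make the query idea concrete, here is a small sketch that uses Python's 
sqlite3 module as a stand-in for the SQL engine. It only mirrors QueryRecord's 
convention of exposing the records as a table named FLOWFILE; the column 
names and the select_records helper are assumptions for illustration.

```python
import sqlite3


def select_records(records, where_clause, params=()):
    """Load records into an in-memory table and return the rows that match.

    sqlite3 is only a stand-in here; it is not the engine NiFi uses.
    The column names are assumed, not taken from any real syslog reader.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE FLOWFILE (hostname TEXT, severity INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:hostname, :severity, :body)", records
    )
    # where_clause is trusted text in this sketch; values go through params.
    rows = conn.execute(
        "SELECT * FROM FLOWFILE WHERE " + where_clause, params
    ).fetchall()
    conn.close()
    return [dict(row) for row in rows]


# Hypothetical batch of parsed syslog records.
batch = [
    {"hostname": "web01", "severity": 3, "body": "disk failure"},
    {"hostname": "db01", "severity": 6, "body": "checkpoint complete"},
    {"hostname": "web01", "severity": 6, "body": "request served"},
]

# Keep only messages at severity 3 (error) or more severe (lower number).
errors = select_records(batch, "severity <= ?", (3,))
```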

It would also make it easy to convert syslog messages to a structured format 
using ConvertRecord with a syslog reader and a writer like JSON or Avro.

We would likely want two syslog record readers, one for each of the RFC formats 
(RFC 3164 and RFC 5424).

One aspect to consider is related to the schema used/produced by the reader... 
typically the readers/writers have a "Schema Access Strategy" where they can 
obtain a schema from a schema registry, or from flow file attributes, or 
something specific to the format like an embedded Avro schema.

In this case, the schema is somewhat pre-determined by the specific syslog 
reader because the schema can only be at most the fields produced by the reader 
parsing the messages. So this may be a case where there is no schema access 
strategy, and there are pre-determined schemas. It is sort of like the 
GrokReader where it creates a schema from the named fields in the expression, 
except in this case there is no user defined expression, and the named fields 
are dictated by the parser.
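
A minimal sketch of what "fields dictated by the parser" could look like, 
assuming a simplified RFC 3164 layout of <PRI>TIMESTAMP HOSTNAME MESSAGE. The 
field names, the regular expression and the function names are assumptions 
for illustration and are not taken from the existing NiFi syslog code.

```python
import re

# The "schema" is just the fixed list of fields this parser can produce.
SYSLOG_3164_FIELDS = (
    "priority", "facility", "severity", "timestamp", "hostname", "body",
)

# Simplified RFC 3164 pattern: <PRI>Mmm dd hh:mm:ss HOSTNAME MESSAGE
_RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse_rfc3164(line):
    """Parse one message into a record, or return None if it does not match."""
    match = _RFC3164.match(line)
    if match is None:
        return None
    priority = int(match.group("priority"))
    return {
        "priority": priority,
        "facility": priority // 8,  # PRI = facility * 8 + severity
        "severity": priority % 8,
        "timestamp": match.group("timestamp"),
        "hostname": match.group("hostname"),
        "body": match.group("body"),
    }


def read_records(text):
    """Yield one record per parseable line of a multi-message batch."""
    for line in text.splitlines():
        record = parse_rfc3164(line)
        if record is not None:
            yield record


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick"
record = parse_rfc3164(sample)
```

Because every record has exactly the keys in SYSLOG_3164_FIELDS, a reader 
built this way would not need a schema access strategy: the schema follows 
from the parser itself.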

We may need to reuse syslog related code that is in nifi-standard-processors, 
so it might require moving that code to nifi-processor-utils, or creating a new 
nifi-syslog-utils module.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
