Bryan Bende created NIFI-5324:
---------------------------------
Summary: Implement syslog record readers
Key: NIFI-5324
URL: https://issues.apache.org/jira/browse/NIFI-5324
Project: Apache NiFi
Issue Type: Improvement
Reporter: Bryan Bende
Creating this Jira based on discussion with [~ottobackwards] in the NiFi
HipChat room...
We currently have ListenSyslog, with optional parsing when batch size is 1, and
ParseSyslog, which also assumes 1 message per flow file. There are also
ListenTCPRecord and ListenUDPRecord, which can be used with a GrokReader to read
log messages from the respective network connections.
The common scenario for wanting to parse the syslog messages is to extract a
field from the syslog message into an attribute and then use the attribute to
make decisions like routing/filtering.
Since the "1 message per flow file" pattern is generally something we try to
avoid, it would be nice if we could keep batches of syslog messages together in
a single flow file and then use record processors to process the batches.
For example, if we had a syslog record reader we could then use PartitionRecord
to divide a flow file of many syslog records into smaller groups based on some
field in the message. Each group could then be routed somewhere based on its
group value.
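The grouping idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not how PartitionRecord is implemented; the record field names (hostname, severity, body) and the helper partition_records are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-parsed syslog records; the field names are assumptions.
records = [
    {"hostname": "web01", "severity": 3, "body": "disk failure"},
    {"hostname": "db01", "severity": 6, "body": "checkpoint complete"},
    {"hostname": "web01", "severity": 6, "body": "request served"},
]


def partition_records(records, field):
    """Group records by the value of one field, preserving input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field)].append(record)
    return dict(groups)


# One group per distinct hostname; each group could become its own flow file.
partitions = partition_records(records, "hostname")
```

Each key of partitions plays the role of the group value that a downstream routing decision would look at.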
Another example would be to use QueryRecord to run a SQL query that selects
specific syslog messages based on a field in the message.
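As a rough sketch of what such a query could look like, the snippet below uses an in-memory SQLite table as a stand-in for the record set. QueryRecord exposes the incoming records as a table named FLOWFILE; the column names and sample rows here are assumptions, and SQLite is not the SQL engine NiFi uses.

```python
import sqlite3

# Hypothetical parsed syslog records: (hostname, severity, body).
rows = [
    ("web01", 3, "disk failure"),
    ("db01", 6, "checkpoint complete"),
    ("web01", 6, "request served"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLOWFILE (hostname TEXT, severity INTEGER, body TEXT)")
conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?)", rows)

# Syslog severities run from 0 (emergency) to 7 (debug), so "<= 3" keeps
# messages at error level or worse.
errors = conn.execute(
    "SELECT hostname, body FROM FLOWFILE WHERE severity <= 3"
).fetchall()
conn.close()
```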
It would also make it easy to convert syslog messages to a structured format
using ConvertRecord with a syslog reader and a writer like JSON or Avro.
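A minimal sketch of that conversion, assuming a simplified RFC 3164 layout (priority, timestamp, hostname, then the rest of the message). The regular expression, the output field names, and the helper parse_rfc3164 are illustrative assumptions, not NiFi's parser. The sample line is the example message from RFC 3164.

```python
import json
import re

# Simplified RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME MESSAGE
RFC3164_PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse_rfc3164(line):
    """Return a dict of syslog fields, or None if the line does not match."""
    match = RFC3164_PATTERN.match(line)
    if match is None:
        return None
    priority = int(match.group("priority"))
    return {
        "priority": priority,
        "facility": priority // 8,  # PRI = facility * 8 + severity
        "severity": priority % 8,
        "timestamp": match.group("timestamp"),
        "hostname": match.group("hostname"),
        "body": match.group("body"),
    }


line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
record = parse_rfc3164(line)
as_json = json.dumps(record)
```

A batch of messages would simply produce one such record per line, which a JSON or Avro writer could then serialize together.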
We would likely want two syslog record readers, one for each of the RFC formats
(RFC 3164 and RFC 5424).
One aspect to consider is related to the schema used/produced by the reader...
typically the readers/writers have a "Schema Access Strategy" where they can
obtain a schema from a schema registry, or from flow file attributes, or
something specific to the format like an embedded Avro schema.
In this case, the schema is somewhat pre-determined by the specific syslog
reader because the schema can contain at most the fields produced by the reader
when parsing the messages. So this may be a case where there is no schema access
strategy, and there are pre-determined schemas. It is sort of like the
GrokReader, which creates a schema from the named fields in the expression,
except in this case there is no user-defined expression, and the named fields
are dictated by the parser.
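To illustrate the idea of a schema dictated by the parser, the sketch below derives an Avro-style schema from the named groups of a fixed parsing pattern. The pattern, the schema name, and the choice of nullable string fields are all assumptions made for illustration; this is not how GrokReader or any NiFi reader builds its schema.

```python
import re

# A fixed parsing pattern owned by the (hypothetical) reader, not by the user.
PARSER_PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>\S+ +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def schema_from_pattern(pattern, name):
    """Build an Avro-style record schema from a pattern's named groups."""
    # groupindex maps group name -> group number; sort by number so the
    # fields appear in the order the parser defines them.
    field_names = sorted(pattern.groupindex, key=pattern.groupindex.get)
    return {
        "type": "record",
        "name": name,
        # Every field is nullable because a message may omit it.
        "fields": [{"name": f, "type": ["null", "string"]} for f in field_names],
    }


schema = schema_from_pattern(PARSER_PATTERN, "syslog")
field_names = [field["name"] for field in schema["fields"]]
```

Because the pattern is fixed, the resulting schema is the same for every flow file, which is why a schema access strategy may not be needed.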
We may need to reuse syslog related code that is in nifi-standard-processors,
so it might require moving that code to nifi-processor-utils, or creating a new
nifi-syslog-utils module.