[
https://issues.apache.org/jira/browse/NIFI-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520582#comment-16520582
]
Otto Fowler commented on NIFI-5324:
-----------------------------------
[~bende] super, thanks!
> Implement syslog record readers
> -------------------------------
>
> Key: NIFI-5324
> URL: https://issues.apache.org/jira/browse/NIFI-5324
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Bende
> Assignee: Otto Fowler
> Priority: Major
>
> Creating this Jira based on discussion with [~ottobackwards] in the NiFi
> HipChat room...
> We currently have ListenSyslog with optional parsing when batch size is 1,
> and ParseSyslog which also assumes 1 message per flow file. There is also
> ListenTCPRecord and ListenUDPRecord which can be used with a GrokReader to
> read log messages from the respective network connections.
> The common scenario for wanting to parse the syslog messages is to extract a
> field from the syslog message into an attribute and then use the attribute to
> make decisions like routing/filtering.
> Since the "1 message per flow file" pattern is generally something we try to
> avoid, it would be nice if we could keep batches of syslog messages together
> in a single flow file and then use record processors to process the batches.
> For example, if we had a syslog record reader we could then use
> PartitionRecord to divide a flow file of many syslog records into smaller
> groups based on some field in the message; each group can then be routed
> somewhere based on the group value.
> Another example would be to use QueryRecord to run a SQL query that selects
> specific syslog messages based on a field in the message.
> It would also make it easy to convert syslog messages to a structured format
> using ConvertRecord with a syslog reader and a writer like JSON or Avro.
> We would likely want two syslog record readers, one for each of the RFC
> formats.
> One aspect to consider is related to the schema used/produced by the
> reader... typically the readers/writers have a "Schema Access Strategy" where
> they can obtain a schema from a schema registry, or from flow file
> attributes, or something specific to the format like an embedded Avro schema.
> In this case, the schema is somewhat pre-determined by the specific syslog
> reader because the schema can contain at most the fields produced by the
> reader parsing the messages. So this may be a case where there is no schema
> access strategy, and there are pre-determined schemas. It is sort of like
> the GrokReader where it creates a schema from the named fields in the
> expression, except in this case there is no user defined expression, and the
> named fields are dictated by the parser.
> We may need to reuse syslog related code that is in nifi-standard-processors,
> so it might require moving that code to nifi-processor-utils, or creating a
> new nifi-syslog-utils module.
>
>
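
To make the idea in the description concrete, here is a rough sketch (in Python, not NiFi code) of what "read a batch of syslog messages as records, then partition by a field" could look like. Everything in it is an assumption for illustration: the field names in SYSLOG_FIELDS, the simplified RFC 3164 pattern, and the helper names parse_line, read_records and partition_by are hypothetical and are not taken from NiFi or from this ticket. It only shows that the schema is dictated by the parser rather than by the user, and that one batch can be split into groups by field value, roughly the role PartitionRecord would play.

```python
import re
from collections import defaultdict

# Hypothetical fixed schema: the fields are dictated by the parser,
# not by a user-supplied expression or a schema registry.
SYSLOG_FIELDS = ("priority", "facility", "severity", "timestamp", "hostname", "body")

# Simplified RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
# (an assumption for illustration; real-world messages vary more than this).
RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse_line(line):
    """Parse one syslog line into a record dict, or return None if it does not match."""
    match = RFC3164.match(line)
    if match is None:
        return None
    priority = int(match.group("priority"))
    return {
        "priority": priority,
        "facility": priority // 8,  # PRI = facility * 8 + severity
        "severity": priority % 8,
        "timestamp": match.group("timestamp"),
        "hostname": match.group("hostname"),
        "body": match.group("body"),
    }


def read_records(batch):
    """Yield one record per parseable line of a multi-message batch.

    Blank and unparseable lines are skipped in this sketch; a real reader
    would need a policy for them.
    """
    for line in batch.splitlines():
        if not line.strip():
            continue
        record = parse_line(line)
        if record is not None:
            yield record


def partition_by(records, field):
    """Group records by the value of one field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


if __name__ == "__main__":
    batch = (
        "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8\n"
        "<13>Feb  5 17:32:18 10.0.0.99 Use the BFG!\n"
        "<38>Oct 11 22:14:16 mymachine sshd: accepted\n"
    )
    for host, group in partition_by(read_records(batch), "hostname").items():
        print(host, len(group))
```

The batch stays together as a single input, and routing decisions are made per group instead of per message. A second reader for the other RFC format would presumably differ only in the pattern and the field list.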