[ 
https://issues.apache.org/jira/browse/NIFI-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520582#comment-16520582
 ] 

Otto Fowler commented on NIFI-5324:
-----------------------------------

[~bende] super, thanks!

> Implement syslog record readers
> -------------------------------
>
>                 Key: NIFI-5324
>                 URL: https://issues.apache.org/jira/browse/NIFI-5324
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Bende
>            Assignee: Otto Fowler
>            Priority: Major
>
> Creating this Jira based on discussion with [~ottobackwards] in the NiFi 
> HipChat room...
> We currently have ListenSyslog with optional parsing when batch size is 1, 
> and ParseSyslog which also assumes 1 message per flow file. There are also 
> ListenTCPRecord and ListenUDPRecord which can be used with a GrokReader to 
> read log messages from the respective network connections.
> The common scenario for wanting to parse the syslog messages is to extract a 
> field from the syslog message into an attribute and then use the attribute to 
> make decisions like routing/filtering.
> Since the "1 message per flow file" pattern is generally something we try to 
> avoid, it would be nice if we could keep batches of syslog messages together 
> in a single flow file and then use record processors to process the batches.
> For example, if we had a syslog record reader we could then use 
> PartitionRecord to divide a flow file of many syslog records into smaller 
> groups based on some field in the message; each group can then be routed 
> somewhere based on the group value.
> Another example would be to use QueryRecord to run a SQL query that selects 
> specific syslog messages based on a field in the message.
> It would also make it easy to convert syslog messages to a structured format 
> using ConvertRecord with a syslog reader and a writer like JSON or Avro.
> We would likely want two syslog record readers, one for each of the RFC 
> formats (RFC 3164 and RFC 5424).
> One aspect to consider is related to the schema used/produced by the 
> reader... typically the readers/writers have a "Schema Access Strategy" where 
> they can obtain a schema from a schema registry, or from flow file 
> attributes, or something specific to the format like an embedded Avro schema.
> In this case, the schema is somewhat pre-determined by the specific syslog 
> reader because the schema can only be at most the fields produced by the 
> reader parsing the messages. So this may be a case where there is no schema 
> access strategy, and there are pre-determined schemas.  It is sort of like 
> the GrokReader where it creates a schema from the named fields in the 
> expression, except in this case there is no user defined expression, and the 
> named fields are dictated by the parser.
> We may need to reuse syslog related code that is in nifi-standard-processors, 
> so it might require moving that code to nifi-processor-utils, or creating a 
> new nifi-syslog-utils module.
>  
>  
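
To make the idea in the description above concrete, here is a rough sketch
(in Python rather than NiFi's Java, and not the actual reader) of a parser
whose record fields are fixed by the parser itself, followed by a
PartitionRecord-style grouping on one field. The names FIELDS, parse_line,
parse_batch and partition_records are hypothetical, the field names are
illustrative rather than NiFi's, and only a simplified RFC 5424 layout is
handled.

```python
import re
from collections import defaultdict

# Hypothetical fixed "schema": the field names are dictated by the parser,
# not by a user-supplied expression. These names are illustrative only.
FIELDS = ("priority", "facility", "severity", "version", "timestamp",
          "hostname", "app_name", "proc_id", "msg_id",
          "structured_data", "message")

# Simplified RFC 5424 layout. Assumption: structured data is either "-" or
# bracketed elements without escaped "]" characters inside values.
RFC5424_LINE = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<proc_id>\S+) (?P<msg_id>\S+) "
    r"(?P<structured_data>-|(?:\[.*?\])+)"
    r"(?: (?P<message>.*))?$"
)


def parse_line(line):
    """Parse one syslog line into a dict with exactly the keys in FIELDS."""
    match = RFC5424_LINE.match(line)
    if match is None:
        raise ValueError("not a recognized RFC 5424 message: %r" % line)
    raw = match.groupdict()
    priority = int(raw["priority"])
    raw["priority"] = priority
    # PRI encodes facility * 8 + severity.
    raw["facility"] = priority // 8
    raw["severity"] = priority % 8
    return {name: raw.get(name) for name in FIELDS}


def parse_batch(text):
    """Parse a batch of newline-delimited messages (one flow file's content)."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def partition_records(records, field):
    """Group records by the value of one field, like partitioning a batch."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


if __name__ == "__main__":
    batch = (
        "<34>1 2003-10-11T22:14:15.003Z host-a su - ID47 - 'su root' failed\n"
        "<165>1 2003-10-11T22:14:16.003Z host-b evntslog - ID48 "
        '[exampleSDID@32473 iut="3"] An application event\n'
        "<34>1 2003-10-11T22:14:17.003Z host-a su - ID49 - another failure\n"
    )
    for host, group in partition_records(parse_batch(batch), "hostname").items():
        print(host, len(group))
```

Because every record has the same keys, a batch can stay together in one
unit and be split or filtered later by any of those fields, which is the
point of avoiding the "1 message per flow file" pattern.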



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
