[ 
https://issues.apache.org/jira/browse/NIFI-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739084#comment-14739084
 ] 

Ryan Blue commented on NIFI-919:
--------------------------------

Sorry to do this right after you added the warning, but I'm not sure we need to 
have one.

Bare records are not dangerous because we will ensure that the schema is 
tracked in the FlowFile attributes. They also make it possible to build some of 
the other planned processors, like extract paths. I don't see a way to run 
extract paths on an entire file of Avro data, so bare records essentially allow 
users to work with individual records and reassemble files later.

For example, if I receive a file of logs, I could split it into individual 
records, then extract the log level and route ERROR to extra handling, discard 
DEBUG, and reassemble the rest to store in a fact table. If we didn't have bare 
records, we could do the same thing, but it would require a more complicated 
processor that can both extract paths and filter an Avro file.
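
A rough sketch of that flow, only to make the shape concrete. It uses plain 
Python dicts as stand-ins for decoded Avro records and does not touch the Avro 
or NiFi APIs. The FlowFile class, the "avro.schema" attribute name, and the 
split/route/reassemble helpers are all hypothetical names made up for this 
illustration, not the processors proposed in the patch.

```python
import json
from dataclasses import dataclass, field

# Hypothetical attribute name; the real key used by NiFi may differ.
SCHEMA_ATTR = "avro.schema"

# Example schema for a log line, serialized as JSON text.
SCHEMA = json.dumps({
    "type": "record",
    "name": "LogLine",
    "fields": [
        {"name": "level", "type": "string"},
        {"name": "msg", "type": "string"},
    ],
})


@dataclass
class FlowFile:
    """Stand-in for a FlowFile: one bare record plus string attributes."""
    content: dict
    attributes: dict = field(default_factory=dict)


def split(records, schema_json):
    """Emit one FlowFile per record, tracking the schema in the attributes."""
    return [FlowFile(r, {SCHEMA_ATTR: schema_json}) for r in records]


def route(flowfiles):
    """Send ERROR to extra handling, drop DEBUG, and keep everything else."""
    errors, rest = [], []
    for ff in flowfiles:
        level = ff.content["level"]  # the "extract path" step
        if level == "ERROR":
            errors.append(ff)
        elif level != "DEBUG":
            rest.append(ff)
    return errors, rest


def reassemble(flowfiles):
    """Merge bare records into one file-like dict with a single shared schema."""
    schemas = {ff.attributes[SCHEMA_ATTR] for ff in flowfiles}
    if len(schemas) != 1:
        raise ValueError("expected exactly one schema across the records")
    return {
        "schema": json.loads(schemas.pop()),
        "records": [ff.content for ff in flowfiles],
    }
```

Because every split record carries its schema in the attributes, the 
reassemble step can rebuild a file without seeing the original one, which is 
the point of tracking the schema alongside each bare record.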

> Support Splitting Avro Files
> ----------------------------
>
>                 Key: NIFI-919
>                 URL: https://issues.apache.org/jira/browse/NIFI-919
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 0.4.0
>
>         Attachments: NIFI-919-2.patch, NIFI-919.patch
>
>
> Provide a processor that splits an Avro file into multiple smaller files. 
> Would be nice to have a configurable batch size so a user could produce 
> single record files and also multi-record files of smaller size than the 
> original. Also consider making the output format configurable, data file vs 
> bare record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
