[ https://issues.apache.org/jira/browse/NIFI-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739084#comment-14739084 ]
Ryan Blue commented on NIFI-919:
--------------------------------
Sorry to do this right after you added the warning, but I'm not sure we need to
have one.
Bare records are not dangerous because we will ensure that the schema is
tracked in the FlowFile attributes. Supporting bare records also makes it
possible to build some of the other planned processors, like extract paths. I
don't see a way to run extract paths on an entire Avro data file, so bare-record
output essentially lets users work with individual records and reassemble them
into files later.
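To make the schema-tracking point concrete, here is a minimal Java sketch. It is not NiFi's API: `SketchFlowFile` is a hypothetical stand-in for a FlowFile, `avro.schema` is an assumed attribute name, and the records are assumed to be already binary-encoded. Because a bare record has no data-file header, each child copies the parent's attributes and adds the schema JSON so a downstream processor can still decode it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a NiFi FlowFile: a map of attributes plus opaque content bytes.
final class SketchFlowFile {
    final Map<String, String> attributes;
    final byte[] content;

    SketchFlowFile(Map<String, String> attributes, byte[] content) {
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.content = content;
    }
}

final class BareRecordSplit {
    // Assumed attribute name for illustration; the real processor may use another key.
    static final String SCHEMA_ATTR = "avro.schema";

    // Emits one child per bare record. A bare record carries no header,
    // so the writer schema is attached to every child as an attribute.
    static List<SketchFlowFile> split(SketchFlowFile parent, String schemaJson,
                                      List<byte[]> bareRecords) {
        List<SketchFlowFile> children = new ArrayList<>();
        for (byte[] record : bareRecords) {
            Map<String, String> attrs = new HashMap<>(parent.attributes);
            attrs.put(SCHEMA_ATTR, schemaJson);
            children.add(new SketchFlowFile(attrs, record));
        }
        return children;
    }

    public static void main(String[] args) {
        Map<String, String> parentAttrs = new HashMap<>();
        parentAttrs.put("filename", "logs.avro");
        SketchFlowFile parent = new SketchFlowFile(parentAttrs, new byte[0]);

        // Placeholder bytes standing in for binary-encoded Avro records.
        List<byte[]> records = Arrays.asList(
                "rec-1".getBytes(StandardCharsets.UTF_8),
                "rec-2".getBytes(StandardCharsets.UTF_8));
        String schema = "{\"type\":\"record\",\"name\":\"Log\","
                + "\"fields\":[{\"name\":\"level\",\"type\":\"string\"}]}";

        for (SketchFlowFile child : split(parent, schema, records)) {
            System.out.println(child.attributes.get("filename") + " -> "
                    + child.attributes.get(SCHEMA_ATTR));
        }
    }
}
```

The parent's attributes are copied rather than shared, so adding the schema to a child never mutates the parent.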
For example, if I receive a file of logs, I could split it into individual
records, then extract the log level from each one, route ERROR records to extra
handling, discard DEBUG records, and reassemble the rest to store in a fact
table. Without bare records we could do the same thing, but it would require a
more complicated processor that can both extract paths and filter an Avro file.
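The log-routing example can be sketched in plain Java to show how simple the flow becomes once each item holds one record. This is a hypothetical illustration, not a NiFi API: records are modeled as string maps rather than decoded Avro records, `extractLevel` stands in for the planned extract-paths step, and the `Route` values are assumed names for the three destinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LogLevelRouting {
    // Hypothetical destinations matching the log example.
    enum Route { ERROR_HANDLING, DISCARD, FACT_TABLE }

    // Stand-in for an "extract path" step: reads the level field of one record.
    static String extractLevel(Map<String, String> record) {
        return record.getOrDefault("level", "");
    }

    // ERROR goes to extra handling, DEBUG is dropped, everything else is kept.
    static Route route(Map<String, String> record) {
        switch (extractLevel(record)) {
            case "ERROR":
                return Route.ERROR_HANDLING;
            case "DEBUG":
                return Route.DISCARD;
            default:
                return Route.FACT_TABLE;
        }
    }

    // Groups single-record items by route; the FACT_TABLE group is what would be
    // reassembled into one Avro data file for the fact table.
    static Map<Route, List<Map<String, String>>> partition(List<Map<String, String>> records) {
        Map<Route, List<Map<String, String>>> groups = new EnumMap<>(Route.class);
        for (Route r : Route.values()) {
            groups.put(r, new ArrayList<>());
        }
        for (Map<String, String> rec : records) {
            groups.get(route(rec)).add(rec);
        }
        return groups;
    }

    static Map<String, String> log(String level, String message) {
        Map<String, String> rec = new HashMap<>();
        rec.put("level", level);
        rec.put("message", message);
        return rec;
    }

    public static void main(String[] args) {
        List<Map<String, String>> logs = Arrays.asList(
                log("INFO", "started"), log("DEBUG", "cache warm"),
                log("ERROR", "disk full"), log("WARN", "slow request"));
        Map<Route, List<Map<String, String>>> groups = partition(logs);
        // Prints ERROR_HANDLING: 1, DISCARD: 1, FACT_TABLE: 2 (one per line).
        for (Route r : Route.values()) {
            System.out.println(r + ": " + groups.get(r).size());
        }
    }
}
```

In this sketch a record with no level falls through to the fact table; a real flow might route such records elsewhere.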
> Support Splitting Avro Files
> ----------------------------
>
> Key: NIFI-919
> URL: https://issues.apache.org/jira/browse/NIFI-919
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: NIFI-919-2.patch, NIFI-919.patch
>
>
> Provide a processor that splits an Avro file into multiple smaller files.
> Would be nice to have a configurable batch size so a user could produce
> single record files and also multi-record files of smaller size than the
> original. Also consider making the output format configurable, data file vs
> bare record.
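The configurable batch size requested in the issue description amounts to partitioning the record sequence into consecutive chunks. This is a hedged sketch, not the processor's implementation: `BatchSplit.batches` is a hypothetical helper and records are generic Java objects rather than decoded Avro records. Each batch would then be written in whichever output format is configured (data file or bare record).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchSplit {
    // Splits records into consecutive batches of at most batchSize elements.
    // batchSize = 1 gives single-record outputs; larger values give smaller
    // multi-record outputs, with the last batch holding any remainder.
    static <T> List<List<T>> batches(List<T> records, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy so each batch is independent of the source list.
            out.add(new ArrayList<>(records.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(batches(records, 2)); // [[r1, r2], [r3, r4], [r5]]
        System.out.println(batches(records, 1).size()); // 5
    }
}
```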
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)