[ 
https://issues.apache.org/jira/browse/NIFI-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730985#comment-14730985
 ] 

Bryan Bende commented on NIFI-919:
----------------------------------

I've been experimenting with adding a new method to DataFileWriter that works 
the same way as appendAllFrom(...), except that it lets you specify the number 
of blocks to append:
https://issues.apache.org/jira/browse/AVRO-1726

I think if that change got into Avro, it would work nicely for splitting up 
files based on the number of blocks.
Instead of the "Split Size" property always being the number of records (or 
the approximate number), we would likely make it the number of blocks when 
Split Strategy = Block, and the number of records when Split Strategy = Record.
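
To make the two meanings of "Split Size" concrete, here is a rough sketch in 
plain Java. It does not use the Avro or NiFi APIs: a data file is modeled as a 
list of blocks, each block is a list of records, and the names SplitSketch, 
SplitStrategy and split(...) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; this is not NiFi or Avro code. */
final class SplitSketch {

    /** Mirrors the two proposed values of the "Split Strategy" property. */
    enum SplitStrategy { BLOCK, RECORD }

    /**
     * Splits a "file" (a list of blocks, each a list of records) into
     * smaller files of the same shape.
     *
     * BLOCK:  splitSize is the number of whole blocks per output file.
     * RECORD: splitSize is the number of records per output file; each
     *         output is written as a single block in this sketch.
     */
    static <T> List<List<List<T>>> split(List<List<T>> blocks,
                                         SplitStrategy strategy,
                                         int splitSize) {
        if (splitSize < 1) {
            throw new IllegalArgumentException("splitSize must be >= 1");
        }
        List<List<List<T>>> outputs = new ArrayList<>();
        if (strategy == SplitStrategy.BLOCK) {
            // Blocks are copied as-is; records are never regrouped.
            for (int i = 0; i < blocks.size(); i += splitSize) {
                int end = Math.min(i + splitSize, blocks.size());
                outputs.add(new ArrayList<>(blocks.subList(i, end)));
            }
        } else {
            // Records are regrouped, ignoring the original block boundaries.
            List<T> current = new ArrayList<>();
            for (List<T> block : blocks) {
                for (T record : block) {
                    current.add(record);
                    if (current.size() == splitSize) {
                        outputs.add(Collections.singletonList(current));
                        current = new ArrayList<>();
                    }
                }
            }
            if (!current.isEmpty()) {
                // The last output may hold fewer than splitSize records.
                outputs.add(Collections.singletonList(current));
            }
        }
        return outputs;
    }
}
```

With the block strategy the number of records per output depends on how the 
source file was written, so the output sizes are only approximate; with the 
record strategy every output except possibly the last holds exactly Split Size 
records.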

I'm thinking the first pass of this processor might only support the record 
strategy, unless we figure out an approach that doesn't require adding 
functionality to Avro.

> Support Splitting Avro Files
> ----------------------------
>
>                 Key: NIFI-919
>                 URL: https://issues.apache.org/jira/browse/NIFI-919
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 0.4.0
>
>
> Provide a processor that splits an Avro file into multiple smaller files. 
> Would be nice to have a configurable batch size so a user could produce 
> single record files and also multi-record files of smaller size than the 
> original. Also consider making the output format configurable, data file vs 
> bare record.



