[ 
https://issues.apache.org/jira/browse/NIFI-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729596#comment-14729596
 ] 

Bryan Bende commented on NIFI-919:
----------------------------------

[~busbey] I was wondering if you have some time to look at this. I pushed a 
branch with an initial processor I started working on. 
I was taking a stab at implementing a mode that splits on blocks, as you 
suggested, but I am running into something weird.

Relevant code is here: 
https://github.com/apache/nifi/blob/NIFI-919/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/src/main/java/org/apache/nifi/processors/avro/SplitAvro.java#L275

The unit test that fails is here:
https://github.com/apache/nifi/blob/NIFI-919/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/src/test/java/org/apache/nifi/processors/avro/TestSplitAvro.java#L115

The test runs the processor without error, but reading the resulting data 
back in to verify it fails with:
org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)

I think this is because appendEncoded() expects a single encoded record, 
not a whole block of records, but I am not totally sure. 
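
To make the suspected failure mode concrete, here is a minimal sketch of 
the hypothesis. It is a toy model only: BlockCountDemo, ToyWriter, and the 
[count][size][payload] layout are invented for illustration and are not 
Avro's file format or API. The assumption is that every appendEncoded() 
call bumps the block's record count by exactly one, so passing a whole 
block of N records leaves the count at 1 and the reader finds unread bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Avro's on-disk format or its API.
class BlockCountDemo {

    // Encode one record as [int length][UTF-8 bytes].
    static byte[] encodeRecord(String value) {
        byte[] body = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    // Concatenate already-encoded records into one raw block payload.
    static byte[] concat(byte[]... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Hypothetical writer: assumes each appendEncoded() call counts as ONE record.
    static final class ToyWriter {
        private final ByteArrayOutputStream payload = new ByteArrayOutputStream();
        private int recordCount = 0;

        void appendEncoded(byte[] datum) {
            payload.write(datum, 0, datum.length);
            recordCount++; // incremented once, however many records the bytes hold
        }

        // Emit the block as [int recordCount][int payloadSize][payload].
        byte[] finishBlock() {
            byte[] data = payload.toByteArray();
            return ByteBuffer.allocate(8 + data.length)
                    .putInt(recordCount).putInt(data.length).put(data).array();
        }
    }

    // Read exactly the declared number of records, then require that the
    // payload is fully consumed; leftover bytes mean the count was wrong.
    static List<String> readBlock(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        int count = in.getInt();
        int size = in.getInt();
        int end = in.position() + size;
        List<String> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] body = new byte[in.getInt()];
            in.get(body);
            records.add(new String(body, StandardCharsets.UTF_8));
        }
        if (in.position() != end) {
            throw new IllegalStateException("Block read partially, the data may be corrupt");
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] a = encodeRecord("a"), b = encodeRecord("b"), c = encodeRecord("c");

        // One call per record: the count matches the payload.
        ToyWriter perRecord = new ToyWriter();
        perRecord.appendEncoded(a);
        perRecord.appendEncoded(b);
        perRecord.appendEncoded(c);
        System.out.println(readBlock(perRecord.finishBlock())); // prints [a, b, c]

        // One call for the whole block: count is 1, payload holds 3 records.
        ToyWriter wholeBlock = new ToyWriter();
        wholeBlock.appendEncoded(concat(a, b, c));
        try {
            readBlock(wholeBlock.finishBlock());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints the "Block read partially" message
        }
    }
}
```

If this is what is happening, the options would be to append each record 
separately or to write the block in a way that carries over the source 
block's record count.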

> Support Splitting Avro Files
> ----------------------------
>
>                 Key: NIFI-919
>                 URL: https://issues.apache.org/jira/browse/NIFI-919
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 0.4.0
>
>
> Provide a processor that splits an Avro file into multiple smaller files. 
> Would be nice to have a configurable batch size so a user could produce 
> single record files and also multi-record files of smaller size than the 
> original. Also consider making the output format configurable, data file vs 
> bare record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
