[ https://issues.apache.org/jira/browse/NIFI-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729596#comment-14729596 ]
Bryan Bende commented on NIFI-919:
----------------------------------
[~busbey] I was wondering if you had some time to look at this. I pushed a
branch with an initial processor I started working on.
I was taking a stab at implementing a mode that splits on blocks, as you
suggested, but I am running into something weird.
Relevant code is here:
https://github.com/apache/nifi/blob/NIFI-919/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/src/main/java/org/apache/nifi/processors/avro/SplitAvro.java#L275
Unit Test that fails here:
https://github.com/apache/nifi/blob/NIFI-919/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/src/test/java/org/apache/nifi/processors/avro/TestSplitAvro.java#L115
During the test the processor runs fine, but reading the resulting data
back in to verify it fails with:
org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
I think this is because the appendEncoded() method expects a single encoded
record, not a whole block of records, but I am not totally sure.
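To make that suspicion concrete, here is a toy sketch in plain Java with no Avro dependency. It is not the SplitAvro code and does not call any Avro API. The class BlockCountSketch and all of its helpers are hypothetical names made up for illustration, and each "record" is assumed to be just one long. It only mimics how the Avro spec frames a data file block (object count, payload size in bytes, then the serialized objects, with the sync marker left out) to show what a reader would see if a payload holding several records were written with a declared count of 1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration only: mimics Avro block framing, not Avro itself.
class BlockCountSketch {

    // Zigzag varint encoding of a long, as the Avro spec describes for long values.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Inverse of writeLong: reads one zigzag varint from the buffer.
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Assumption: a toy "record" is a single long field.
    static byte[] encodeRecords(long... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : values) {
            writeLong(out, v);
        }
        return out.toByteArray();
    }

    // Frames a block as: object count, payload size in bytes, payload.
    static byte[] frameBlock(long count, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, count);
        writeLong(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Reads as many records as the block declares and returns
    // how many payload bytes are left over afterwards.
    static int unreadBytesAfterReading(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        long count = readLong(in);
        int size = (int) readLong(in);
        int end = in.position() + size;
        for (long i = 0; i < count; i++) {
            readLong(in);
        }
        return end - in.position();
    }

    public static void main(String[] args) {
        byte[] payload = encodeRecords(10, 200, 3000); // three records, 5 bytes
        // Count matches the payload: nothing is left over.
        System.out.println(unreadBytesAfterReading(frameBlock(3, payload))); // prints 0
        // Count of 1 for a three-record payload: bytes remain unread.
        System.out.println(unreadBytesAfterReading(frameBlock(1, payload))); // prints 4
    }
}
```

If appendEncoded() really does count the whole buffer as one record, the leftover bytes in the second case would be consistent with the "Block read partially" error above. One possible workaround, at the cost of deserializing every record, might be to read the records individually and re-append each one with DataFileWriter.append().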
> Support Splitting Avro Files
> ----------------------------
>
> Key: NIFI-919
> URL: https://issues.apache.org/jira/browse/NIFI-919
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
> Fix For: 0.4.0
>
>
> Provide a processor that splits an Avro file into multiple smaller files.
> Would be nice to have a configurable batch size so a user could produce
> single record files and also multi-record files of smaller size than the
> original. Also consider making the output format configurable, data file vs
> bare record.