[
https://issues.apache.org/jira/browse/NIFI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658791#comment-14658791
]
Bryan Bende commented on NIFI-821:
----------------------------------
The attached patch adds a new "Avro" merge format to MergeContent. The merge
code is a port of the logic in the Avro ConcatTool.
One problematic scenario: the schema of the first record in the bin is assumed
to be the "correct" one. Consider an example where we are waiting for 100
entries in the bin, and the first file that comes in has schema1 while the
other 99 files have schema2. With this implementation, those 99 files will be
treated as unmergeable failures, even though the first file might have been
the bad one.
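To make the scenario above concrete, here is a minimal sketch of the
"first schema wins" behavior. It is not the code from the patch: plain strings
stand in for Avro schemas, and the class and method names (FirstSchemaWins,
Partition, partition) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstSchemaWins {

    /** Entries of a bin split into those that get merged and those that are rejected. */
    public static final class Partition {
        public final List<String> merged = new ArrayList<>();
        public final List<String> failed = new ArrayList<>();
    }

    /**
     * Treats the schema of the first entry as the reference and rejects
     * every later entry whose schema differs from it. Strings are stand-ins
     * for real Avro schemas (an assumption made to keep the sketch small).
     */
    public static Partition partition(List<String> schemasInBin) {
        Partition result = new Partition();
        String reference = null;
        for (String schema : schemasInBin) {
            if (reference == null) {
                reference = schema; // the first entry defines the "correct" schema
            }
            if (schema.equals(reference)) {
                result.merged.add(schema);
            } else {
                result.failed.add(schema);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One file with schema1 arrives first, followed by 99 files with schema2.
        List<String> bin = new ArrayList<>();
        bin.add("schema1");
        for (int i = 0; i < 99; i++) {
            bin.add("schema2");
        }
        Partition p = partition(bin);
        System.out.println("merged=" + p.merged.size() + " failed=" + p.failed.size());
        // prints "merged=1 failed=99"
    }
}
```

With this policy the outcome depends entirely on arrival order: if one of the
schema2 files had arrived first, 99 entries would merge and only the schema1
file would fail.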
We could possibly handle this scenario more gracefully with a specialized
MergeAvro processor that extends BinFiles and allows Avro-specific
properties, but then we would probably want to reuse a significant portion of
the logic in MergeContent.
[~markap14] [~rdblue] [~aldrin] thoughts?
> Support Merging of Avro
> -----------------------
>
> Key: NIFI-821
> URL: https://issues.apache.org/jira/browse/NIFI-821
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: NIFI-821.patch
>
>
> We should support the ability to merge Avro files of the same schema, similar
> to how MergeContent works.
> Avro tools provides a command line tool for doing this which can be found
> here:
> https://github.com/apache/avro/blob/trunk/lang/java/tools/src/main/java/org/apache/avro/tool/ConcatTool.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)