[ 
https://issues.apache.org/jira/browse/NIFI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680539#comment-14680539
 ] 

Mark Payne commented on NIFI-821:
---------------------------------

[~bende] Very much like the idea. Code looks good, but I had a few 
comments/suggestions:

* Looks like the Avro spec recommends that it be transferred over HTTP using 
the Content-Type of "avro/binary", but googling indicates that there's no 
official MIME type for it. https://issues.apache.org/jira/browse/AVRO-488 
recommends "application/avro-binary". It looks like MergeContent is using 
"application/avro". Perhaps it should use "application/avro-binary" instead?

* The AvroMerger class calls "reader.close()" in several different places. 
Should probably wrap that entire block with a try/finally to ensure that the 
reader is always closed, even if an exception is thrown.

* The patch adds a new getUnmergedFlowFiles method, which all of the other 
mergers implement by returning "new ArrayList<>()". Would recommend that be 
changed to return Collections.emptyList() instead.

* As far as I can tell, if you pump several different Avro messages into the 
MergeContent processor with different schemas, it will bin them together, 
detect that they are different, and push those that are not the same as the 
first back on the queue. This was described above and would be avoided if the 
correlation attribute were tied to the schema. In the implementation, though, 
when this occurs, it logs an ERROR-level message. As this would be a very 
normal condition, I'd recommend changing that from ERROR to DEBUG level (unless 
I'm reading something wrong?)
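
To illustrate the try/finally and Collections.emptyList() suggestions, here is 
a minimal sketch. It is not the code from the patch: MergerSketch, 
TrackingReader and readAll are hypothetical names, and TrackingReader is only a 
stand-in for the Avro reader so that the example runs without the Avro 
library. It uses try-with-resources, which behaves like a try/finally that 
calls close().

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MergerSketch {

    // Hypothetical stand-in for the Avro reader; it only records
    // whether close() was called.
    static final class TrackingReader implements Closeable {
        boolean closed = false;
        private final List<String> records;
        private int pos = 0;

        TrackingReader(List<String> records) {
            this.records = records;
        }

        boolean hasNext() {
            return pos < records.size();
        }

        String next() {
            String record = records.get(pos++);
            if (record == null) {
                // Simulates a failure part-way through reading.
                throw new IllegalStateException("bad record");
            }
            return record;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // A single try-with-resources replaces the scattered close() calls:
    // the reader is closed on normal exit and when an exception is thrown.
    static List<String> readAll(TrackingReader reader) {
        List<String> out = new ArrayList<>();
        try (TrackingReader r = reader) {
            while (r.hasNext()) {
                out.add(r.next());
            }
        }
        return out;
    }

    // Mergers with nothing to report can share the immutable empty list
    // instead of allocating a new ArrayList on every call.
    static List<String> getUnmergedFlowFiles() {
        return Collections.emptyList();
    }
}
```

Note that Collections.emptyList() is immutable, so this assumes callers only 
read the returned list and never add to it.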



> Support Merging of Avro
> -----------------------
>
>                 Key: NIFI-821
>                 URL: https://issues.apache.org/jira/browse/NIFI-821
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 0.3.0
>
>         Attachments: NIFI-821.patch
>
>
> We should support the ability to merge Avro files of the same schema, similar 
> to how MergeContent works.
> Avro tools provides a command line tool for doing this which can be found 
> here: 
> https://github.com/apache/avro/blob/trunk/lang/java/tools/src/main/java/org/apache/avro/tool/ConcatTool.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
