[
https://issues.apache.org/jira/browse/NIFI-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988716#comment-15988716
]
Joseph Witt commented on NIFI-3759:
-----------------------------------
Hello - thanks for your contribution.
A couple of concerns come to mind.
The PutHDFS processor is format-agnostic, and for good reason. Adding specific
formats brings in a range of complexities in terms of dependencies and options
appropriate to those formats. This is an extremely heavily used processor, so
expanding it to take on this sort of thing must be measured. Here we would be
adding a file format option, but the only format is Avro.
A separate concern is that, with Avro, we have to consider the schema of the
records being written. MergeContent has the logic to handle merging things of
like schema, but this patch does not appear to do that. I would strongly prefer
that we not complicate PutHDFS with that logic, though.
To Andre's point, with the upcoming NiFi 1.2.0 release we have a new
abstraction available oriented around structured records. This allows
record-aware processors to plug in controller services that handle how to
deserialize data into Records using a RecordReader and serialize Records to
whatever output format using a RecordWriter. With this approach there is now a
PR to do things like have a PutParquet processor that takes in Records (the
processor doesn't know whether the data is Avro, CSV, JSON, etc. because it
doesn't have to) and writes out Parquet formatted columnar goodness to HDFS.
The PR also covers the reverse direction. Doing the same with ORC next makes
sense. Also, there are a host of record-oriented processors in now to run
streaming SQL and so on. Anyway, long story short, this probably offers a great
path to do what you want here.
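To illustrate the reader/writer idea in general terms, here is a minimal
sketch. It is not NiFi's actual Java API: the class names, the `convert`
function, and the dict-based record type are all hypothetical and exist only to
show how a format-agnostic step can delegate parsing and serialization to
pluggable components.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, Protocol

# Hypothetical record type for this sketch: a flat mapping of field name to value.
Record = Dict[str, object]


class RecordReader(Protocol):
    """Hypothetical interface: turns raw bytes into records."""
    def read(self, data: bytes) -> Iterator[Record]: ...


class RecordWriter(Protocol):
    """Hypothetical interface: turns records into raw bytes."""
    def write(self, records: Iterable[Record]) -> bytes: ...


class CsvReader:
    def read(self, data: bytes) -> Iterator[Record]:
        # The first CSV line is treated as the header row.
        yield from csv.DictReader(io.StringIO(data.decode("utf-8")))


class JsonLinesReader:
    def read(self, data: bytes) -> Iterator[Record]:
        # One JSON object per non-empty line.
        for line in data.decode("utf-8").splitlines():
            if line.strip():
                yield json.loads(line)


class JsonLinesWriter:
    def write(self, records: Iterable[Record]) -> bytes:
        # Sorted keys keep the output stable regardless of input format.
        lines = (json.dumps(r, sort_keys=True) + "\n" for r in records)
        return "".join(lines).encode("utf-8")


def convert(data: bytes, reader: RecordReader, writer: RecordWriter) -> bytes:
    """Format-agnostic step: it never inspects the input or output format."""
    return writer.write(reader.read(data))
```

Swapping `CsvReader()` for `JsonLinesReader()` changes the accepted input
format without touching `convert`, which is the property described above for
record-aware processors.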
Finally, I've long wished we had never added 'append' support in PutHDFS.
Can you describe in a bit more detail the use case that makes you want to
append to a file in HDFS after it has already been written?
> Enable Avro append for Put HDFS
> -------------------------------
>
> Key: NIFI-3759
> URL: https://issues.apache.org/jira/browse/NIFI-3759
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.1.0, 1.1.1
> Reporter: Jonas Hartwig
> Priority: Minor
>
> It would be nice, since NiFi already supports working with Avro, to enable
> HDFS append to work correctly with Avro files coming from the MergeContent
> processor. I suggest offering a choice similar to the one in MergeContent: a
> file format option that applies format-specific logic when required.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)