[ 
https://issues.apache.org/jira/browse/NIFI-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652138#comment-16652138
 ] 

Bryan Bende commented on NIFI-5706:
-----------------------------------

I originally implemented PutParquet, which allowed a record reader but then 
required writing to HDFS. I did it that way because I didn't think it was 
possible to write Parquet to the OutputStream of a flow file, since Parquet's 
whole API is based on the Hadoop Filesystem object.

However, the approach in this PR is quite interesting and shows that it might 
actually be possible! Nice work.

I do agree with Pierre's comment that if we take this approach and instead 
implement a Parquet record writer, we could then use ConvertRecord to convert 
any format to Parquet.
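For reference, one way this could work is via Parquet's org.apache.parquet.io.OutputFile interface (available since parquet-mr 1.10 / PARQUET-1142), which decouples the writer from Hadoop's FileSystem. A minimal, hypothetical sketch — the class and field names below are illustrative, not from the PR — would adapt an arbitrary OutputStream (such as a flow file's content stream) to that interface:

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.parquet.io.OutputFile;
import org.apache.parquet.io.PositionOutputStream;

// Hypothetical adapter: exposes a plain OutputStream (e.g. a flow file's
// content stream) as a Parquet OutputFile, so no Hadoop FileSystem is needed.
public class StreamOutputFile implements OutputFile {

    private final OutputStream out;

    public StreamOutputFile(final OutputStream out) {
        this.out = out;
    }

    @Override
    public PositionOutputStream create(final long blockSizeHint) {
        return createOrOverwrite(blockSizeHint);
    }

    @Override
    public PositionOutputStream createOrOverwrite(final long blockSizeHint) {
        return new PositionOutputStream() {
            private long pos = 0;

            @Override
            public long getPos() {
                // Parquet uses the position to record column chunk offsets
                // in the file footer, so we track bytes written ourselves.
                return pos;
            }

            @Override
            public void write(final int b) throws IOException {
                out.write(b);
                pos++;
            }

            @Override
            public void write(final byte[] b, final int off, final int len)
                    throws IOException {
                out.write(b, off, len);
                pos += len;
            }
        };
    }

    @Override
    public boolean supportsBlockSize() {
        return false; // no HDFS-style block size for a raw stream
    }

    @Override
    public long defaultBlockSize() {
        return 0;
    }
}
```

A writer could then be built against the stream with something like AvroParquetWriter.builder(new StreamOutputFile(flowFileOutputStream)) — parquet-avro 1.10+ has a builder overload that accepts an OutputFile — instead of a Hadoop Path.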

> Processor ConvertAvroToParquet 
> -------------------------------
>
>                 Key: NIFI-5706
>                 URL: https://issues.apache.org/jira/browse/NIFI-5706
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.7.1
>            Reporter: Mohit
>            Priority: Major
>              Labels: pull-request-available
>
> *Why*?
> PutParquet support is limited to HDFS. 
> PutParquet bypasses the _flowfile_ implementation and writes the file 
> directly to the sink. 
> We need a processor for Parquet that works like _ConvertAvroToOrc_.
> *What*?
> _ConvertAvroToParquet_ will convert the incoming Avro flow file to a Parquet 
> flow file. Unlike PutParquet, which writes to the HDFS file system, 
> ConvertAvroToParquet would write into the flow file, which can then be 
> pipelined into other sinks, like the _local filesystem_, _S3_, _Azure Data Lake_, etc.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
