I wouldn't recommend writing directly from Flume to Parquet. Parquet
can't guarantee that data is on disk until a file is closed, so you end
up with long-running transactions that back up into your file channel.
And if you're writing to a partitioned dataset, you keep several files
open at once, which drives memory consumption way up. I recommend
writing to Avro first and then using a batch job to convert to Parquet.
If you really need to write directly to Parquet, take a look at the Kite
DatasetSink instead of the HDFS sink. It manages the Parquet files and
their paths itself, so you never have to go through the serializer's
OutputStream interface.
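
For reference, the sink config looks roughly like this. The agent,
channel, and dataset URI are placeholders, and the target dataset has
to exist already and be created with the Parquet format (e.g. with the
kite-dataset CLI); see the Flume 1.6 user guide for the full property
list:

  a1.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
  a1.sinks.k1.channel = c1
  a1.sinks.k1.kite.dataset.uri = dataset:hdfs:/datasets/events
  a1.sinks.k1.kite.batchSize = 1000
  a1.sinks.k1.kite.rollInterval = 30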
rb
On 10/26/2015 11:29 PM, [email protected] wrote:
hi all,
I want the Flume sink to write in Parquet format during serialization,
but the Parquet writer constructor needs a path parameter, while the
Flume serializer only provides an OutputStream interface. I don't know
how to solve this. Can anyone give me a sample? Thanks.
[email protected]
--
Ryan Blue
Software Engineer
Cloudera, Inc.