Thanks, Ryan. I will do as you say.

[email protected]
 
From: Ryan Blue
Date: 2015-10-28 00:07
To: dev
Subject: Re: how to convert text parquet in flume serialization
I wouldn't recommend writing directly from Flume to Parquet. Parquet 
can't guarantee that data is on disk until a file is closed, so you end 
up with long-running transactions that back up into your file channel. 
Plus, if you are writing to a partitioned dataset you end up with 
several open files and huge memory consumption. I recommend first 
writing to Avro and then using a batch job to convert into Parquet.
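A minimal sketch of such a batch conversion step, assuming the Avro and parquet-avro libraries (plus their Hadoop dependencies) are on the classpath; the class name and file arguments are illustrative, not from this thread:

```java
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;

// Reads an Avro container file and rewrites its records as a Parquet file.
public class AvroToParquet {
  public static void main(String[] args) throws Exception {
    File avroFile = new File(args[0]);     // input Avro container file
    Path parquetFile = new Path(args[1]);  // output Parquet file

    DataFileReader<GenericRecord> reader =
        new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>());
    // Reuse the schema embedded in the Avro file for the Parquet writer.
    Schema schema = reader.getSchema();

    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<>(parquetFile, schema);
    for (GenericRecord record : reader) {
      writer.write(record);
    }
    // The Parquet footer is written on close; data is not durable before this,
    // which is exactly why streaming directly from Flume is problematic.
    writer.close();
    reader.close();
  }
}
```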
 
If you really need to write directly to Parquet, take a look at the Kite 
DatasetSink instead of using the HDFS sink. That allows you to write 
directly to Parquet.
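For the DatasetSink route, a rough Flume agent fragment might look like the following; the agent, channel, and dataset names are illustrative, and the property names should be checked against the Flume DatasetSink documentation for your version:

```properties
# Sketch of a Flume sink definition using the Kite DatasetSink.
# The target dataset must already exist (e.g. created with the kite-dataset
# CLI) with a Parquet format and an Avro schema matching the incoming events.
agent.sinks.kite.type = org.apache.flume.sink.kite.DatasetSink
agent.sinks.kite.channel = fileChannel
agent.sinks.kite.kite.repo.uri = repo:hdfs://namenode:8020/data
agent.sinks.kite.kite.dataset.name = events
agent.sinks.kite.kite.batchSize = 1000
agent.sinks.kite.kite.rollInterval = 30
```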
 
rb
 
On 10/26/2015 11:29 PM, [email protected] wrote:
>
> hi all,
>      i want to convert the flume sink to the parquet format in the 
> serialization, but the parquet writer constructor need a path parameter, 
> while the flume serialization just provide a outputstream interface. i don't 
> how to solve it. who can give me a sample ,thanks。
>
>
> [email protected]
>
 
 
-- 
Ryan Blue
Software Engineer
Cloudera, Inc.
