Thanks, Ryan. I will do as you say.
[email protected]

From: Ryan Blue
Date: 2015-10-28 00:07
To: dev
Subject: Re: how to convert text parquet in flume serialization

I wouldn't recommend writing directly from Flume to Parquet. Parquet can't guarantee that data is on disk until a file is closed, so you end up with long-running transactions that back up into your file channel. Plus, if you are writing to a partitioned dataset, you end up with several open files and huge memory consumption.

I recommend first writing to Avro and then using a batch job to convert into Parquet.

If you really need to write directly to Parquet, take a look at the Kite DatasetSink instead of using the HDFS sink. That allows you to write directly to Parquet.

rb

On 10/26/2015 11:29 PM, [email protected] wrote:
>
> hi all,
> I want to convert the Flume sink output to the Parquet format during
> serialization, but the Parquet writer constructor needs a path parameter,
> while the Flume serializer only provides an OutputStream interface. I don't
> know how to solve this. Can anyone give me a sample? Thanks.
>
>
> [email protected]
>

--
Ryan Blue
Software Engineer
Cloudera, Inc.
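For the Avro-then-batch-convert approach Ryan recommends, a minimal conversion job might look like the sketch below. It reads a local Avro data file (as Flume's HDFS sink would produce with the avro_event serializer) and rewrites its records to a Parquet file via parquet-avro's AvroParquetWriter. The class name, file paths, and compression choice are illustrative assumptions, and the program needs the avro, parquet-avro, and hadoop-client jars on the classpath:

```java
// Sketch only: batch-convert one Avro file to Parquet with parquet-avro.
// Assumes avro, parquet-avro, and hadoop-client jars are on the classpath;
// input/output paths come from the command line.
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class AvroToParquet {
    public static void main(String[] args) throws Exception {
        File avroFile = new File(args[0]);     // input: Avro data file
        Path parquetPath = new Path(args[1]);  // output: Parquet file

        // The Avro file carries its own schema, so no external schema is needed.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<>())) {
            Schema schema = reader.getSchema();
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(parquetPath)
                         .withSchema(schema)
                         .withCompressionCodec(CompressionCodecName.SNAPPY)
                         .build()) {
                for (GenericRecord record : reader) {
                    writer.write(record);      // re-encode each record as Parquet
                }
            }
        }
    }
}
```

Because the Parquet file is only finalized when the writer is closed, running this as a periodic batch job (e.g. over each closed Flume output file) avoids the long-open-file and memory problems described above.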
