Selina,
I would use parquet-avro to create a writer. Kafka messages are commonly
encoded as Avro, so you may already be working with Avro objects. If
not, convert them to Avro first and then write them with the AvroParquetWriter.
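Here is a minimal sketch of the JSON-to-Avro step. It assumes Avro 1.7 or later is on the classpath. The Event schema, its fields, and the class name are hypothetical stand-ins for your own message schema. Avro's JsonDecoder expects Avro's JSON encoding. That encoding matches plain JSON for simple records like this one, but it wraps union values (for example, nullable fields) in an object that names the branch. Arbitrary JSON messages may therefore need a hand-written mapping instead.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonToAvroSketch {
  // Hypothetical schema for illustration; replace with your Kafka message schema.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"payload\",\"type\":\"string\"}]}");

  // Decodes one JSON-encoded message into an Avro GenericRecord.
  static GenericRecord jsonToRecord(Schema schema, String json) throws IOException {
    JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    return reader.read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    GenericRecord record = jsonToRecord(SCHEMA, "{\"id\": 1, \"payload\": \"hello\"}");
    System.out.println(record);
  }
}
```

The resulting GenericRecord can be passed directly to an AvroParquetWriter's write() method.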
You can create a writer that creates S3 files by setting up your S3
file system settings in a Configuration and then using paths that look
like this: s3n://s3bucket-name/path/within/bucket. You would just pass
that Path to the AvroParquetWriter.builder method, configure the
builder, and call build() to get a configured writer.
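Here is a minimal sketch of that writer setup. It assumes parquet-avro 1.8 or later (the org.apache.parquet packages and AvroParquetWriter.builder) and a Hadoop version that still ships the s3n file system. The schema, class name, file name, and the environment variable names used for credentials are hypothetical. fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are Hadoop's s3n credential settings, and they can also be set in core-site.xml instead.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class S3ParquetSketch {
  // Hypothetical record schema, for illustration only.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"payload\",\"type\":\"string\"}]}");

  // Builds a Configuration with s3n credentials when they are available.
  // Reading them from these environment variables is an assumption.
  static Configuration s3Conf() {
    Configuration conf = new Configuration();
    String key = System.getenv("AWS_ACCESS_KEY_ID");
    String secret = System.getenv("AWS_SECRET_ACCESS_KEY");
    if (key != null && secret != null) {
      conf.set("fs.s3n.awsAccessKeyId", key);
      conf.set("fs.s3n.awsSecretAccessKey", secret);
    }
    return conf;
  }

  // Opens a Parquet writer for any Hadoop Path. The path's scheme
  // (s3n://, hdfs://, file://) selects the FileSystem implementation.
  static ParquetWriter<GenericRecord> openWriter(Path path, Configuration conf)
      throws IOException {
    return AvroParquetWriter.<GenericRecord>builder(path)
        .withSchema(SCHEMA)
        .withConf(conf)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build();
  }

  public static void main(String[] args) throws IOException {
    Path path = new Path("s3n://s3bucket-name/path/within/bucket/events-0001.parquet");
    try (ParquetWriter<GenericRecord> writer = openWriter(path, s3Conf())) {
      GenericRecord record = new GenericData.Record(SCHEMA);
      record.put("id", 1L);
      record.put("payload", "hello");
      writer.write(record);
    } // close() writes the Parquet footer; the file is not readable before that
  }
}
```

Parquet writes its footer when the writer is closed, so a file becomes readable only after close() returns. For a continuous Kafka stream, one reasonable design is to close the current file periodically (for example, after a time window or record count) and open a writer on a new path. Later Hadoop releases replaced s3n:// with s3a://, and the same code works with an s3a path and the matching credential settings.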
rb
On 11/09/2015 04:55 PM, Selina Tech wrote:
Hi, Ryan:
Thanks a lot for your suggestion. I do not need the output stream if I
can write my continuous Kafka messages (in JSON, CSV, or Avro format) to
AWS S3 in Parquet format. Could you give a little more detail about how
to do that, so that I can work out the solution?
There is one solution: create a Parquet table in Hive through Presto,
use the Presto Hive connector to query the data with SQL and save it to
Hive, and then send the data to S3. Is there a better solution?
Sincerely,
Selina
On Mon, Nov 9, 2015 at 9:35 AM, Ryan Blue <[email protected]> wrote:
Selina,
You should be able to write to S3 without needing to flush to an
output stream. You would just use the S3 FileSystem to write data
instead of HDFS. This doesn't require Parquet to write to an
OutputStream instead of a file. Is there a reason why you want to
supply an output stream instead?
rb
On 11/05/2015 05:56 PM, Selina Tech wrote:
Dear all:
I am wondering whether I could read an input stream such as Kafka,
convert it to Parquet data, and write it back to an output stream. All
the examples I found convert a data file to Parquet data.
I know this feature was not available last year. How about now?
I am trying to aggregate Kafka messages with Samza, convert them to
Parquet data, and then save them to S3. What is the best way to
implement this?
Sincerely,
Selina
reference:
https://github.com/Parquet/parquet-mr/issues/231
--
Ryan Blue
Software Engineer
Cloudera, Inc.