Hi, Ryan:
Thanks a lot for your suggestion. I would not need an output
stream if I could write my continuous Kafka messages (in JSON, CSV, or Avro
format) to AWS S3 in Parquet format. Could you share a little more detail
about that approach so I can work out a concrete solution?
One solution I have considered: create a Parquet table in Hive with
Presto, use the Presto Hive connector to query the data and save it to Hive,
and then send the data to S3. I am wondering if there is a better solution.
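[For readers of the archive: Ryan's suggestion below, writing Parquet straight to S3 through the Hadoop FileSystem layer, might look roughly like the following sketch. It assumes parquet-mr's AvroParquetWriter builder API and the s3a filesystem; the bucket name, schema, and record fields are hypothetical, and S3 credentials are taken from the usual Hadoop configuration.]

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class KafkaToS3Parquet {
    public static void main(String[] args) throws Exception {
        // Hypothetical Avro schema for a Kafka message (key/value as strings).
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
            + "{\"name\":\"key\",\"type\":\"string\"},"
            + "{\"name\":\"value\",\"type\":\"string\"}]}");

        // s3a credentials normally come from core-site.xml or the environment.
        Configuration conf = new Configuration();

        // Writing to an s3a:// Path uses the S3 FileSystem directly,
        // with no intermediate HDFS step and no OutputStream handling.
        Path path = new Path("s3a://my-bucket/topic/part-00000.parquet");

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(path)
                .withSchema(schema)
                .withConf(conf)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("key", "k1");
            record.put("value", "{\"hello\":\"world\"}");
            writer.write(record);
        }
    }
}
```

In a Samza job, the same writer could be driven from the task's process() callback, rolling to a new S3 object per window or per size threshold.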
Sincerely,
Selina
On Mon, Nov 9, 2015 at 9:35 AM, Ryan Blue <[email protected]> wrote:
> Selina,
>
> You should be able to write to S3 without needing to flush to an output
> stream. You would just use the S3 FileSystem to write data instead of HDFS.
> This shouldn't require Parquet to write to an OutputStream instead of
> a file. Is there a reason why you want to supply an output stream instead?
>
> rb
>
>
> On 11/05/2015 05:56 PM, Selina Tech wrote:
>
>> Dear all:
>>
>> I am wondering if I could read an input stream such as Kafka,
>> convert
>> it to Parquet data, and write it back to an output stream? All the
>> examples I found convert a data file to Parquet data.
>>
>> I know this feature was not available last year. How about right
>> now?
>>
>> I am trying to aggregate Kafka messages with Samza, convert them to
>> Parquet data, and then save them to S3. What is the best way to implement this?
>>
>>
>> Sincerely,
>> Selina
>>
>> reference:
>> https://github.com/Parquet/parquet-mr/issues/231
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>