Jean-Baptiste,

We're planning a release that will include the new OutputFile class, which
I think you should be able to use. Is there anything you'd change to make
this work more easily with Beam?
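To make that concrete, here is roughly the kind of adapter I have in mind, written against
the interface shape on master (create/createOrOverwrite returning a PositionOutputStream).
This is only a sketch, and the ChannelOutputFile / ChannelPositionOutputStream names are
placeholders, not anything that exists in Parquet or Beam:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.channels.Channels;
    import java.nio.channels.WritableByteChannel;

    import org.apache.parquet.io.OutputFile;
    import org.apache.parquet.io.PositionOutputStream;

    // Sketch of an OutputFile backed by a Beam WritableByteChannel.
    class ChannelOutputFile implements OutputFile {
      private final WritableByteChannel channel;

      ChannelOutputFile(WritableByteChannel channel) {
        this.channel = channel;
      }

      @Override
      public PositionOutputStream create(long blockSizeHint) throws IOException {
        return new ChannelPositionOutputStream(Channels.newOutputStream(channel));
      }

      @Override
      public PositionOutputStream createOrOverwrite(long blockSizeHint) throws IOException {
        return create(blockSizeHint);
      }

      @Override
      public boolean supportsBlockSize() {
        return false; // no underlying HDFS block size to report
      }

      @Override
      public long defaultBlockSize() {
        return 0;
      }

      // Wraps the stream and tracks the write position, which the Parquet writer needs.
      private static class ChannelPositionOutputStream extends PositionOutputStream {
        private final OutputStream out;
        private long position = 0;

        ChannelPositionOutputStream(OutputStream out) {
          this.out = out;
        }

        @Override
        public long getPos() {
          return position;
        }

        @Override
        public void write(int b) throws IOException {
          out.write(b);
          position += 1;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
          out.write(b, off, len);
          position += len;
        }

        @Override
        public void flush() throws IOException {
          out.flush();
        }

        @Override
        public void close() throws IOException {
          out.close();
        }
      }
    }

With something like that, the sink's open() could build the writer directly from the
channel (again assuming the Avro builder accepts an OutputFile):

    ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(new ChannelOutputFile(channel))
            .withSchema(schema)
            .build();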

rb

On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi guys,
>
> I'm working on the Apache Beam ParquetIO:
>
> https://github.com/apache/beam/pull/1851
>
> In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>
> While I was able to implement the read part using AvroParquetReader with
> Beam's FileIO, I'm struggling with the write part.
>
> I have to create a ParquetSink implementing FileIO.Sink. In particular, I
> have to implement the open(WritableByteChannel channel) method.
>
> It's not possible to use AvroParquetWriter here, as it takes a Path as an
> argument (and from the channel, I can only get an OutputStream).
>
> As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
> providing my own implementation of org.apache.parquet.io.OutputFile.
>
> Unfortunately, OutputFile (and the corresponding updated method in
> ParquetFileWriter) exists on the Parquet master branch, but the API was
> different in Parquet 1.9.0.
>
> So, I have two questions:
> - do you plan a Parquet 1.9.1 release that includes
> org.apache.parquet.io.OutputFile and the updated
> org.apache.parquet.hadoop.ParquetFileWriter?
> - using Parquet 1.9.0, do you have any advice on how to use
> AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object
> that I can get from a WritableByteChannel)?
>
> Thanks !
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
Ryan Blue
Software Engineer
Netflix
