Hi Ryan,

Thanks for the update.

Ideally for Beam, it would be great to have the AvroParquetReader and
AvroParquetWriter using the InputFile/OutputFile interfaces. It would allow me
to directly leverage Beam FileIO.
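
For illustration, here is a minimal sketch of the piece that would be needed on the Beam side. It assumes that an OutputFile implementation has to hand back a stream able to report its current write position, and that the WritableByteChannel received in FileIO.Sink.open() is the only handle available. ChannelPositionStream and getPos() are hypothetical names for this sketch, not Parquet or Beam API; wiring it into an actual org.apache.parquet.io.OutputFile is left out because it depends on the release discussed here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not part of Parquet or Beam): an OutputStream over a
 * WritableByteChannel that counts the bytes written through it, so a caller
 * can ask for the current position in the file being written.
 */
class ChannelPositionStream extends OutputStream {
    private final OutputStream delegate;
    private long pos = 0;

    ChannelPositionStream(WritableByteChannel channel) {
        // Standard JDK bridge from a channel to a stream.
        this.delegate = Channels.newOutputStream(channel);
    }

    /** Number of bytes written through this stream so far. */
    long getPos() {
        return pos;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        pos++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        delegate.write(buf, off, len);
        pos += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

The position is tracked by counting bytes because a WritableByteChannel is not seekable and exposes no position of its own. This only works if every write goes through the wrapper.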

Do you have a rough date for the Parquet release that will include this?

Thanks
Regards
JB

On 02/14/2018 02:01 AM, Ryan Blue wrote:
> Jean-Baptiste,
> 
> We're planning a release that will include the new OutputFile class, which I
> think you should be able to use. Is there anything you'd change to make this
> work more easily with Beam?
> 
> rb
> 
> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> 
>     Hi guys,
> 
>     I'm working on the Apache Beam ParquetIO:
> 
>     https://github.com/apache/beam/pull/1851
> 
>     In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
> 
>     While I was able to implement the read part using AvroParquetReader with Beam
>     FileIO, I'm struggling with the write part.
> 
>     I have to create a ParquetSink implementing FileIO.Sink. In particular, I have to
>     implement the open(WritableByteChannel channel) method.
> 
>     It's not possible to use AvroParquetWriter here, as it takes a Path as argument
>     (and from the channel, I can only get an OutputStream).
> 
>     As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
>     providing my own implementation of org.apache.parquet.io.OutputFile.
> 
>     Unfortunately, OutputFile (and the updated method in ParquetFileWriter) exists on
>     the Parquet master branch, but Parquet 1.9.0 is different.
> 
>     So, I have two questions:
>     - Do you plan a Parquet 1.9.1 release including org.apache.parquet.io.OutputFile
>       and the updated org.apache.parquet.hadoop.ParquetFileWriter?
>     - Using Parquet 1.9.0, do you have any advice on how to use
>       AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object that I
>       can get from a WritableByteChannel)?
> 
>     Thanks!
> 
>     Regards
>     JB
>     --
>     Jean-Baptiste Onofré
>     jbono...@apache.org
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
