Additions to the builders are easy enough that we can get them in. There's
a PR out there that needs to be fixed:
https://github.com/apache/parquet-mr/pull/446

I've asked the author for just the builder changes. If we don't hear back,
we can add another PR but I'd like to give the author some time to update.

rb

On Tue, Feb 13, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Ryan,
>
> Thanks for the update.
>
> Ideally for Beam, it would be great to have the AvroParquetReader and
> AvroParquetWriter using the InputFile/OutputFile interfaces. It would
> allow me to directly leverage Beam FileIO.
>
> Do you have a rough date for the Parquet release with that?
>
> Thanks
> Regards
> JB
>
> On 02/14/2018 02:01 AM, Ryan Blue wrote:
> > Jean-Baptiste,
> >
> > We're planning a release that will include the new OutputFile class, which I
> > think you should be able to use. Is there anything you'd change to make this
> > work more easily with Beam?
> >
> > rb
> >
> > On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >
> >     Hi guys,
> >
> >     I'm working on the Apache Beam ParquetIO:
> >
> >     https://github.com/apache/beam/pull/1851
> >
> >     In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
> >
> >     While I was able to implement the read part using AvroParquetReader with
> >     Beam FileIO, I'm struggling with the write part.
> >
> >     I have to create a ParquetSink implementing FileIO.Sink. In particular, I
> >     have to implement the open(WritableByteChannel channel) method.
> >
> >     It's not possible to use AvroParquetWriter here, as it takes a Path as an
> >     argument (and from the channel, I can only get an OutputStream).
> >
> >     As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
> >     providing my own implementation of org.apache.parquet.io.OutputFile.
> >
> >     Unfortunately, OutputFile (and the updated method in ParquetFileWriter)
> >     exists on the Parquet master branch, but it was different in Parquet 1.9.0.
> >
> >     So, I have two questions:
> >     - do you plan a Parquet 1.9.1 release including
> >     org.apache.parquet.io.OutputFile and the updated
> >     org.apache.parquet.hadoop.ParquetFileWriter?
> >     - using Parquet 1.9.0, do you have any advice on how to use
> >     AvroParquetWriter/ParquetFileWriter with an OutputStream (or any
> >     object that I can get from a WritableByteChannel)?
> >
> >     Thanks!
> >
> >     Regards
> >     JB
> >     --
> >     Jean-Baptiste Onofré
> >     jbono...@apache.org
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
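For the question quoted above about what can be obtained from a WritableByteChannel, here is a minimal sketch that uses only java.nio and java.io. It wraps the channel in an OutputStream and counts the bytes written, since a Parquet writer needs to know its current offset in the file. The class name ChannelPositionStream and its getPos() method are hypothetical and are not part of Parquet or Beam. Adapting this to an OutputFile implementation is assumed to be possible but is not shown, because it depends on the API in the release.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not a Parquet or Beam class): exposes a
 * WritableByteChannel as an OutputStream and tracks how many bytes
 * have been written through it.
 */
class ChannelPositionStream extends OutputStream {
    private final OutputStream out;
    private long pos = 0;

    ChannelPositionStream(WritableByteChannel channel) {
        // Channels.newOutputStream is the standard JDK bridge from a channel to a stream.
        this.out = Channels.newOutputStream(channel);
    }

    /** Number of bytes written so far, i.e. the current offset in the output. */
    long getPos() {
        return pos;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        pos += len;
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    // Small demo: an in-memory channel stands in for the one a sink would receive.
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ChannelPositionStream stream =
                 new ChannelPositionStream(Channels.newChannel(sink))) {
            stream.write("PAR1".getBytes(StandardCharsets.US_ASCII));
            stream.write(0);
            System.out.println("position = " + stream.getPos());
        }
    }
}
```

In an open(WritableByteChannel channel) method, a stream like this could be created from the channel passed in. Whether the position tracking is sufficient depends on what the writer requires beyond a byte offset.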



-- 
Ryan Blue
Software Engineer
Netflix
