Very cool! JB, time to update your PR?

On Thu, Apr 19, 2018 at 9:17 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
> FYI: Apache Parquet 1.10.0 was released recently.
> It contains *org.apache.parquet.io.OutputFile* and an updated
> *org.apache.parquet.hadoop.ParquetFileWriter*.
>
> WBR,
> Alexey
>
> On 14 Feb 2018, at 20:10, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Great !!
>
> In the meantime, I started a PoC directly on top of parquet-common to see
> if I can implement a BeamParquetReader and a BeamParquetWriter.
>
> I might also propose some PRs.
>
> I will continue working on that tomorrow.
>
> Thanks again !
> Regards
> JB
>
> On 02/14/2018 08:04 PM, Ryan Blue wrote:
>
> Additions to the builders are easy enough that we can get that in. There's
> a PR out there that needs to be fixed:
>
> https://github.com/apache/parquet-mr/pull/446
>
> I've asked the author for just the builder changes. If we don't hear back,
> we can add another PR but I'd like to give the author some time to update.
>
> rb
>
> On Tue, Feb 13, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi Ryan,
>
> Thanks for the update.
>
> Ideally for Beam, it would be great to have the AvroParquetReader and
> AvroParquetWriter using the InputFile/OutputFile interfaces. It would
> allow me to directly leverage Beam FileIO.
>
> Do you have a rough date for the Parquet release with that ?
>
> Thanks
> Regards
> JB
>
> On 02/14/2018 02:01 AM, Ryan Blue wrote:
>
> Jean-Baptiste,
>
> We're planning a release that will include the new OutputFile class, which I
> think you should be able to use. Is there anything you'd change to make this
> work more easily with Beam?
>
> rb
>
> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi guys,
>
> I'm working on the Apache Beam ParquetIO:
>
> https://github.com/apache/beam/pull/1851
>
> In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>
> While I was able to implement the Read part using AvroParquetReader on top of
> Beam FileIO, I'm struggling with the writing part.
>
> I have to create a ParquetSink implementing FileIO.Sink. In particular, I have
> to implement the open(WritableByteChannel channel) method.
>
> It's not possible to use AvroParquetWriter here as it takes a Path as argument
> (and from the channel, I can only get an OutputStream).
>
> As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
> providing my own implementation of org.apache.parquet.io.OutputFile.
>
> Unfortunately OutputFile (and the updated method in ParquetFileWriter) exists on
> the Parquet master branch, but it was different in Parquet 1.9.0.
>
> So, I have two questions:
> - do you plan a Parquet 1.9.1 release including org.apache.parquet.io.OutputFile
>   and the updated org.apache.parquet.hadoop.ParquetFileWriter ?
> - using Parquet 1.9.0, do you have any advice on how to use
>   AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object that I
>   can get from a WritableByteChannel) ?
>
> Thanks !
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> --
> Ryan Blue
> Software Engineer
> Netflix
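
For reference, the workaround JB describes (wrapping the WritableByteChannel handed to FileIO.Sink.open so a Parquet writer can use it) mostly comes down to a stream that tracks how many bytes it has written. Below is a minimal, JDK-only sketch of that piece. The class name ChannelPositionOutputStream is hypothetical and not part of Beam or Parquet. The assumption is that, with Parquet 1.10.0, a class like this would extend org.apache.parquet.io.PositionOutputStream and be returned from a custom OutputFile implementation; that wiring is left out here so the example runs without the Parquet dependency.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not from Beam or Parquet): an OutputStream over a
 * WritableByteChannel that remembers how many bytes were written so far.
 * Assumption: in a real ParquetSink this would extend Parquet's
 * PositionOutputStream instead of plain OutputStream.
 */
public class ChannelPositionOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long position = 0;

    public ChannelPositionOutputStream(WritableByteChannel channel) {
        // Channels.newOutputStream gives a blocking stream view of the channel.
        this.delegate = Channels.newOutputStream(channel);
    }

    /** Number of bytes written through this stream so far. */
    public long getPos() {
        return position;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        position++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        delegate.write(b, off, len);
        position += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

The position is counted locally because a WritableByteChannel has no way to report an offset, while a Parquet writer needs one to record where row groups and the footer start.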