Hi Ryan,

sorry to have been quiet, I was busy traveling recently :)
Just a quick update about this one:

- I asked a guy from my team to work with me on the Beam ParquetIO. We're also
seeing several users expecting this new IO.
- I will update my current PR to use a Parquet SNAPSHOT and verify that
OutputFile/InputFile are convenient for the Beam use case. I should be able to
do it tomorrow.
- Then, if OutputFile/InputFile are OK for ParquetIO, I will let you know and
kindly ask for a Parquet release.

Is that OK for you?

Thanks!
Regards
JB

On 02/14/2018 02:01 AM, Ryan Blue wrote:
> Jean-Baptiste,
>
> We're planning a release that will include the new OutputFile class, which
> I think you should be able to use. Is there anything you'd change to make
> this work more easily with Beam?
>
> rb
>
> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi guys,
>>
>> I'm working on the Apache Beam ParquetIO:
>>
>> https://github.com/apache/beam/pull/1851
>>
>> In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>>
>> If I was able to implement the Read part using AvroParquetReader
>> leveraging Beam FileIO, I'm struggling on the writing part.
>>
>> I have to create ParquetSink implementing FileIO.Sink. Especially, I have
>> to implement the open(WritableByteChannel channel) method.
>>
>> It's not possible to use AvroParquetWriter here as it takes a Path as
>> argument (and from the channel, I can only have an OutputStream).
>>
>> As a workaround, I wanted to use
>> org.apache.parquet.hadoop.ParquetFileWriter, providing my own
>> implementation of org.apache.parquet.io.OutputFile.
>>
>> Unfortunately OutputFile (and the updated method in ParquetFileWriter)
>> exists on Parquet master branch, but it was different on Parquet 1.9.0.
>>
>> So, I have two questions:
>> - do you plan a Parquet 1.9.1 release including
>> org.apache.parquet.io.OutputFile and updated
>> org.apache.parquet.hadoop.ParquetFileWriter ?
>> - using Parquet 1.9.0, do you have any advice how to use
>> AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object
>> that I can get from WritableByteChannel) ?
>>
>> Thanks !
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
>

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
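
For reference, here is a rough sketch of the building block the workaround in this thread would likely need: an OutputStream over the WritableByteChannel that FileIO.Sink.open() receives, which also tracks how many bytes have been written. As far as I understand, the OutputFile on Parquet master hands out a stream with a getPos() method, so a custom OutputFile could delegate to something like this. The class name ChannelPositionOutputStream is hypothetical, and the sketch deliberately uses only the JDK (it does not extend any Parquet class), so adapting it to the real Parquet types is left as an assumption to verify against the SNAPSHOT.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not part of Beam or Parquet): an OutputStream that
 * writes to a WritableByteChannel and counts the bytes written so far.
 */
class ChannelPositionOutputStream extends OutputStream {
  private final WritableByteChannel channel;
  private long position = 0;

  ChannelPositionOutputStream(WritableByteChannel channel) {
    this.channel = channel;
  }

  /** Returns the number of bytes written through this stream so far. */
  long getPos() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    write(new byte[] {(byte) b}, 0, 1);
  }

  @Override
  public void write(byte[] data, int offset, int length) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(data, offset, length);
    // A channel may accept fewer bytes than offered, so loop until drained.
    while (buffer.hasRemaining()) {
      channel.write(buffer);
    }
    position += length;
  }

  // close() is intentionally not overridden: this sketch assumes the caller
  // (for example the Beam runner) owns the channel and closes it itself.
}
```

Nothing is buffered here, so the position always matches what has reached the channel. Whether that is fast enough, or whether a buffered variant is needed, would have to be measured in the actual ParquetIO sink.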