Hi guys,

I'm working on the Apache Beam ParquetIO:
https://github.com/apache/beam/pull/1851

In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...). While I was able to implement the read part using AvroParquetReader on top of Beam FileIO, I'm struggling with the write part. I have to create a ParquetSink implementing FileIO.Sink, and in particular implement the open(WritableByteChannel channel) method. It's not possible to use AvroParquetWriter here, as it takes a Path as argument (and from the channel I can only get an OutputStream).

As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter, providing my own implementation of org.apache.parquet.io.OutputFile. Unfortunately, OutputFile (and the corresponding ParquetFileWriter method) exists on the Parquet master branch, but was different in Parquet 1.9.0.

So, I have two questions:
- do you plan a Parquet 1.9.1 release including org.apache.parquet.io.OutputFile and the updated org.apache.parquet.hadoop.ParquetFileWriter?
- using Parquet 1.9.0, do you have any advice on how to use AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object that I can get from a WritableByteChannel)?

Thanks!

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
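PS: for context, here is the kind of adapter I had in mind for the OutputFile workaround. This is only a sketch: on master, OutputFile.create()/createOrOverwrite() return an org.apache.parquet.io.PositionOutputStream, and the key piece is tracking the write position, since ParquetFileWriter uses getPos() to record column-chunk and footer offsets. The class below is written against plain JDK types (no Parquet dependency) so the position-tracking idea stands on its own; the class name is mine, not an existing API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Position-tracking stream over a Beam WritableByteChannel. In the
// master-branch API, an OutputFile implementation would return something
// like this (as a PositionOutputStream) from create()/createOrOverwrite().
class CountingChannelOutputStream extends OutputStream {
  private final OutputStream out;
  private long position = 0;

  CountingChannelOutputStream(WritableByteChannel channel) {
    // Channels.newOutputStream adapts the channel to a plain OutputStream.
    this.out = Channels.newOutputStream(channel);
  }

  // ParquetFileWriter needs the current byte offset to write its footer.
  public long getPos() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    position++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
    position += len;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

With something like this, the remaining OutputFile methods (supportsBlockSize(), defaultBlockSize()) could return false/0, since the channel is not block-addressed. But as noted above, none of this compiles against 1.9.0, hence the questions.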