Hi Ryan,

Sorry to have been quiet, but I was busy traveling recently :)

Just a quick update about this one:

- I asked a guy from my team to work with me on the Beam ParquetIO. We're also
seeing that several users are expecting this new IO.
- I will update my current PR to use the Parquet SNAPSHOT and verify that
OutputFile/InputFile are convenient for the Beam use case. I should be able to
do it tomorrow.
- Then, if OutputFile/InputFile are OK for ParquetIO, I will let you know and
kindly ask for a Parquet release.
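For reference, the part of the plan that checks whether OutputFile fits the Beam use case mostly comes down to one adapter. This is a minimal sketch, assuming the Parquet master API, where `OutputFile.create(long blockSizeHint)` returns an `org.apache.parquet.io.PositionOutputStream`, which adds a `getPos()` method to `OutputStream`. The class name `PositionTrackingChannelStream` is hypothetical. It extends plain `java.io.OutputStream` so it compiles without Parquet on the classpath; a real ParquetIO sink would extend `PositionOutputStream` instead and return it from its own `OutputFile` implementation wrapping the channel passed to `FileIO.Sink.open(WritableByteChannel)`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical adapter: an OutputStream over the WritableByteChannel that Beam's
 * FileIO hands to a sink, tracking how many bytes have been written so far.
 * Parquet's PositionOutputStream needs this position (getPos()) to record
 * offsets in the file footer. close() is inherited from OutputStream as a no-op,
 * on the assumption that FileIO, which owns the channel, closes it.
 */
final class PositionTrackingChannelStream extends OutputStream {
  private final WritableByteChannel channel;
  private long position = 0L;

  PositionTrackingChannelStream(WritableByteChannel channel) {
    this.channel = channel;
  }

  /** Number of bytes written to the channel so far. */
  public long getPos() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    // Single-byte writes go through the bulk path so position stays consistent.
    write(new byte[] {(byte) b}, 0, 1);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(b, off, len);
    // WritableByteChannel.write may accept fewer bytes than requested, so loop
    // until the buffer is drained (assumes a blocking channel, as FileIO provides).
    while (buffer.hasRemaining()) {
      position += channel.write(buffer);
    }
  }
}
```

Counting bytes in the stream itself means the adapter never has to query or seek the underlying channel, which only needs to accept sequential writes.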

Does this plan work for you?

Thanks!
Regards
JB

On 02/14/2018 02:01 AM, Ryan Blue wrote:
> Jean-Baptiste,
> 
> We're planning a release that will include the new OutputFile class, which
> I think you should be able to use. Is there anything you'd change to make
> this work more easily with Beam?
> 
> rb
> 
> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> 
>> Hi guys,
>>
>> I'm working on the Apache Beam ParquetIO:
>>
>> https://github.com/apache/beam/pull/1851
>>
>> In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>>
>> While I was able to implement the Read part using AvroParquetReader
>> leveraging Beam FileIO, I'm struggling with the Write part.
>>
>> I have to create a ParquetSink implementing FileIO.Sink. In particular, I
>> have to implement the open(WritableByteChannel channel) method.
>>
>> It's not possible to use AvroParquetWriter here, as it takes a Path as
>> argument (and from the channel, I can only get an OutputStream).
>>
>> As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
>> providing my own implementation of org.apache.parquet.io.OutputFile.
>>
>> Unfortunately, OutputFile (and the updated method in ParquetFileWriter)
>> exist on the Parquet master branch, but things are different in Parquet 1.9.0.
>>
>> So, I have two questions:
>> - Do you plan a Parquet 1.9.1 release including org.apache.parquet.io.OutputFile
>>   and the updated org.apache.parquet.hadoop.ParquetFileWriter?
>> - Using Parquet 1.9.0, do you have any advice on how to use
>>   AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object
>>   that I can get from a WritableByteChannel)?
>>
>> Thanks!
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
> 
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
