Yup, that's great. I will update the PR when I'm back from vacation.

Regards
JB
On 20 Apr 2018 at 02:26, Eugene Kirpichov <kirpic...@google.com> wrote:
> Very cool! JB, time to update your PR?
>
> On Thu, Apr 19, 2018 at 9:17 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>
>> FYI: Apache Parquet 1.10.0 was released recently.
>> It contains org.apache.parquet.io.OutputFile and an updated
>> org.apache.parquet.hadoop.ParquetFileWriter.
>>
>> WBR,
>> Alexey
>>
>> On 14 Feb 2018, at 20:10, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Great !!
>>
>> In the meantime, I started a PoC directly on top of parquet-common to see if I
>> can implement a BeamParquetReader and a BeamParquetWriter.
>>
>> I might also propose some PRs.
>>
>> I will continue working on that tomorrow.
>>
>> Thanks again!
>> Regards
>> JB
>>
>> On 02/14/2018 08:04 PM, Ryan Blue wrote:
>>
>> Additions to the builders are easy enough that we can get that in. There's
>> a PR out there that needs to be fixed:
>> https://github.com/apache/parquet-mr/pull/446
>>
>> I've asked the author for just the builder changes. If we don't hear back,
>> we can add another PR, but I'd like to give the author some time to update.
>>
>> rb
>>
>> On Tue, Feb 13, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi Ryan,
>>
>> Thanks for the update.
>>
>> Ideally for Beam, it would be great to have the AvroParquetReader and
>> AvroParquetWriter using the InputFile/OutputFile interfaces. It would allow me
>> to directly leverage Beam FileIO.
>>
>> Do you have a rough date for the Parquet release with that?
>>
>> Thanks
>> Regards
>> JB
>>
>> On 02/14/2018 02:01 AM, Ryan Blue wrote:
>>
>> Jean-Baptiste,
>>
>> We're planning a release that will include the new OutputFile class, which I
>> think you should be able to use. Is there anything you'd change to make this
>> work more easily with Beam?
>>
>> rb
>>
>> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi guys,
>>
>> I'm working on the Apache Beam ParquetIO:
>>
>> https://github.com/apache/beam/pull/1851
>>
>> In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>>
>> While I was able to implement the read part using AvroParquetReader on top of
>> Beam FileIO, I'm struggling with the write part.
>>
>> I have to create a ParquetSink implementing FileIO.Sink. In particular, I have
>> to implement the open(WritableByteChannel channel) method.
>>
>> It's not possible to use AvroParquetWriter here, as it takes a Path as argument
>> (and from the channel, I can only get an OutputStream).
>>
>> As a workaround, I wanted to use org.apache.parquet.hadoop.ParquetFileWriter,
>> providing my own implementation of org.apache.parquet.io.OutputFile.
>>
>> Unfortunately, OutputFile (and the updated method in ParquetFileWriter) exists
>> on the Parquet master branch, but it was different in Parquet 1.9.0.
>>
>> So, I have two questions:
>> - do you plan a Parquet 1.9.1 release including org.apache.parquet.io.OutputFile
>>   and the updated org.apache.parquet.hadoop.ParquetFileWriter?
>> - using Parquet 1.9.0, do you have any advice on how to use
>>   AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object that
>>   I can get from a WritableByteChannel)?
>>
>> Thanks!
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
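
For anyone following the workaround discussed in this thread: the missing piece between Beam's open(WritableByteChannel channel) and a Parquet writer is a stream that can report how many bytes it has written so far, since a plain channel has no notion of position. Here is a minimal sketch of that piece in plain Java, with no Parquet or Beam dependency. The class name CountingChannelStream is hypothetical and purely illustrative. The assumption (not verified here against a specific release) is that real code would expose this logic through Parquet's org.apache.parquet.io.OutputFile and its position-aware output stream, which are not shown.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not part of Parquet or Beam): an OutputStream over a
 * WritableByteChannel that counts the bytes written through it.
 */
class CountingChannelStream extends OutputStream {
  private final OutputStream out;
  private long position = 0;

  CountingChannelStream(WritableByteChannel channel) {
    // Standard JDK adapter from a channel to a stream.
    this.out = Channels.newOutputStream(channel);
  }

  /** Number of bytes written through this stream so far. */
  long getPos() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    position++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
    position += len;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

In a FileIO.Sink, the idea would be to wrap the channel received in open() with such a stream and hand it to the Parquet side through a custom OutputFile implementation. Note that the count only covers bytes written through this wrapper, so it matches the file offset only if the channel starts at position zero and nothing else writes to it.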