Very cool! JB, time to update your PR?

On Thu, Apr 19, 2018 at 9:17 AM Alexey Romanenko <aromanenko....@gmail.com>
wrote:

> FYI: Apache Parquet 1.10.0 was released recently.
> It contains *org.apache.parquet.io.OutputFile* and the updated
> *org.apache.parquet.hadoop.ParquetFileWriter*.
>
> WBR,
> Alexey
>
>
> On 14 Feb 2018, at 20:10, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Great !!
>
> In the meantime, I started a PoC directly on top of parquet-common to see
> if I can implement a BeamParquetReader and a BeamParquetWriter.
>
> I might also propose some PRs.
>
> I will continue tomorrow around that.
>
> Thanks again !
> Regards
> JB
>
> On 02/14/2018 08:04 PM, Ryan Blue wrote:
>
> Additions to the builders are easy enough that we can get that in. There's
> a PR out there that needs to be fixed:
> https://github.com/apache/parquet-mr/pull/446
>
> I've asked the author for just the builder changes. If we don't hear back,
> we can add another PR but I'd like to give the author some time to update.
>
> rb
>
> On Tue, Feb 13, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> Hi Ryan,
>
> Thanks for the update.
>
> Ideally for Beam, it would be great to have the AvroParquetReader and
> AvroParquetWriter using the InputFile/OutputFile interfaces. It would
> allow me to directly leverage Beam FileIO.
>
> Do you have a rough date for the Parquet release with that ?
>
> Thanks
> Regards
> JB
>
> On 02/14/2018 02:01 AM, Ryan Blue wrote:
>
> Jean-Baptiste,
>
> We're planning a release that will include the new OutputFile class, which I
> think you should be able to use. Is there anything you'd change to make this
> work more easily with Beam?
>
> rb
>
> On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>    Hi guys,
>
>    I'm working on the Apache Beam ParquetIO:
>
>    https://github.com/apache/beam/pull/1851
>
>    In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
>
>
>    While I was able to implement the read part using AvroParquetReader on top
>    of Beam FileIO, I'm struggling with the write part.
>
>    I have to create a ParquetSink implementing FileIO.Sink. In particular, I
>    have to implement the open(WritableByteChannel channel) method.
>
>    It's not possible to use AvroParquetWriter here, as it takes a Path as
>    argument (and from the channel, I can only get an OutputStream).
>
>    As a workaround, I wanted to use
>    org.apache.parquet.hadoop.ParquetFileWriter, providing my own
>    implementation of org.apache.parquet.io.OutputFile.
>
>    Unfortunately, OutputFile (and the updated method in ParquetFileWriter)
>    is available on the Parquet master branch, but not in this form in
>    Parquet 1.9.0.
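
A minimal sketch of the adapter idea described above, using only the JDK so it runs without Parquet or Beam on the classpath. The class name `ChannelPositionOutputStream` is hypothetical and does not come from either project. It wraps the `WritableByteChannel` that a sink's `open(...)` method receives and counts the bytes written, because a Parquet writer has to know the current file position in order to record offsets. In a real sink this class would presumably extend `org.apache.parquet.io.PositionOutputStream` and be returned from a custom `OutputFile`; that wiring is an assumption about the released API and is left out here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not part of Beam or Parquet): turns a WritableByteChannel
 * into an OutputStream that remembers how many bytes have been written to it.
 */
class ChannelPositionOutputStream extends OutputStream {
  private final OutputStream delegate;
  private long position = 0;

  ChannelPositionOutputStream(WritableByteChannel channel) {
    // Channels.newOutputStream is the standard JDK bridge from channel to stream.
    this.delegate = Channels.newOutputStream(channel);
  }

  /** Number of bytes written so far, i.e. the current offset in the output. */
  long getPos() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    delegate.write(b);
    position++;
  }

  @Override
  public void write(byte[] buf, int off, int len) throws IOException {
    delegate.write(buf, off, len);
    position += len;
  }

  @Override
  public void flush() throws IOException {
    delegate.flush();
  }

  @Override
  public void close() throws IOException {
    // Closing the wrapping stream also closes the underlying channel.
    delegate.close();
  }
}
```

Whether the sink or the writer should own closing the channel depends on how Beam manages the channel passed to `open(...)`, so the `close()` behaviour here would need checking against the FileIO.Sink contract.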
>
>    So, I have two questions:
>    - Do you plan a Parquet 1.9.1 release including
>      org.apache.parquet.io.OutputFile and the updated
>      org.apache.parquet.hadoop.ParquetFileWriter?
>    - Using Parquet 1.9.0, do you have any advice on how to use
>      AvroParquetWriter/ParquetFileWriter with an OutputStream (or any object
>      that I can get from a WritableByteChannel)?
>
>    Thanks !
>
>    Regards
>    JB
>    --
>    Jean-Baptiste Onofré
>    jbono...@apache.org
>    http://blog.nanthrax.net
>    Talend - http://www.talend.com
>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
