Re: Arrow Support in Parquet Writers

Ryan Blue Wed, 08 Jul 2020 10:40:13 -0700

Yes, I think there is a reasonable path to an implementation that doesn't
require the Iceberg API. While the first step is just getting it working, I
think we could refactor and remove the dependency on iceberg-api and
iceberg-parquet. Then we would have a module that can be used independently.

That same approach would work for a write path, although maybe we would
just build it that way from the start.

On Mon, Jul 6, 2020 at 2:54 PM Wes McKinney wrote:

> Is there a path to having an Arrow<->Parquet implementation in Java
> that does not have a hard dependency on Iceberg? This is a common ask,
> and it seems like it would be a clear community win that would net
> more contributors than something Iceberg-specific.
>
> On Mon, Jul 6, 2020 at 2:54 PM Ryan Blue wrote:
> >
> > Sure, if you need an Arrow writer and want to work on it, we would be
> > happy to include it in Iceberg.
> >
> > What is your use case? The main reason why we don't have one is that
> > neither Presto nor Spark uses Arrow for writing.
> >
> > On Mon, Jul 6, 2020 at 9:04 AM Chen Song wrote:
> >>
> >> I looked at the Iceberg Data API and found that writes are row-based.
> >> What if I want to use a columnar data file format like Parquet and
> >> efficiently sink columnar in-memory data (like Arrow)? I assume this is
> >> not currently implemented, but would it be OK to enhance the data API
> >> to support it?
> >>
> >> --
> >> Chen Song
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix