Re: Arrow Support in Parquet Writers

Chen Song Tue, 07 Jul 2020 06:24:33 -0700

I agree that we should leverage the existing Java Parquet implementation as
much as possible, and hopefully have Iceberg depend on that implementation/library
through a very thin adapter/wrapper layer.
Chen



On Mon, Jul 6, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Is there a path to having an Arrow<->Parquet implementation in Java
> that does not have a hard dependency on Iceberg? This is a common ask
> and it seems like it would be a clear community win that would net
> more contributors than something Iceberg-specific.
>
> On Mon, Jul 6, 2020 at 2:54 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> >
> > Sure, if you need an Arrow writer and want to work on it, we would be
> happy to include it in Iceberg.
> >
> > What is your use case? The main reason why we don't have one is that
> neither Presto nor Spark uses Arrow for writing.
> >
> > On Mon, Jul 6, 2020 at 9:04 AM Chen Song <chen.song...@gmail.com> wrote:
> >>
> >> I looked at the Iceberg Data API and found that the write path is row-based.
> What if I want to use a columnar data file format like Parquet and efficiently
> sink columnar in-memory data (such as Arrow)? I assume this is not currently
> implemented, but would it be OK to enhance the data API to support it?
> >>
> >> --
> >> Chen Song
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>


-- 
Chen Song
