I agree that we should leverage the existing Java Parquet implementation as much as possible, and ideally have Iceberg depend on that library through a very thin adapter/wrapper layer.

Chen

On Mon, Jul 6, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Is there a path to having an Arrow<->Parquet implementation in Java
> that does not have a hard dependency on Iceberg? This is a common ask
> and it seems like it would be a clear community win that would net
> more contributors than something Iceberg-specific.
>
> On Mon, Jul 6, 2020 at 2:54 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> >
> > Sure, if you need an Arrow writer and want to work on it, we would be
> > happy to include it in Iceberg.
> >
> > What is your use case? The main reason why we don't have one is that
> > neither Presto nor Spark uses Arrow for writing.
> >
> > On Mon, Jul 6, 2020 at 9:04 AM Chen Song <chen.song...@gmail.com> wrote:
> >>
> >> I looked at the Iceberg Data API and found that the write is row based.
> >> I want to use a columnar data file format like Parquet and efficiently
> >> sink columnar in-memory data (like Arrow). I assume this is not
> >> currently implemented, but would it be OK to enhance the data API to
> >> support it?
> >>
> >> --
> >> Chen Song
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>

--
Chen Song