Yes, I think there is a reasonable path to an implementation that doesn't require the Iceberg API. While the first step is just getting it working, I think we could refactor and remove the dependency on iceberg-api and iceberg-parquet. Then we would have a module that can be used independently.
That same approach would work for a write path, although maybe we would just build it that way from the start.

On Mon, Jul 6, 2020 at 2:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Is there is a path to having an Arrow<->Parquet implementation in Java
> that does not have a hard dependency on Iceberg? This is a common ask
> and it seems like it would be a clear community win that would net
> more contributors than something Iceberg-specific.
>
> On Mon, Jul 6, 2020 at 2:54 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> >
> > Sure, if you need an Arrow writer and want to work on it, we would be
> > happy to include it in Iceberg.
> >
> > What is your use case? The main reason why we don't have one is that
> > neither Presto nor Spark uses Arrow for writing.
> >
> > On Mon, Jul 6, 2020 at 9:04 AM Chen Song <chen.song...@gmail.com> wrote:
> >>
> >> I looked at the Iceberg Data API and found that the write is row based.
> >> If I want to use a columnar data file format like Parquet and efficiently
> >> sink columnar data in memory (like Arrow). I assume it is not currently
> >> implemented but OK to enhance the data API to support this?
> >>
> >> --
> >> Chen Song
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix

--
Ryan Blue
Software Engineer
Netflix