Right now, the Arrow work in Iceberg is focused on building out a read
path; the write path still uses a row-oriented interface. If you have Arrow
data that you need to write to Parquet, you should be able to combine
Spark’s Arrow support with Iceberg: Spark has a ColumnarBatch
implementation that can read Arrow data as InternalRow, and Iceberg has
utilities to write InternalRow to Parquet. Here’s the Iceberg side:
<https://github.com/apache/incubator-iceberg/blob/master/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java#L79-L84>

    try (FileAppender<InternalRow> writer = Parquet.write(Files.localOutput(testFile))
        .schema(schema)
        .createWriterFunc(msgType -> SparkParquetWriters.buildWriter(schema, msgType))
        .build()) {
      writer.addAll(records);
    }
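
To make the row-oriented bridge concrete, here is a minimal, self-contained
toy model of the idea: data stays stored column by column, and a row view
is handed to a row-oriented writer one row at a time. Every name in it
(ColumnarRowsSketch, Batch, Row, writeAll) is hypothetical and only stands in
for the roles of ColumnarBatch, InternalRow, and the file appender; it does
not use the Spark, Arrow, or Iceberg APIs, so treat it as a sketch of the
pattern rather than of either library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: hypothetical stand-ins, not Spark, Arrow, or Iceberg classes.
final class ColumnarRowsSketch {

  /** Row-oriented view over the current row of a batch. */
  interface Row {
    int getInt(int ordinal);
    String getString(int ordinal);
  }

  /** A batch stored column by column; each column is an int[] or a String[]. */
  static final class Batch {
    private final int numRows;
    private final Object[] columns;

    Batch(int numRows, Object... columns) {
      this.numRows = numRows;
      this.columns = columns;
    }

    /** Iterates rows without copying: one view whose row index advances. */
    Iterator<Row> rowIterator() {
      return new Iterator<Row>() {
        private int nextIndex = 0;
        private int current = -1;

        // The same view object is returned for every row (an assumption of
        // this sketch), so callers must consume each row before advancing.
        private final Row view = new Row() {
          @Override
          public int getInt(int ordinal) {
            return ((int[]) columns[ordinal])[current];
          }

          @Override
          public String getString(int ordinal) {
            return ((String[]) columns[ordinal])[current];
          }
        };

        @Override
        public boolean hasNext() {
          return nextIndex < numRows;
        }

        @Override
        public Row next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          current = nextIndex++;
          return view;
        }
      };
    }
  }

  /** Stand-in for a row-oriented writer: consumes each row as it arrives. */
  static List<String> writeAll(Iterator<Row> rows) {
    List<String> written = new ArrayList<>();
    while (rows.hasNext()) {
      Row row = rows.next();
      written.add(row.getInt(0) + "," + row.getString(1));
    }
    return written;
  }

  public static void main(String[] args) {
    Batch batch = new Batch(3, new int[] {1, 2, 3}, new String[] {"a", "b", "c"});
    System.out.println(writeAll(batch.rowIterator()));
  }
}
```

In the real setup, the iterator of rows would come from Spark's batch over
the Arrow vectors and would be passed to the appender shown above in place
of records.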


On Mon, Jan 6, 2020 at 10:17 AM Wes McKinney <[email protected]> wrote:

> There may be some work around this happening in Apache Iceberg
> (incubating). I don't know of other independent fully open source
> Arrow<->Parquet implementations in Java
>
> On Mon, Jan 6, 2020 at 11:54 AM saurabh pratap singh
> <[email protected]> wrote:
> >
> > Forgot to mention: I am using Java.
> >
> > On Thu, Jan 2, 2020 at 3:42 PM saurabh pratap singh <[email protected]>
> > wrote:
> >
> > > Hi
> > >
> > > I wanted to know whether there is any support or a library available for
> > > writing Arrow tables as Parquet files.
> > > Meanwhile, I tried writing my own converter, using the
> > > SchemaConverter provided by Arrow (to convert the Arrow schema to Parquet),
> > > then converting the Arrow table to Group (using the Parquet example Group
> > > reader/writer from parquet-mr as a reference) and dumping that as Parquet.
> > > This works for primitive types without any issues, but for nested types
> > > it will be a little complicated, so I wanted to know if anything like this
> > > already exists or is planned for the near future.
> > >
> > > Please let me know if some other information is required from my side.
> > >
> > > Thanks in advance.
> > >
> > >
>


-- 
Ryan Blue
Software Engineer
Netflix
