Hello Rahul,

The main benefit of using Arrow for the row-wise-to-columnar conversion
is that its API is much simpler to use than the plain parquet-cpp API
(see
https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
). Performance-wise, there is no difference.
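
To illustrate the kind of auxiliary structure Wes describes below, here
is a rough sketch in plain C++. It is neither parquet-cpp nor Arrow
code: the names Column, ColumnarBuffer, AppendRow and Finish are
hypothetical, and it assumes every value is an int64. It buffers rows,
lets new columns appear at any point, and backfills nulls so that all
columns have the same length once the final schema is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper types; not part of parquet-cpp or Arrow.
struct Column {
  std::vector<int64_t> values;  // one slot per row; 0 where the row had no value
  std::vector<bool> valid;      // false marks a null slot
};

class ColumnarBuffer {
 public:
  // Append one record. Keys are column names; an unseen key creates a new column.
  void AppendRow(const std::map<std::string, int64_t>& row) {
    for (const auto& field : row) {
      Column& col = columns_[field.first];
      // A column can never be ahead of the row count.
      assert(col.values.size() <= num_rows_);
      // Backfill nulls for earlier rows that did not carry this column.
      col.values.resize(num_rows_, 0);
      col.valid.resize(num_rows_, false);
      col.values.push_back(field.second);
      col.valid.push_back(true);
    }
    ++num_rows_;
  }

  // Pad every column to the full row count. Call once, before handing the
  // columns to a column-at-a-time writer.
  void Finish() {
    for (auto& entry : columns_) {
      entry.second.values.resize(num_rows_, 0);
      entry.second.valid.resize(num_rows_, false);
    }
  }

  const std::map<std::string, Column>& columns() const { return columns_; }
  std::size_t num_rows() const { return num_rows_; }

 private:
  std::map<std::string, Column> columns_;
  std::size_t num_rows_ = 0;
};
```

After Finish(), the keys of columns() give the final schema, and each
Column can be written out one at a time. The valid flags would have to
be translated into whatever null representation the writer expects.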

Uwe

On Tue, Aug 29, 2017, at 09:42 AM, rahul challapalli wrote:
> Thanks for your response Wes. The example at [1] uses column writers and
> column readers. So for converting row based data into columnar format, is
> there any benefit to using arrow? (I am mainly using parquet for
> compression benefits. Once the data is read, I immediately convert it into
> row-based data)
> 
> [1]
> https://github.com/apache/parquet-cpp/blob/master/examples/reader-writer.cc
> 
> On Mon, Aug 28, 2017 at 1:38 PM, Wes McKinney <[email protected]>
> wrote:
> 
> > hi Rahul,
> >
> > This is not easy to do in the C++ API right now, because the writer
> > must be initialized with a static schema. Theoretically you could
> > expand the schema while you are writing the first row group, but it
> > would be difficult to make this possible.
> >
> > The writer API is also designed for writing one column at a time
> > instead of one row at a time, so one option for you is to create an
> > auxiliary data structure (this is not provided by the Parquet C++
> > library) to convert records into columnar form, then write to the
> > Parquet writer API once you have appended all your records and know
> > the final schema.
> >
> > - Wes
> >
> > On Fri, Aug 25, 2017 at 1:34 PM, rahul challapalli
> > <[email protected]> wrote:
> > > Hi,
> > >
> > > I am using the parquet writer (cpp) and I want to see if I can add a new
> > > column after writing out a few records, but before the close method is
> > > called. An example would be helpful if this is feasible.
> > >
> > > Rahul
> >
