Hello Rahul,

The benefit of using Arrow for the row-wise-to-columnar conversion is mainly that its API is much simpler to use than the plain parquet-cpp API (see https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html). Performance-wise, there is no difference.
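For a fixed schema, the row-wise-to-columnar step itself is a simple transposition: every field of a row is appended to its own per-column buffer. The sketch below shows that step without any library. `Row`, `Columns` and `ToColumns` are hypothetical names chosen for illustration. With Arrow, the per-field vectors would be replaced by Arrow array builders as in the tutorial linked above. With plain parquet-cpp, each vector would be handed to the matching column writer, as in the reader-writer.cc example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type produced by the application, one object per row.
struct Row {
  std::int64_t id;
  double price;
  std::string name;
};

// Column-oriented layout: one vector per field, which is the shape a
// column-at-a-time writer consumes.
struct Columns {
  std::vector<std::int64_t> id;
  std::vector<double> price;
  std::vector<std::string> name;
};

// Transposes row-wise records into columns in a single pass.
Columns ToColumns(const std::vector<Row>& rows) {
  Columns cols;
  cols.id.reserve(rows.size());
  cols.price.reserve(rows.size());
  cols.name.reserve(rows.size());
  for (const Row& row : rows) {
    cols.id.push_back(row.id);
    cols.price.push_back(row.price);
    cols.name.push_back(row.name);
  }
  // Every column must end up with exactly one entry per row.
  assert(cols.id.size() == rows.size() && cols.price.size() == rows.size() &&
         cols.name.size() == rows.size());
  return cols;
}
```

For example, `ToColumns({{1, 9.5, "apple"}, {2, 3.25, "pear"}})` produces `id = {1, 2}`, `price = {9.5, 3.25}` and `name = {"apple", "pear"}`.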
Uwe

On Tue, Aug 29, 2017, at 09:42 AM, rahul challapalli wrote:
> Thanks for your response Wes. The example at [1] uses column writers and
> column readers. So for converting row based data into columnar format, is
> there any benefit to using arrow? (I am mainly using parquet for
> compression benefits. Once the data is read, I immediately convert it
> into row-based data)
>
> [1] https://github.com/apache/parquet-cpp/blob/master/examples/reader-writer.cc
>
> On Mon, Aug 28, 2017 at 1:38 PM, Wes McKinney wrote:
>
> > hi Rahul,
> >
> > This is not easy to do in the C++ API right now, because the writer
> > must be initialized with a static schema. Theoretically you could
> > expand the schema while you are writing the first row group, but it
> > would be difficult to make this possible.
> >
> > The writer API is also designed for writing one column at a time
> > instead of one row at a time, so one option for you is to create an
> > auxiliary data structure (this is not provided by the Parquet C++
> > library) to convert records into columnar form, then write to the
> > Parquet writer API once you have appended all your records and know
> > the final schema.
> >
> > - Wes
> >
> > On Fri, Aug 25, 2017 at 1:34 PM, rahul challapalli wrote:
> > > Hi,
> > >
> > > I am using the parquet writer (cpp) and I want to see if I can add a new
> > > column after writing out a few records, but before the close method is
> > > called. An example would be helpful if this is feasible.
> > >
> > > Rahul
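The auxiliary structure Wes describes, which buffers records until the final schema is known, could look roughly like the minimal sketch below. It assumes every value is a `double` and that each record arrives as a name-to-value map. `ColumnarBuffer`, `Append`, `Schema`, `GetColumn` and `NumRows` are hypothetical names; nothing here is part of parquet-cpp or Arrow. When a column first appears mid-stream, it is backfilled with nulls for every row already buffered, which is what makes "adding a column after a few records" possible before anything reaches the Parquet writer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of parquet-cpp or Arrow): accepts records
// one at a time and lets new columns appear after some rows were appended.
// Each column holds optional values so earlier rows can be null.
class ColumnarBuffer {
 public:
  using Record = std::map<std::string, double>;
  using Column = std::vector<std::optional<double>>;

  void Append(const Record& record) {
    // Register columns seen for the first time, backfilled with nulls
    // for all rows appended so far.
    for (const auto& [name, value] : record) {
      if (columns_.find(name) == columns_.end()) {
        columns_[name] = Column(num_rows_, std::nullopt);
        order_.push_back(name);
      }
    }
    // Append one value, or a null if the record lacks it, to every column.
    for (auto& [name, column] : columns_) {
      auto it = record.find(name);
      if (it == record.end()) {
        column.push_back(std::nullopt);
      } else {
        column.push_back(it->second);
      }
      assert(column.size() == num_rows_ + 1);
    }
    ++num_rows_;
  }

  // Column names in first-seen order; final once all records are appended.
  const std::vector<std::string>& Schema() const { return order_; }

  const Column& GetColumn(const std::string& name) const {
    return columns_.at(name);
  }

  std::size_t NumRows() const { return num_rows_; }

 private:
  std::map<std::string, Column> columns_;
  std::vector<std::string> order_;
  std::size_t num_rows_ = 0;
};
```

After appending `{{"a", 1.0}}` and then `{{"a", 2.0}, {"b", 5.0}}`, `Schema()` is `{"a", "b"}` and column `b` holds `{null, 5.0}`. Only at that point would the Parquet schema be built and each column passed to its column writer; because backfilled rows are null, such columns would need to be declared optional.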
