Thank you Uwe!

On Tue, Aug 29, 2017 at 12:49 AM, Uwe L. Korn <[email protected]> wrote:
> Hello Rahul,
>
> the benefit of using Arrow for the row-wise-to-columnar conversion is
> mainly that the API is much simpler to use than the plain parquet-cpp
> API (see
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html)
> Performance-wise, there is no difference.
>
> Uwe
>
> On Tue, Aug 29, 2017, at 09:42 AM, rahul challapalli wrote:
> > Thanks for your response Wes. The example at [1] uses column writers and
> > column readers. So for converting row based data into columnar format, is
> > there any benefit to using arrow? (I am mainly using parquet for
> > compression benefits. Once the data is read, I immediately convert it
> > into row-based data)
> >
> > [1] https://github.com/apache/parquet-cpp/blob/master/examples/reader-writer.cc
> >
> > On Mon, Aug 28, 2017 at 1:38 PM, Wes McKinney <[email protected]> wrote:
> >
> > > hi Rahul,
> > >
> > > This is not easy to do in the C++ API right now, because the writer
> > > must be initialized with a static schema. Theoretically you could
> > > expand the schema while you are writing the first row group, but it
> > > would be difficult to make this possible.
> > >
> > > The writer API is also designed for writing one column at a time
> > > instead of one row at a time, so one option for you is to create an
> > > auxiliary data structure (this is not provided by the Parquet C++
> > > library) to convert records into columnar form, then write to the
> > > Parquet writer API once you have appended all your records and know
> > > the final schema.
> > >
> > > - Wes
> > >
> > > On Fri, Aug 25, 2017 at 1:34 PM, rahul challapalli
> > > <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > I am using the parquet writer (cpp) and I want to see if I can add a new
> > > > column after writing out a few records, but before the close method is
> > > > called. An example would be helpful if this is feasible.
> > > >
> > > > Rahul
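
For reference, here is a rough sketch of the kind of auxiliary data structure Wes describes above: it buffers records row by row, stores them column by column, and lets a new column appear partway through by backfilling earlier rows with nulls. Everything in it is an assumption made for illustration. ColumnarBuffer and its methods are hypothetical names, not part of parquet-cpp or Arrow; values are restricted to 64-bit integers to keep it short; and it needs C++17. It does not call the Parquet writer at all.

```cpp
// Requires C++17 (std::optional, structured bindings).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not provided by parquet-cpp): collects row-wise
// records into per-column vectors so the schema can grow until flush time.
class ColumnarBuffer {
 public:
  using Value = std::optional<std::int64_t>;  // empty optional == null
  using Column = std::vector<Value>;
  using Row = std::vector<std::pair<std::string, std::int64_t>>;

  // Append one record given as (field name, value) pairs.
  void AppendRow(const Row& row) {
    for (const auto& [name, value] : row) {
      auto it = index_.find(name);
      if (it == index_.end()) {
        // First time this field is seen: create the column and backfill
        // all previously appended rows with nulls.
        it = index_.emplace(name, columns_.size()).first;
        names_.push_back(name);
        columns_.push_back(Column(num_rows_));
      }
      Column& col = columns_[it->second];
      if (col.size() == num_rows_) {
        col.push_back(value);  // first value for this field in this row
      } else {
        col.back() = value;    // duplicate field in the same row: last wins
      }
    }
    ++num_rows_;
    // Fields missing from this record become nulls.
    for (Column& col : columns_) {
      if (col.size() < num_rows_) col.push_back(std::nullopt);
    }
  }

  std::size_t num_rows() const { return num_rows_; }
  std::size_t num_columns() const { return columns_.size(); }

  // Column names in the order they were first seen (the final schema).
  const std::vector<std::string>& column_names() const { return names_; }

  const Column& column(const std::string& name) const {
    auto it = index_.find(name);
    assert(it != index_.end() && "unknown column");
    return columns_[it->second];
  }

 private:
  std::size_t num_rows_ = 0;
  std::map<std::string, std::size_t> index_;  // name -> position in columns_
  std::vector<std::string> names_;
  std::vector<Column> columns_;
};
```

Once all records have been appended, column_names() gives the final schema and every column has exactly num_rows() entries, so each one could then be handed to the Parquet column writer in turn. The price is that the whole row group sits in memory until the schema is known.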
