I ran the same program on master and I see the following error:

    Parquet write error: More rows were written in the column chunk than expected

This bug should throw at
https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
However, I see the same problem as Grant posted for commit
1c4492a111b00ef48663982171e3face1ca2192d. The core dump is caused by two
parquet exceptions being in flight at the same time; this is fixed in
commit 076011b08498317d213cdbc0a64128a5dd8da4c0. The first exception is
thrown at
https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
The Parquet writer destructor then tries to close the file and throws a
second exception at
https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
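To make the failure mode concrete, here is a minimal standalone sketch of
why two in-flight exceptions produce the abort. This is plain C++, not
parquet-cpp source; the class name is made up:

    #include <iostream>
    #include <stdexcept>

    // Stand-in for the pre-fix column writer: its destructor runs cleanup
    // that can itself throw (the Close() path at writer.cc#L159).
    struct WriterLike {
      ~WriterLike() noexcept(false) {
        throw std::runtime_error(
            "Less than the number of expected rows written in"
            " the current column chunk");
      }
    };

    int main() {
      try {
        WriterLike w;
        // First exception: the overflow check at writer.cc#L337 fires.
        throw std::runtime_error(
            "More rows were written in the column chunk than expected");
        // While this exception unwinds the stack, ~WriterLike() throws a
        // second one; two active exceptions force std::terminate(), which
        // is the "Aborted (core dumped)" Grant saw.
      } catch (const std::exception& e) {
        std::cerr << e.what() << "\n";  // never reached
      }
      return 0;
    }

Guarding the destructor's close path (for example with a try/catch, or by
checking std::uncaught_exception() before throwing) is the usual way to
avoid this pattern.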
On Mon, Mar 13, 2017 at 6:03 PM, Wes McKinney <[email protected]> wrote:
> See https://issues.apache.org/jira/browse/PARQUET-914
>
> On Mon, Mar 13, 2017 at 6:01 PM, Wes McKinney <[email protected]> wrote:
> > hi Grant,
> >
> > the exception is coming from
> >
> >     if (num_rows_ != expected_rows_) {
> >       throw ParquetException(
> >           "Less than the number of expected rows written in"
> >           " the current column chunk");
> >     }
> >
> > https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
> >
> > This is doubly buggy -- the size of the row group and the number of
> > values written do differ, but you're writing *more* values than the
> > row group contains, not fewer, so the message is wrong. I'm opening a
> > JIRA to throw a better exception.
> >
> > See the logic for forming num_rows_ for columns with
> > max_repetition_level > 0:
> >
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.cc#L323
> >
> > num_rows_ is incremented each time a new record begins
> > (repetition_level 0). You can write as many repeated values as you
> > like in a row group as long as the repetition levels encode the
> > corresponding number of records -- if you run into a case where this
> > happens, can you open a JIRA so we can add a test case and fix it?
> >
> > Thanks
> > Wes
> >
> > On Mon, Mar 13, 2017 at 12:14 PM, Grant Monroe <[email protected]> wrote:
> >> I should also mention that I built parquet-cpp from GitHub, commit
> >> 1c4492a111b00ef48663982171e3face1ca2192d.
> >>
> >> On Mon, Mar 13, 2017 at 12:10 PM, Grant Monroe <[email protected]> wrote:
> >>
> >>> I'm struggling to get a simple parquet writer working using the C++
> >>> library. The source is here:
> >>>
> >>> https://gist.github.com/tnarg/8878a38d4a22104328c4d289319f9ac1
> >>>
> >>> and I'm compiling like so:
> >>>
> >>>     g++ --std=c++11 -o writer writer.cc -lparquet -larrow -larrow_io
> >>>
> >>> When I run this program, I get the following error:
> >>>
> >>>     gmonroe@foo:~$ ./writer
> >>>     terminate called after throwing an instance of 'parquet::ParquetException'
> >>>       what(): Less than the number of expected rows written in the current column chunk
> >>>     Aborted (core dumped)
> >>>
> >>> If I change NUM_ROWS_PER_ROW_GROUP=3, this writer succeeds. This
> >>> suggests that every column needs to contain N values such that
> >>> N % NUM_ROWS_PER_ROW_GROUP = 0 and N > 0. For an arbitrarily complex
> >>> set of values, the only reasonable choice for NUM_ROWS_PER_ROW_GROUP
> >>> is 1.
> >>>
> >>> Is this a bug in the C++ library or am I missing something in the API?
> >>>
> >>> Regards,
> >>> Grant Monroe
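For reference, here is a sketch of the pattern Wes describes above: a row
group declared with 2 rows that legally holds 5 physical values, because
the repetition levels contain exactly two zeros. This is written against
the parquet-cpp API of this thread's vintage and I have not compiled it, so
treat the exact names as approximate:

    // repeated_writer.cc -- sketch only, API names approximate.
    #include <memory>
    #include <arrow/io/file.h>
    #include <parquet/api/writer.h>

    int main() {
      using parquet::Repetition;
      using parquet::Type;
      using parquet::schema::GroupNode;
      using parquet::schema::PrimitiveNode;

      // Schema with a single repeated int64 leaf (max_repetition_level == 1).
      parquet::schema::NodeVector fields;
      fields.push_back(
          PrimitiveNode::Make("values", Repetition::REPEATED, Type::INT64));
      auto schema = std::static_pointer_cast<GroupNode>(
          GroupNode::Make("schema", Repetition::REQUIRED, fields));

      std::shared_ptr<::arrow::io::FileOutputStream> out_file;
      PARQUET_THROW_NOT_OK(
          ::arrow::io::FileOutputStream::Open("repeated.parquet", &out_file));
      auto file_writer = parquet::ParquetFileWriter::Open(out_file, schema);

      // The row group is sized for 2 records, yet we write 5 physical values.
      parquet::RowGroupWriter* rg_writer = file_writer->AppendRowGroup(2);
      auto* col = static_cast<parquet::Int64Writer*>(rg_writer->NextColumn());

      int64_t values[] = {1, 2, 3, 4, 5};
      int16_t def_levels[] = {1, 1, 1, 1, 1};  // every value is present
      // repetition_level 0 starts a new record; exactly two zeros -> two
      // rows, so num_rows_ matches expected_rows_ and Close() succeeds.
      int16_t rep_levels[] = {0, 1, 1, 0, 1};
      col->WriteBatch(5, def_levels, rep_levels, values);

      file_writer->Close();
      return 0;
    }

Because num_rows_ is only incremented at repetition_level 0, the close
check sees num_rows_ == expected_rows_ == 2 even though five values were
written, which is exactly the behavior Wes describes.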
--
regards,
Deepak Majeti