Hi Grant,

Can you use the master branch or the 1.0.0-rc5 release and try again? You will then get just the error and not the core dump.

Just to clarify, the NUM_ROWS_PER_ROW_GROUP value is NOT an upper bound on the total number of rows in a row group. The number of rows added to each column chunk must be exactly equal to NUM_ROWS_PER_ROW_GROUP.
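To make that concrete, here is a minimal sketch (written for this reply, not taken from your gist) of writing the single JSON object from the thread, { "foo": false, "bars": [1,2,3] }, into a row group declared to hold exactly one row. It follows the low-level ColumnWriter pattern from the parquet-cpp examples of this era; the file name and the exact Open() signatures are assumptions and may differ between releases. The repeated "bars" column still receives three values, but only the first value carries repetition level 0, so all three count as a single record and the row-count check passes.

// Sketch only: one record { "foo": false, "bars": [1,2,3] },
// written into a row group declared with exactly one row.
#include <memory>

#include <arrow/io/file.h>
#include <parquet/api/writer.h>
#include <parquet/exception.h>

constexpr int64_t NUM_ROWS_PER_ROW_GROUP = 1;  // must match the rows actually written

int main() {
  using parquet::Repetition;
  using parquet::Type;
  using parquet::schema::GroupNode;
  using parquet::schema::PrimitiveNode;

  // Schema: required boolean "foo", repeated int64 "bars".
  parquet::schema::NodeVector fields;
  fields.push_back(PrimitiveNode::Make("foo", Repetition::REQUIRED, Type::BOOLEAN));
  fields.push_back(PrimitiveNode::Make("bars", Repetition::REPEATED, Type::INT64));
  auto schema = std::static_pointer_cast<GroupNode>(
      GroupNode::Make("schema", Repetition::REQUIRED, fields));

  // Output stream (file name is illustrative).
  std::shared_ptr<arrow::io::FileOutputStream> out_file;
  PARQUET_THROW_NOT_OK(
      arrow::io::FileOutputStream::Open("single_record.parquet", &out_file));

  std::shared_ptr<parquet::ParquetFileWriter> file_writer =
      parquet::ParquetFileWriter::Open(out_file, schema);

  // One row group that will hold exactly one record.
  parquet::RowGroupWriter* rg_writer =
      file_writer->AppendRowGroup(NUM_ROWS_PER_ROW_GROUP);

  // "foo": required, one value per record -> one value total.
  auto* foo_writer = static_cast<parquet::BoolWriter*>(rg_writer->NextColumn());
  bool foo_value = false;
  foo_writer->WriteBatch(1, nullptr, nullptr, &foo_value);

  // "bars": three values, but repetition level 0 only on the first one,
  // so num_rows_ for this column chunk is 1, matching the row group size.
  auto* bars_writer = static_cast<parquet::Int64Writer*>(rg_writer->NextColumn());
  int64_t bars_values[] = {1, 2, 3};
  int16_t def_levels[] = {1, 1, 1};  // each position holds a value
  int16_t rep_levels[] = {0, 1, 1};  // 0 starts a new record, 1 continues it
  bars_writer->WriteBatch(3, def_levels, rep_levels, bars_values);

  file_writer->Close();
  return 0;
}

In other words, the row group size is a count of records, not of leaf values: a column chunk may hold more values than rows, as long as the repetition levels group them into exactly NUM_ROWS_PER_ROW_GROUP records.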
On Thu, Mar 16, 2017 at 12:41 AM, Grant Monroe <[email protected]> wrote:

> Yes, I realized after posting that my example was faulty because I'm not
> creating a new row group every 3 rows. But consider an even simpler
> example:
>
> https://gist.github.com/tnarg/caa2f098091760255e3c60da2cf17438
>
> I want to write a single JSON object:
>
> {
>     "foo": false,
>     "bars": [1,2,3]
> }
>
> I would create two columns in my schema, choose a row group size of 10,
> and write 1 row to the "foo" column and 3 rows to the "bars" column. I get
> an error because I didn't write exactly 10 rows to each column. This seems
> broken.
>
> gmonroe@blah:~$ ./writer
> terminate called after throwing an instance of 'parquet::ParquetException'
>   what():  Less than the number of expected rows written in the current
> column chunk
> Aborted (core dumped)
>
>
> On 2017-03-13 18:01 (-0400), Wes McKinney <[email protected]> wrote:
> > hi Grant,
> >
> > the exception is coming from
> >
> >   if (num_rows_ != expected_rows_) {
> >     throw ParquetException(
> >         "Less than the number of expected rows written in"
> >         " the current column chunk");
> >   }
> >
> > https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
> >
> > This is doubly buggy -- the size of the row group and the number of
> > values written are different, but you're writing *more* values than the
> > row group contains. I'm opening a JIRA to throw a better exception.
> >
> > See the logic for forming num_rows_ for columns with max_repetition_level > 0:
> >
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.cc#L323
> >
> > num_rows_ is incremented each time a new record begins
> > (repetition_level 0). You can write as many repeated values as you
> > like in a row group as long as the repetition levels encode the
> > corresponding number of records -- if you run into a case where this
> > happens, can you open a JIRA so we can add a test case and fix?
> >
> > Thanks
> > Wes
> >
> > On Mon, Mar 13, 2017 at 12:14 PM, Grant Monroe <[email protected]> wrote:
> > > I should also mention that I built parquet-cpp from GitHub, commit
> > > 1c4492a111b00ef48663982171e3face1ca2192d.
> > >
> > > On Mon, Mar 13, 2017 at 12:10 PM, Grant Monroe <[email protected]> wrote:
> > >
> > >> I'm struggling to get a simple Parquet writer working using the C++
> > >> library. The source is here:
> > >>
> > >> https://gist.github.com/tnarg/8878a38d4a22104328c4d289319f9ac1
> > >>
> > >> and I'm compiling like so:
> > >>
> > >> g++ --std=c++11 -o writer writer.cc -lparquet -larrow -larrow_io
> > >>
> > >> When I run this program, I get the following error:
> > >>
> > >> gmonroe@foo:~$ ./writer
> > >> terminate called after throwing an instance of 'parquet::ParquetException'
> > >>   what():  Less than the number of expected rows written in the current
> > >> column chunk
> > >> Aborted (core dumped)
> > >>
> > >> If I change NUM_ROWS_PER_ROW_GROUP=3, this writer succeeds. This suggests
> > >> that every column needs to contain N values such that
> > >> N % NUM_ROWS_PER_ROW_GROUP = 0 and N > 0. For an arbitrarily complex set of
> > >> values the only reasonable choice for NUM_ROWS_PER_ROW_GROUP is 1.
> > >>
> > >> Is this a bug in the C++ library or am I missing something in the API?
> > >>
> > >> Regards,
> > >> Grant Monroe
> > >>
> > >

--
regards,
Deepak Majeti
