Thank you Deepak, that's very helpful -- @Grant are you using the master branch / 1.0.0-rc5 or something older?
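
For reference, here is a minimal sketch of the pattern Wes describes below:
when you open a row group with AppendRowGroup(N), every column chunk has to
encode exactly N records, and for a repeated column a new record starts at
each value whose repetition level is 0, so you may write more than N values
as long as exactly N of them carry rep_level 0. This is untested and the
exact signatures (FileOutputStream::Open, PARQUET_THROW_NOT_OK, the optional
arguments to PrimitiveNode::Make) differ between parquet-cpp versions, so
treat it as an illustration rather than a drop-in program:

// writer_sketch.cc -- illustrative only; API details are assumptions and
// vary across parquet-cpp versions.
#include <cstdint>
#include <memory>
#include <vector>

#include <arrow/io/file.h>
#include <parquet/api/writer.h>
#include <parquet/exception.h>

constexpr int64_t NUM_ROWS_PER_ROW_GROUP = 3;

int main() {
  using parquet::Repetition;
  using parquet::Type;
  using parquet::schema::GroupNode;
  using parquet::schema::PrimitiveNode;

  // Schema: a required int64 column plus a repeated int64 column.
  parquet::schema::NodeVector fields;
  fields.push_back(PrimitiveNode::Make("id", Repetition::REQUIRED, Type::INT64));
  fields.push_back(PrimitiveNode::Make("values", Repetition::REPEATED, Type::INT64));
  auto schema = std::static_pointer_cast<GroupNode>(
      GroupNode::Make("schema", Repetition::REQUIRED, fields));

  std::shared_ptr<arrow::io::FileOutputStream> sink;
  PARQUET_THROW_NOT_OK(arrow::io::FileOutputStream::Open("example.parquet", &sink));

  auto writer = parquet::ParquetFileWriter::Open(sink, schema);

  // Declare a row group of exactly NUM_ROWS_PER_ROW_GROUP records; each
  // column chunk written below must encode exactly that many records.
  parquet::RowGroupWriter* rg = writer->AppendRowGroup(NUM_ROWS_PER_ROW_GROUP);

  // Required column: one value per record, so exactly three values.
  auto* id_writer = static_cast<parquet::Int64Writer*>(rg->NextColumn());
  for (int64_t i = 0; i < NUM_ROWS_PER_ROW_GROUP; ++i) {
    id_writer->WriteBatch(1, nullptr, nullptr, &i);
  }

  // Repeated column: six values, but only three of them have repetition
  // level 0, so the chunk still encodes three records: {1, 2}, {3, 4, 5}, {6}.
  auto* values_writer = static_cast<parquet::Int64Writer*>(rg->NextColumn());
  std::vector<int64_t> values = {1, 2, 3, 4, 5, 6};
  std::vector<int16_t> def_levels(values.size(), 1);     // every value present
  std::vector<int16_t> rep_levels = {0, 1, 0, 1, 1, 0};  // rep level 0 = new record
  values_writer->WriteBatch(static_cast<int64_t>(values.size()), def_levels.data(),
                            rep_levels.data(), values.data());

  writer->Close();
  return 0;
}

The point to take away is that for the repeated column it is the repetition
levels, not the raw value count, that determine how many rows the chunk
contributes; writing three required values alongside six repeated values is
fine, while three required values against four rep_level-0 markers is not.
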
On Mon, Mar 13, 2017 at 6:31 PM, Deepak Majeti <[email protected]> wrote:
> I ran the same program on master and I see the following error:
> "Parquet write error: More rows were written in the column chunk than
> expected"
>
> This bug should throw at
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
>
> However, I see the same problem as Grant posted for commit
> 1c4492a111b00ef48663982171e3face1ca2192d. The core dump happens because
> two Parquet exceptions end up being handled at the same time. This is
> fixed in commit 076011b08498317d213cdbc0a64128a5dd8da4c0.
>
> The first exception is thrown at
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
> and then the Parquet writer destructor tries to close the file and hits
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
>
>
> On Mon, Mar 13, 2017 at 6:03 PM, Wes McKinney <[email protected]> wrote:
>
>> See https://issues.apache.org/jira/browse/PARQUET-914
>>
>> On Mon, Mar 13, 2017 at 6:01 PM, Wes McKinney <[email protected]> wrote:
>> > hi Grant,
>> >
>> > the exception is coming from
>> >
>> >   if (num_rows_ != expected_rows_) {
>> >     throw ParquetException(
>> >         "Less than the number of expected rows written in"
>> >         " the current column chunk");
>> >   }
>> >
>> > https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
>> >
>> > This is doubly buggy -- the size of the row group and the number of
>> > values written are different, but you're writing *more* values than
>> > the row group contains, not fewer, so the message is misleading. I'm
>> > opening a JIRA to throw a better exception.
>> >
>> > See the logic for forming num_rows_ for columns with
>> > max_repetition_level > 0:
>> >
>> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.cc#L323
>> >
>> > num_rows_ is incremented each time a new record begins
>> > (repetition_level 0). You can write as many repeated values as you
>> > like in a row group as long as the repetition levels encode the
>> > corresponding number of records -- if you run into a case where this
>> > happens, can you open a JIRA so we can add a test case and fix it?
>> >
>> > Thanks
>> > Wes
>> >
>> > On Mon, Mar 13, 2017 at 12:14 PM, Grant Monroe <[email protected]> wrote:
>> >> I should also mention that I built parquet-cpp from GitHub, commit
>> >> 1c4492a111b00ef48663982171e3face1ca2192d.
>> >>
>> >> On Mon, Mar 13, 2017 at 12:10 PM, Grant Monroe <[email protected]> wrote:
>> >>
>> >>> I'm struggling to get a simple Parquet writer working using the C++
>> >>> library. The source is here:
>> >>>
>> >>> https://gist.github.com/tnarg/8878a38d4a22104328c4d289319f9ac1
>> >>>
>> >>> and I'm compiling like so:
>> >>>
>> >>> g++ --std=c++11 -o writer writer.cc -lparquet -larrow -larrow_io
>> >>>
>> >>> When I run this program, I get the following error:
>> >>>
>> >>> gmonroe@foo:~$ ./writer
>> >>> terminate called after throwing an instance of 'parquet::ParquetException'
>> >>>   what():  Less than the number of expected rows written in the
>> >>> current column chunk
>> >>> Aborted (core dumped)
>> >>>
>> >>> If I change NUM_ROWS_PER_ROW_GROUP=3, the writer succeeds. This
>> >>> suggests that every column needs to contain N values such that
>> >>> N % NUM_ROWS_PER_ROW_GROUP = 0 and N > 0.
>> >>> For an arbitrarily complex set of values, the only reasonable
>> >>> choice for NUM_ROWS_PER_ROW_GROUP is 1.
>> >>>
>> >>> Is this a bug in the C++ library or am I missing something in the API?
>> >>>
>> >>> Regards,
>> >>> Grant Monroe
>> >>>
>>
>
>
> --
> regards,
> Deepak Majeti
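
One more note on the double exception Deepak describes above: that part is
generic C++ behavior rather than anything Parquet-specific. If a destructor
throws while the stack is already unwinding from an earlier exception, the
runtime calls std::terminate(), and libstdc++'s handler prints the familiar
"terminate called after throwing an instance of ..." message before aborting.
A standalone illustration (the Closer type is just a stand-in for the column
writer; nothing here touches Parquet):

// terminate_demo.cc
#include <iostream>
#include <stdexcept>

struct Closer {
  // Destructors are implicitly noexcept in C++11; opting out with
  // noexcept(false) does not help once another exception is already
  // propagating.
  ~Closer() noexcept(false) { throw std::runtime_error("close failed"); }
};

int main() {
  try {
    Closer c;
    throw std::runtime_error("write failed");  // first exception
    // While unwinding, ~Closer() throws a second exception, which
    // triggers std::terminate() -> SIGABRT -> "Aborted (core dumped)".
  } catch (const std::exception& e) {
    std::cerr << "never reached: " << e.what() << "\n";
  }
  return 0;
}
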
