Thank you, Deepak, that's very helpful -- @Grant, are you using the
master branch / 1.0.0-rc5, or something older?

On Mon, Mar 13, 2017 at 6:31 PM, Deepak Majeti <[email protected]> wrote:
> I ran the same program on master and I see the following error.
> "Parquet write error: More rows were written in the column chunk than
> expected"
>
> This bug should throw at
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
>
> However, I see the same problem as Grant posted for commit
> 1c4492a111b00ef48663982171e3face1ca2192d
> The core dump happens because two Parquet exceptions end up in flight at
> the same time. This is fixed in commit 076011b08498317d213cdbc0a64128a5dd8da4c0.
>
> First exception at
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L337
> The ParquetWriter destructor then tries to close the file and encounters
> https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
>
>
> On Mon, Mar 13, 2017 at 6:03 PM, Wes McKinney <[email protected]> wrote:
>
>> See https://issues.apache.org/jira/browse/PARQUET-914
>>
>> On Mon, Mar 13, 2017 at 6:01 PM, Wes McKinney <[email protected]> wrote:
>> > hi Grant,
>> >
>> > the exception is coming from
>> >
>> >   if (num_rows_ != expected_rows_) {
>> >     throw ParquetException(
>> >         "Less than the number of expected rows written in"
>> >         " the current column chunk");
>> >   }
>> >
>> > https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
>> >
>> > This is doubly buggy -- not only does the number of values written
>> > differ from the size of the row group, but the exception message is
>> > misleading: you're writing *more* values than the row group contains,
>> > not fewer. I'm opening a JIRA to throw a better exception
>> >
>> > See the logic for forming num_rows_ for columns with
>> > max_repetition_level > 0:
>> >
>> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.cc#L323
>> >
>> > num_rows_ is incremented each time a new record begins
>> > (repetition_level 0). You can write as many repeated values as you
>> > like in a row group as long as the repetition levels encode the
>> > corresponding number of records -- if you run into a case where the
>> > check fails despite correct repetition levels, can you open a JIRA so
>> > we can add a test case and a fix?
>> >
>> > Thanks
>> > Wes
>> >
>> > On Mon, Mar 13, 2017 at 12:14 PM, Grant Monroe <[email protected]> wrote:
>> >> I should also mention that I built parquet-cpp from github, commit
>> >> 1c4492a111b00ef48663982171e3face1ca2192d.
>> >>
>> >> On Mon, Mar 13, 2017 at 12:10 PM, Grant Monroe <[email protected]> wrote:
>> >>
>> >>> I'm struggling to get a simple parquet writer working using the c++
>> >>> library. The source is here:
>> >>>
>> >>> https://gist.github.com/tnarg/8878a38d4a22104328c4d289319f9ac1
>> >>>
>> >>> and I'm compiling like so
>> >>>
>> >>> g++ --std=c++11 -o writer writer.cc -lparquet -larrow -larrow_io
>> >>>
>> >>> When I run this program, I get the following error
>> >>>
>> >>> gmonroe@foo:~$ ./writer
>> >>> terminate called after throwing an instance of 'parquet::ParquetException'
>> >>>   what():  Less than the number of expected rows written in the current column chunk
>> >>> Aborted (core dumped)
>> >>>
>> >>> If I change NUM_ROWS_PER_ROW_GROUP=3, this writer succeeds. This
>> >>> suggests that every column needs to contain N values such that
>> >>> N % NUM_ROWS_PER_ROW_GROUP = 0 and N > 0. For an arbitrarily complex
>> >>> set of values, the only reasonable choice for NUM_ROWS_PER_ROW_GROUP is 1.
>> >>>
>> >>> Is this a bug in the c++ library or am I missing something in the API?
>> >>>
>> >>> Regards,
>> >>> Grant Monroe
>> >>>
>>
>
>
>
> --
> regards,
> Deepak Majeti
