Hi Grant,

The value [1, 2, 3] counts as one value (one row), not three. The
"number of rows" passed to the row group counts top-level records,
*not* the individual elements of repeated fields.

From https://blog.twitter.com/2013/dremel-made-simple-with-parquet, I
believe the correct data to write is:

rep level | def level  | value
0         | 1          | 1
1         | 1          | 2
1         | 1          | 3

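A rep level of 0 starts a new record, and a rep level of 1 continues
the list in the current record, so parquet-cpp knows from this data
that the 3 values are part of only one logical record.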
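
To make the table above concrete, here is a minimal sketch of how those
levels could be derived for a top-level "repeated int32" field such as
"bars". It does not use the parquet-cpp API at all: LeveledColumn,
EncodeRepeatedInt32 and CountRecords are hypothetical names made up for
illustration, and the max rep/def level of 1 is an assumption about the
schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not a parquet-cpp type) holding the three
// parallel arrays that describe one column.
struct LeveledColumn {
  std::vector<int16_t> rep_levels;
  std::vector<int16_t> def_levels;
  std::vector<int32_t> values;
};

// Encode a top-level `repeated int32` field, assuming max rep level 1 and
// max def level 1. Each inner vector is the list of one top-level record.
LeveledColumn EncodeRepeatedInt32(
    const std::vector<std::vector<int32_t>>& records) {
  LeveledColumn col;
  for (const auto& list : records) {
    if (list.empty()) {
      // An empty list still takes a level slot, but stores no value.
      col.rep_levels.push_back(0);
      col.def_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < list.size(); ++i) {
      // rep level 0 starts a new record; 1 continues the current list.
      col.rep_levels.push_back(i == 0 ? 0 : 1);
      col.def_levels.push_back(1);
      col.values.push_back(list[i]);
    }
  }
  assert(col.rep_levels.size() == col.def_levels.size());
  return col;
}

// The row count is the number of entries whose rep level is 0.
int64_t CountRecords(const LeveledColumn& col) {
  int64_t n = 0;
  for (int16_t r : col.rep_levels) {
    if (r == 0) ++n;
  }
  return n;
}
```

For the single record with "bars": [1,2,3], this gives rep levels
0, 1, 1, def levels 1, 1, 1, and a record count of 1, which is the row
count the row group expects.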

Does that make sense?

Thanks
Wes

On Thu, Mar 16, 2017 at 3:40 PM, Grant Monroe <[email protected]> wrote:
> Hi Deepak,
>
>> Can you use the master branch or the 1.0.0-rc5 release and try again? You
>> will just get the error and not the core dump.
>
> Upgrading to master does indeed remove the abort().
>
>> Just to clarify, the NUM_ROWS_PER_ROW_GROUP value is NOT an upper bound to
>> the total number of rows in a RowGroup. The number of rows being added must
>> be exactly equal to the NUM_ROWS_PER_ROW_GROUP value.
>
> I can see that from the error message. My question is, given the example JSON 
> object
>
> {
> "foo": false,
> "bars": [1,2,3]
> }
>
> how might I store this using the parquet-cpp API? I have one column with 1 
> value and another with 3. The only general solution I can see would be to use 
>  NUM_ROWS_PER_ROW_GROUP=1 which seems like nonsense. What am I missing? 
> Sample code would be helpful.
>
> Thanks,
> Grant
>
