Okay, cool. I was just missing that "NUM_ROWS_PER_ROW_GROUP" counts logical
rows (top-level records), not the number of entries per column. So for my
example, NUM_ROWS_PER_ROW_GROUP=1 is correct. Thanks!
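
For anyone who finds this thread later, here is a rough sketch of how the
rep/def levels in the table quoted below fall out of the "bars" list. It is
plain Python and does not call parquet-cpp at all. It assumes "bars" is a
top-level repeated int32 field (max repetition level 1, max definition
level 1); the function name shred_repeated is made up for illustration.

```python
from typing import List, Tuple


def shred_repeated(records: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Flatten a top-level repeated int field into (rep_levels, def_levels, values).

    Assumes a schema like `repeated int32 bars`, so both the maximum
    repetition level and the maximum definition level are 1.
    """
    rep_levels: List[int] = []
    def_levels: List[int] = []
    values: List[int] = []
    for bars in records:
        if not bars:
            # An empty list still occupies one level slot: it starts a new
            # record (rep 0) and defines no value (def 0).
            rep_levels.append(0)
            def_levels.append(0)
            continue
        for i, v in enumerate(bars):
            # rep 0 starts a new record; rep 1 continues the same list.
            rep_levels.append(0 if i == 0 else 1)
            def_levels.append(1)
            values.append(v)
    return rep_levels, def_levels, values


# The single JSON object from this thread: {"foo": false, "bars": [1, 2, 3]}
rep, dfn, vals = shred_repeated([[1, 2, 3]])

# Each rep level of 0 marks the start of a top-level record, so the number
# of logical rows is the count of zeros, not the count of values.
num_rows = rep.count(0)
print(rep, dfn, vals, num_rows)
```

With the one record above, the "bars" column carries 3 values but only 1
logical row, which is why the row count for the row group is 1.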

On 2017-03-16 15:51 (-0400), Wes McKinney <[email protected]> wrote: 
> hi Grant,
> 
> The value [1, 2, 3] is only 1 value, not 3. The "Number of rows"
> passed to the row group is with respect to top level records, *not*
> counting repeated fields.
> 
> From https://blog.twitter.com/2013/dremel-made-simple-with-parquet, I
> believe the correct data to write is:
> 
> rep level | def level  | value
> 0         | 1          | 1
> 1         | 1          | 2
> 1         | 1          | 3
> 
> parquet-cpp knows from this data that the 3 values are part of only
> one logical record.
> 
> Does that make sense?
> 
> Thanks
> Wes
> 
> On Thu, Mar 16, 2017 at 3:40 PM, Grant Monroe <[email protected]> wrote:
> > Hi Deepak,
> >
> >> Can you use the master branch or the 1.0.0-rc5 release and try again? You
> >> will just get the error and not the core dump.
> >
> > Upgrading to master does indeed remove the abort().
> >
> >> Just to clarify, the NUM_ROWS_PER_ROW_GROUP value is NOT an upper bound to
> >> the total number of rows in a RowGroup. The number of rows being added must
> >> be exactly equal to the NUM_ROWS_PER_ROW_GROUP value.
> >
> > I can see that from the error message. My question is, given the example 
> > JSON object
> >
> > {
> > "foo": false,
> > "bars": [1,2,3]
> > }
> >
> > how might I store this using the parquet-cpp API? I have one column with 1 
> > value and another with 3. The only general solution I can see would be to 
> > use NUM_ROWS_PER_ROW_GROUP=1, which seems like nonsense. What am I missing?
> > Sample code would be helpful.
> >
> > Thanks,
> > Grant
> >
> 
