Hi Grant,

The value [1, 2, 3] is only 1 value, not 3. The "number of rows" passed to the row group counts top-level records, *not* the elements of repeated fields.
From https://blog.twitter.com/2013/dremel-made-simple-with-parquet, I believe the correct data to write is:

rep level | def level | value
0         | 1         | 1
1         | 1         | 2
1         | 1         | 3

parquet-cpp knows from this data that the 3 values are part of only one logical record.

Does that make sense?

Thanks
Wes

On Thu, Mar 16, 2017 at 3:40 PM, Grant Monroe <[email protected]> wrote:
> Hi Deepak,
>
>> Can you use the master branch or the 1.0.0-rc5 release and try again? You
>> will just get the error and not the core dump.
>
> Upgrading to master does indeed remove the abort().
>
>> Just to clarify, the NUM_ROWS_PER_ROW_GROUP value is NOT an upper bound to
>> the total number of rows in a RowGroup. The number of rows being added must
>> be exactly equal to the NUM_ROWS_PER_ROW_GROUP value.
>
> I can see that from the error message. My question is, given the example JSON
> object
>
> {
>   "foo": false,
>   "bars": [1,2,3]
> }
>
> how might I store this using the parquet-cpp API? I have one column with 1
> value and another with 3. The only general solution I can see would be to use
> NUM_ROWS_PER_ROW_GROUP=1 which seems like nonsense. What am I missing?
> Sample code would be helpful.
>
> Thanks,
> Grant
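
Since sample code was requested: below is a minimal sketch of how the rep/def levels above could be derived. It is plain Python for brevity and does not use the parquet-cpp API. It assumes "bars" is declared as a `repeated int32` column, so the maximum repetition and definition levels are both 1. The helper names `shred_repeated_int32` and `count_rows` are hypothetical and exist only for this illustration.

```python
from typing import List, Optional, Tuple

# One entry per level triple: (rep level, def level, value or None).
Level = Tuple[int, int, Optional[int]]


def shred_repeated_int32(records: List[List[int]]) -> List[Level]:
    """Flatten top-level records of a `repeated int32` column into level triples.

    Assumption: the column is a top-level repeated field, so max rep level
    and max def level are both 1.
    """
    out: List[Level] = []
    for values in records:
        if not values:
            # Empty list: one entry with def level 0 and no stored value.
            out.append((0, 0, None))
            continue
        for i, v in enumerate(values):
            # rep level 0 starts a new record; 1 continues the current list.
            out.append((0 if i == 0 else 1, 1, v))
    return out


def count_rows(levels: List[Level]) -> int:
    """Rows are top-level records, i.e. entries whose rep level is 0."""
    return sum(1 for rep, _, _ in levels if rep == 0)


levels = shred_repeated_int32([[1, 2, 3]])
print(levels)              # [(0, 1, 1), (1, 1, 2), (1, 1, 3)]
print(count_rows(levels))  # 1
```

The three triples match the table above, and only one of them has rep level 0, which is why the row group sees a single row even though the "bars" column carries three values. With parquet-cpp, the levels and values would be passed to the column writer as three parallel arrays rather than as tuples.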
