As an example, you can look at
https://github.com/apache/parquet-cpp/blob/master/examples/reader-writer.cc#L140.
The int64_field column there holds a list of size 2 in every row.
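
To make the level arithmetic in the quoted thread below concrete, here is a minimal sketch that flattens rows of a repeated int64 field into repetition levels, definition levels, and values. It does not use parquet-cpp at all: the Levels struct and the FlattenRepeated function are hypothetical names made up for illustration, and the sketch assumes a plain REPEATED field (maximum definition level 1, maximum repetition level 1), not the nullable-list encoding, where definition levels can go higher.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration; not a parquet-cpp type.
struct Levels {
  std::vector<int16_t> rep_levels;
  std::vector<int16_t> def_levels;
  std::vector<int64_t> values;
};

// Flatten top-level rows of a REPEATED int64 field.
// Assumes max definition level 1 and max repetition level 1.
Levels FlattenRepeated(const std::vector<std::vector<int64_t>>& rows) {
  Levels out;
  for (const auto& row : rows) {
    if (row.empty()) {
      // An empty list still takes one level slot, but stores no value.
      out.rep_levels.push_back(0);
      out.def_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // Repetition level 0 starts a new record; 1 continues the same list.
      out.rep_levels.push_back(static_cast<int16_t>(i == 0 ? 0 : 1));
      out.def_levels.push_back(1);
      out.values.push_back(row[i]);
    }
  }
  return out;
}
```

For the single row "bars": [1,2,3] this yields repetition levels 0, 1, 1 and definition levels 1, 1, 1, so the row group contains one row even though the column stores three values. With parquet-cpp, arrays like these would then be handed to the column writer's WriteBatch call, which takes the definition levels before the repetition levels.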

On Thu, Mar 16, 2017 at 3:56 PM, Wes McKinney <[email protected]> wrote:

> The definition levels depend on the array encoding -- so to account
> for nullable lists and nullable values, the actual definition levels
> (based on the schema) may range from 1 to 3.
>
> I found this exposition in the Impala codebase really useful:
>
> https://github.com/apache/incubator-impala/blob/master/be/src/exec/hdfs-parquet-scanner.h#L78
>
>
> On Thu, Mar 16, 2017 at 3:51 PM, Wes McKinney <[email protected]> wrote:
> > hi Grant,
> >
> > The value [1, 2, 3] is only 1 value, not 3. The "Number of rows"
> > passed to the row group is with respect to top level records, *not*
> > counting repeated fields.
> >
> > From https://blog.twitter.com/2013/dremel-made-simple-with-parquet, I
> > believe the correct data to write is:
> >
> > rep level | def level | value
> > 0         | 1         | 1
> > 1         | 1         | 2
> > 1         | 1         | 3
> >
> > parquet-cpp knows from this data that the 3 values are part of only
> > one logical record
> >
> > Does that make sense?
> >
> > Thanks
> > Wes
> >
> > On Thu, Mar 16, 2017 at 3:40 PM, Grant Monroe <[email protected]> wrote:
> >> Hi Deepak,
> >>
> >>> Can you use the master branch or the 1.0.0-rc5 release and try again? You
> >>> will just get the error and not the core dump.
> >>
> >> Upgrading to master does indeed remove the abort().
> >>
> >>> Just to clarify, the NUM_ROWS_PER_ROW_GROUP value is NOT an upper bound to
> >>> the total number of rows in a RowGroup. The number of rows being added must
> >>> be exactly equal to the NUM_ROWS_PER_ROW_GROUP value.
> >>
> >> I can see that from the error message. My question is, given the
> >> example JSON object
> >>
> >> {
> >>   "foo": false,
> >>   "bars": [1,2,3]
> >> }
> >>
> >> how might I store this using the parquet-cpp API? I have one column
> >> with 1 value and another with 3. The only general solution I can see
> >> would be to use NUM_ROWS_PER_ROW_GROUP=1 which seems like nonsense.
> >> What am I missing? Sample code would be helpful.
> >>
> >> Thanks,
> >> Grant
> >>

--
regards,
Deepak Majeti
