Antoine Pitrou created PARQUET-1269:
---------------------------------------
Summary: [C++] Scanning fails with list columns
Key: PARQUET-1269
URL: https://issues.apache.org/jira/browse/PARQUET-1269
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: Antoine Pitrou

{code:python}
>>> import io
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> list_arr = pa.array([[1, 2], [3, 4, 5]])
>>> int_arr = pa.array([10, 11])
>>> table = pa.Table.from_arrays([int_arr, list_arr], ['ints', 'lists'])
>>> bio = io.BytesIO()
>>> pq.write_table(table, bio)
>>> bio.seek(0)
0
>>> reader = pq.ParquetReader()
>>> reader.open(bio)
>>> reader.scan_contents()
Traceback (most recent call last):
  File "<ipython-input-23-58e977f6d60b>", line 1, in <module>
    reader.scan_contents()
  File "_parquet.pyx", line 753, in pyarrow._parquet.ParquetReader.scan_contents
  File "error.pxi", line 79, in pyarrow.lib.check_status
ArrowIOError: Parquet error: Total rows among columns do not match
{code}

ScanFileContents() claims to return the "number of semantic rows", but it apparently counts the number of physical elements instead?
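
To make the suspected mismatch concrete, here is a minimal pure-Python sketch of the two ways of counting. It is not the parquet-cpp implementation. The helper names (repetition_levels, count_physical_values, count_semantic_rows) are hypothetical, and it assumes a single-level list column with no empty or null lists, as in the reproduction above. In Parquet's nested encoding, a value with repetition level 0 starts a new row, so counting those gives semantic rows, while counting every value gives physical elements.

```python
def repetition_levels(rows):
    """Repetition levels for a one-level list column.

    Assumes no empty or null lists: the first value of each row gets
    level 0 (starts a new row), later values in the same row get level 1.
    """
    levels = []
    for row in rows:
        for i, _ in enumerate(row):
            levels.append(0 if i == 0 else 1)
    return levels


def count_physical_values(levels):
    # One level is recorded per leaf value, so this counts physical elements.
    return len(levels)


def count_semantic_rows(levels):
    # A repetition level of 0 marks the start of a new row.
    return sum(1 for level in levels if level == 0)


# Same data as the reproduction: a flat column and a list column, 2 rows each.
ints = [10, 11]
lists = [[1, 2], [3, 4, 5]]

# Conceptually, every value of a flat column starts a new row (level 0).
int_levels = [0] * len(ints)
list_levels = repetition_levels(lists)

print("physical:", count_physical_values(int_levels), count_physical_values(list_levels))
print("semantic:", count_semantic_rows(int_levels), count_semantic_rows(list_levels))
```

Under these assumptions the physical counts differ between the two columns (2 vs. 5) while the semantic row counts agree (2 vs. 2), which would be consistent with the "Total rows among columns do not match" error if the scan compares physical counts.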