Thomas Friedrich created PARQUET-324:
----------------------------------------
Summary: row count incorrect if data file has more than 2^31 rows
Key: PARQUET-324
URL: https://issues.apache.org/jira/browse/PARQUET-324
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.7.0, 1.8.0
Reporter: Thomas Friedrich
Priority: Minor
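If a Parquet file has more than 2^31 - 1 rows (Integer.MAX_VALUE), the row count
written into the file metadata is incorrect, because the total overflows and
wraps around.
The cause is that numRows in ParquetMetadataConverter.toParquetMetadata is
declared as int instead of long. block.getRowCount() returns a long, but the
compound assignment (+=) silently narrows the sum back to int, so the code
compiles without a warning: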
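    int numRows = 0;  // bug: should be long; the total wraps past Integer.MAX_VALUE
    for (BlockMetaData block : blocks) {
      numRows += block.getRowCount();  // long sum is implicitly narrowed to int
      addRowGroup(parquetMetadata, rowGroups, block);
    }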
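The overflow can be reproduced outside parquet-mr with a minimal sketch. The class and method names below (RowCountOverflowDemo, sumAsInt, sumAsLong) are hypothetical and not part of the Parquet code base. A plain long[] stands in for the values returned by block.getRowCount(). The second method shows one possible fix under that assumption: declaring the accumulator as long.

```java
// Illustrative only: class and method names are hypothetical, not parquet-mr code.
class RowCountOverflowDemo {

    // Mirrors the reported pattern: an int accumulator fed long values.
    static int sumAsInt(long[] rowCounts) {
        int total = 0;
        for (long count : rowCounts) {
            total += count; // compound assignment narrows (total + count) to int
        }
        return total;
    }

    // Same loop with a long accumulator, so no narrowing takes place.
    static long sumAsLong(long[] rowCounts) {
        long total = 0L;
        for (long count : rowCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two row groups of 1.5 billion rows each: 3 billion in total, above 2^31 - 1.
        long[] rowCounts = {1_500_000_000L, 1_500_000_000L};
        System.out.println(sumAsInt(rowCounts));  // prints -1294967296
        System.out.println(sumAsLong(rowCounts)); // prints 3000000000
    }
}
```

Writing total = total + count with an int total would be rejected by the compiler as a lossy conversion. Only the compound form (+=) hides the narrowing cast, which is likely why the problem went unnoticed.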