colin fang created PARQUET-1546:
-----------------------------------
Summary: page level min / max written by parquet-cpp is not
recognized by parquet-tools
Key: PARQUET-1546
URL: https://issues.apache.org/jira/browse/PARQUET-1546
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: colin fang
The test Parquet file is created by:
{{import numpy as np}}
{{import pandas as pd}}
{{n = 1000000}}
{{x = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, None] * n}}
{{y = [u'é', u'é', u'é', u'é'] * n + [u'a', None, u'a'] * n}}
{{z = np.random.rand(len(x)).tolist()}}
{{df = pd.DataFrame(\{'x': x, 'y': y, 'z': z})}}
{{df.to_parquet('test_arrow.parquet', use_dictionary=False, row_group_size=1900100)}}
Output from parquet-tools:
{{y TV=1900100 RL=0 DL=1}}
{{----------------------------------------------------------------------------}}
{{ page 0: DLE:RLE RLE:RLE VLE:PLAIN ST:[min: é, max: é, num_nulls: 0] SZ:1050632 VC:175104}}
{{ page 1: DLE:RLE RLE:RLE VLE:PLAIN ST:[num_nulls: 90072, min/max not defined] SZ:1083218 VC:294912}}
{{ page 2: DLE:RLE RLE:RLE VLE:PLAIN ST:[min: a, max: a, num_nulls: 105131] SZ:1091359 VC:315392}}
In the output above, page 1 reports "min/max not defined", whereas pages 0 and 2 report a min and max.
A Parquet file generated by `parquet-mr` has the correct page min/max.
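
For reference, the contents of the three pages can be worked out from the way y is built, without writing any file. The sketch below is illustrative only. It assumes that the pages shown are the first three pages of the row group at index 2 (rows 3800200 onward) and that the page boundaries follow the VC values in the output; the helper names are hypothetical. It also compares values as unsigned UTF-8 bytes, which is the ordering the Parquet format specifies for UTF8 columns.

```python
# Reconstruct the layout of column y without materializing 7,000,000 values.
N = 1_000_000
ROW_GROUP_SIZE = 1_900_100
TAIL = ("a", None, "a")  # repeating pattern after the first 4 * N values

# VC values copied from the parquet-tools output for pages 0, 1 and 2.
PAGE_VALUE_COUNTS = (175104, 294912, 315392)


def y_value(i):
    """Value of y at row i, per y = ['é'] * 4n + ['a', None, 'a'] * n."""
    return "é" if i < 4 * N else TAIL[(i - 4 * N) % 3]


def page_summary(start, count):
    """Null count and unsigned byte-wise min/max of the UTF-8 values in a page."""
    values = [y_value(i) for i in range(start, start + count)]
    # Python compares bytes objects as unsigned bytes.
    encoded = {v.encode("utf-8") for v in values if v is not None}
    return {
        "num_nulls": sum(v is None for v in values),
        "min": min(encoded).decode("utf-8"),
        "max": max(encoded).decode("utf-8"),
    }


def summarize_pages(row_group_index=2):
    """Summaries for consecutive pages starting at the given row group.

    Assumption: the pages are the first pages of that row group.
    """
    start = row_group_index * ROW_GROUP_SIZE
    summaries = []
    for count in PAGE_VALUE_COUNTS:
        summaries.append(page_summary(start, count))
        start += count
    return summaries


if __name__ == "__main__":
    for page, summary in enumerate(summarize_pages()):
        print(page, summary)
```

Under these assumptions the computed null counts are 0, 90072 and 105131, which match the output above. Page 1 is then the only page that contains both 'é' and 'a', so the statistics one would expect for it are min: a, max: é.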
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)