colin fang created PARQUET-1546:
-----------------------------------

             Summary: page level min / max written by parquet-cpp is not recognized by parquet-tools
                 Key: PARQUET-1546
                 URL: https://issues.apache.org/jira/browse/PARQUET-1546
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
            Reporter: colin fang


The test Parquet file is created by:

{{import numpy as np}}
{{import pandas as pd}}
{{n = 1000000}}
{{x = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, None] * n}}
{{y = [u'é', u'é', u'é', u'é'] * n + [u'a', None, u'a'] * n}}
{{z = np.random.rand(len(x)).tolist()}}
{{df = pd.DataFrame({'x': x, 'y': y, 'z': z})}}
{{df.to_parquet('test_arrow.parquet', use_dictionary=False, row_group_size=1900100)}}

 

Output from parquet-tools:

 

{{y TV=1900100 RL=0 DL=1}}
{{ ----------------------------------------------------------------------------}}
{{ page 0: DLE:RLE RLE:RLE VLE:PLAIN ST:[min: é, max: é, num_nulls: 0] SZ:1050632 VC:175104}}
{{ page 1: DLE:RLE RLE:RLE VLE:PLAIN ST:[num_nulls: 90072, min/max not defined] SZ:1083218 VC:294912}}
{{ page 2: DLE:RLE RLE:RLE VLE:PLAIN ST:[min: a, max: a, num_nulls: 105131] SZ:1091359 VC:315392}}

 

In the output above, the statistics for page 1 show "min/max not defined", even though pages 0 and 2 report a min and max.

A Parquet file generated by `parquet-mr` has the correct page-level min/max.
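
For reference, the statistics one would expect for these three pages can be recomputed from the generating script alone. The following is only a sketch under stated assumptions, not what either writer actually does: it assumes the pages shown belong to the third row group (index 2), that they are laid out consecutively from the start of that row group, and that strings are compared byte-wise on their UTF-8 encoding. The helper names (`y_value`, `page_stats`, `expected_stats`) are hypothetical and exist only for this illustration.

```python
# Sketch: recompute the expected per-page statistics for column y.
# Assumptions (not confirmed by the report): the pages shown belong to row
# group index 2, pages are consecutive from the row group start, and strings
# are ordered byte-wise on their UTF-8 encoding.

N = 1_000_000                 # n in the generating script
ROW_GROUP_SIZE = 1_900_100    # row_group_size passed to to_parquet
PAGE_VALUE_COUNTS = [175_104, 294_912, 315_392]  # VC values from the output


def y_value(i):
    """Value of column y at row i, following the generating script."""
    if i < 4 * N:
        return "é"
    return ["a", None, "a"][(i - 4 * N) % 3]


def page_stats(start, count):
    """Min, max, and null count over rows [start, start + count)."""
    values = [y_value(i) for i in range(start, start + count)]
    non_null = [v.encode("utf-8") for v in values if v is not None]
    return {
        "min": min(non_null).decode("utf-8"),
        "max": max(non_null).decode("utf-8"),
        "num_nulls": len(values) - len(non_null),
    }


def expected_stats(row_group_index=2):
    """Expected statistics for the first pages of the given row group."""
    start = row_group_index * ROW_GROUP_SIZE
    stats = []
    for count in PAGE_VALUE_COUNTS:
        stats.append(page_stats(start, count))
        start += count
    return stats


if __name__ == "__main__":
    for page, stats in enumerate(expected_stats()):
        print("page", page, stats)
```

Under these assumptions the recomputed null counts agree with the `num_nulls` values reported above, and page 1 is the only one of the three pages that contains both 'é' and 'a', so its expected min and max differ.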

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
