Andrew Lamb created PARQUET-2473:
------------------------------------

             Summary: Clarify parquet-format with respect to repeated fields 
across boundaries
                 Key: PARQUET-2473
                 URL: https://issues.apache.org/jira/browse/PARQUET-2473
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-site
            Reporter: Andrew Lamb


Several implementors have reported that the Parquet spec is currently unclear 
about when repeated fields can span page boundaries (that is, whether a logical 
record can be split across a page and/or row group boundary).

 

Discussion on list: 
[https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn]

 

The conclusion seems to be that records can't be split across boundaries 
for "v2 data pages" or when a page index is present.
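
To illustrate what that conclusion means in terms of repetition levels, here 
is a minimal Python sketch (not taken from the spec or from any Parquet 
implementation). It assumes the repetition levels of each page have already 
been decoded into plain lists of integers; the function names and that input 
shape are hypothetical. It relies only on the rule that a repetition level of 
0 marks the first value of a new record, so a page starts on a record boundary 
exactly when its first repetition level is 0.

```python
from typing import List, Sequence


def pages_start_on_record_boundaries(pages: Sequence[Sequence[int]]) -> bool:
    """Return True if every non-empty page begins with a new record.

    `pages` is a hypothetical input: one sequence of already-decoded
    repetition levels per data page of a single column chunk.
    """
    # Repetition level 0 means "this value starts a new record".
    return all(page[0] == 0 for page in pages if len(page) > 0)


def split_record_pages(pages: Sequence[Sequence[int]]) -> List[int]:
    """Return indices of pages whose first value continues a record
    that started on an earlier page."""
    return [i for i, page in enumerate(pages) if len(page) > 0 and page[0] != 0]
```

For example, split_record_pages([[0, 1, 1], [1, 1, 0, 1]]) flags page 1, 
because its first value continues the record that started on page 0.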

 

We should update the spec to state this explicitly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

