Andrew Lamb created PARQUET-2473:
------------------------------------
Summary: Clarify parquet-format with respect to repeated fields
across boundaries
Key: PARQUET-2473
URL: https://issues.apache.org/jira/browse/PARQUET-2473
Project: Parquet
Issue Type: Improvement
Components: parquet-site
Reporter: Andrew Lamb
Several implementors have reported that the Parquet spec is currently unclear
as to when repeated fields can span page boundaries (that is, whether a logical
record can be split across a page and/or row group boundary).
Discussion on list:
[https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn]
The conclusion seems to be that records can't be split across boundaries
for "v2 data pages" or if there is a page index.
We should update the spec to state this explicitly.
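
To make the constraint concrete, here is a minimal sketch of what "not split across a page boundary" means in terms of repetition levels. It relies on the fact that a repetition level of 0 marks the first value of a new record. The function names and the list-of-lists page representation are hypothetical, chosen for illustration only; this is not the API of any Parquet library.

```python
from typing import Sequence


def pages_start_on_record_boundaries(pages: Sequence[Sequence[int]]) -> bool:
    """Check that no record is split across a page boundary.

    `pages` is a hypothetical representation: one sequence of decoded
    repetition levels per data page of a single column chunk.
    A repetition level of 0 marks the first value of a new record, so a
    page that begins with a non-zero level continues a record that
    started on an earlier page.
    """
    return all(page[0] == 0 for page in pages if len(page) > 0)


def count_records(rep_levels: Sequence[int]) -> int:
    """Count the records that begin within the given repetition levels."""
    return sum(1 for level in rep_levels if level == 0)


# Two records of a repeated field: [a, b, c] and [d, e].
# Their repetition levels are 0, 1, 1, 0, 1.
aligned_pages = [[0, 1, 1], [0, 1]]  # page break falls between the records
split_pages = [[0, 1], [1, 0, 1]]    # first record continues onto page two

print(pages_start_on_record_boundaries(aligned_pages))
print(pages_start_on_record_boundaries(split_pages))
```

Under the conclusion above, a layout like `split_pages` would not be valid for v2 data pages or when a page index is present.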
--
This message was sent by Atlassian Jira
(v8.20.10#820010)