[
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678888#comment-17678888
]
ASF GitHub Bot commented on PARQUET-2231:
-----------------------------------------
wjones127 commented on code in PR #189:
URL: https://github.com/apache/parquet-format/pull/189#discussion_r1081899568
##########
Encodings.md:
##########
@@ -280,16 +280,19 @@ concatenated back to back. The expected savings is from the cost of encoding the
and possibly better compression in the data (it is no longer interleaved with
the lengths).
The data stream looks like:
-
+```
<Delta Encoded Lengths> <Byte Array Data>
+```
-For example, if the data was "Hello", "World", "Foobar", "ABCDEF":
+For example, if the data was "Hello", "World", "Foobar", "ABCDEF"
-The encoded data would be DeltaEncoding(5, 5, 6, 6) "HelloWorldFoobarABCDEF"
+The encoded data would be comprised of the following segments:
Review Comment:
```suggestion
then the encoded data would be comprised of the following segments:
```
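The layout in this hunk (all lengths first, then all value bytes back to back) can be illustrated with a small sketch. It assumes the lengths are kept as a plain Python list; a real writer would encode them with DELTA_BINARY_PACKED, so `first_value_and_deltas` only shows the first value and successive differences that encoding starts from, before min-delta subtraction and bit packing. The helper names are hypothetical and not taken from any Parquet implementation.

```python
from typing import List, Tuple


def split_lengths_and_data(values: List[bytes]) -> Tuple[List[int], bytes]:
    """Separate byte arrays into their lengths and one concatenated data blob."""
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data


def first_value_and_deltas(lengths: List[int]) -> Tuple[int, List[int]]:
    """First length plus successive differences (the input to DELTA_BINARY_PACKED,
    before min-delta subtraction and bit packing). Requires a non-empty list."""
    if not lengths:
        raise ValueError("need at least one length")
    deltas = [b - a for a, b in zip(lengths, lengths[1:])]
    return lengths[0], deltas


def join_back(lengths: List[int], data: bytes) -> List[bytes]:
    """Decode side: slice the concatenated data using the lengths."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out
```

With the example values, `split_lengths_and_data([b"Hello", b"World", b"Foobar", b"ABCDEF"])` gives the lengths `[5, 5, 6, 6]` and the data `b"HelloWorldFoobarABCDEF"`, and the lengths reduce to a first value of 5 followed by the deltas `[0, 1, 0]`.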
> [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY
> -----------------------------------------------------
>
> Key: PARQUET-2231
> URL: https://issues.apache.org/jira/browse/PARQUET-2231
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Critical
> Fix For: format-2.10.0
>
>
> The spec says that DELTA_BYTE_ARRAY is only supported for BYTE_ARRAY, but in
> parquet-mr it has been allowed for FIXED_LEN_BYTE_ARRAY as well since 2015.
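DELTA_BYTE_ARRAY (incremental encoding) stores, for each value, the length of the prefix it shares with the previous value, followed by the remaining suffix. The prefix lengths are written with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY. The sketch below shows only that prefix/suffix split. Nothing in it depends on values having different lengths, which is why the same scheme also works for FIXED_LEN_BYTE_ARRAY. It is an illustration with hypothetical helper names, not parquet-mr's implementation.

```python
from typing import List, Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_suffix_split(values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """For each value: shared-prefix length with the previous value, and the suffix."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        p = common_prefix_len(prev, v)
        prefix_lengths.append(p)
        suffixes.append(v[p:])
        prev = v
    return prefix_lengths, suffixes


def rebuild(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Decode side: previous value's prefix plus the stored suffix."""
    out, prev = [], b""
    for p, s in zip(prefix_lengths, suffixes):
        v = prev[:p] + s
        out.append(v)
        prev = v
    return out
```

For example, the 4-byte values `b"abcd", b"abce", b"abzz", b"xbzz"` split into the prefix lengths `[0, 3, 2, 0]` and the suffixes `b"abcd", b"e", b"zz", b"xbzz"`.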
--
This message was sent by Atlassian Jira
(v8.20.10#820010)