raulcd commented on PR #88:
URL: https://github.com/apache/parquet-testing/pull/88#issuecomment-3044413356
I have added two new columns:
|column_name|min|max|
|---|---|---|
|utf8_full_truncation|"Al"|"Kf"|
|binary_full_truncation|"0x416C"|"0x4B66"|
|utf8_partial_truncation|"Al"|"🚀Kevin Bacon"|
|binary_partial_truncation|"0x416C"|"0xFFFF0102"|
See:
```
$ java -jar parquet-cli/target/parquet-cli-1.16.0-SNAPSHOT-runtime.jar meta /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
File path: /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
Created by: parquet-rs version 55.1.0
Properties:
ARROW:schema:
/////zgBAAAQAAAAAAAKAAwACgAJAAQACgAAABAAAAAAAQQACAAIAAAABAAIAAAABAAAAAQAAADMAAAAgAAAAEQAAAAEAAAAVP///xQAAAAMAAAAAAAABAwAAAAAAAAARP///xkAAABiaW5hcnlfcGFydGlhbF90cnVuY2F0aW9uAAAAkP///xQAAAAMAAAAAAAABQwAAAAAAAAAgP///xcAAAB1dGY4X3BhcnRpYWxfdHJ1bmNhdGlvbgDI////FAAAAAwAAAAAAAAEDAAAAAAAAAC4////FgAAAGJpbmFyeV9mdWxsX3RydW5jYXRpb24AABAAFAAQAAAADwAEAAAACAAQAAAAGAAAAAwAAAAAAAAFEAAAAAAAAAAEAAQABAAAABQAAAB1dGY4X2Z1bGxfdHJ1bmNhdGlvbgAAAAA=
Schema:
message arrow_schema {
  required binary utf8_full_truncation (STRING);
  required binary binary_full_truncation;
  required binary utf8_partial_truncation (STRING);
  required binary binary_partial_truncation;
}
Row group 0: count: 12 82.83 B records start: 4 total(compressed): 994 B total(uncompressed):994 B
--------------------------------------------------------------------------------
                          type    encodings  count  avg size  nulls  min / max
utf8_full_truncation      BINARY  _ BB_      12     20.83 B   0      "Al" / "Kf"
binary_full_truncation    BINARY  _ BB_      12     20.83 B   0      "0x416C" / "0x4B66"
utf8_partial_truncation   BINARY  _ BB_      12     21.50 B   0      "Al" / "🚀Kevin Bacon"
binary_partial_truncation BINARY  _ BB_      12     19.67 B   0      "0x416C" / "0xFFFF0102"
```
and the data:
```
$ java -jar parquet-cli/target/parquet-cli-1.16.0-SNAPSHOT-runtime.jar cat /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
{"utf8_full_truncation": "Blart Versenwald III", "binary_full_truncation": "Blart Versenwald III", "utf8_partial_truncation": "Blart Versenwald III", "binary_partial_truncation": "Blart Versenwald III"}
{"utf8_full_truncation": "Alice Johnson", "binary_full_truncation": "Alice Johnson", "utf8_partial_truncation": "Alice Johnson", "binary_partial_truncation": "Alice Johnson"}
{"utf8_full_truncation": "Bob Smith", "binary_full_truncation": "Bob Smith", "utf8_partial_truncation": "Bob Smith", "binary_partial_truncation": "Bob Smith"}
{"utf8_full_truncation": "Charlie Brown", "binary_full_truncation": "Charlie Brown", "utf8_partial_truncation": "Charlie Brown", "binary_partial_truncation": "Charlie Brown"}
{"utf8_full_truncation": "Diana Prince", "binary_full_truncation": "Diana Prince", "utf8_partial_truncation": "Diana Prince", "binary_partial_truncation": "Diana Prince"}
{"utf8_full_truncation": "Edward Norton", "binary_full_truncation": "Edward Norton", "utf8_partial_truncation": "Edward Norton", "binary_partial_truncation": "Edward Norton"}
{"utf8_full_truncation": "Fiona Apple", "binary_full_truncation": "Fiona Apple", "utf8_partial_truncation": "Fiona Apple", "binary_partial_truncation": "Fiona Apple"}
{"utf8_full_truncation": "George Lucas", "binary_full_truncation": "George Lucas", "utf8_partial_truncation": "George Lucas", "binary_partial_truncation": "George Lucas"}
{"utf8_full_truncation": "Helen Keller", "binary_full_truncation": "Helen Keller", "utf8_partial_truncation": "Helen Keller", "binary_partial_truncation": "Helen Keller"}
{"utf8_full_truncation": "Ivan Drago", "binary_full_truncation": "Ivan Drago", "utf8_partial_truncation": "Ivan Drago", "binary_partial_truncation": "Ivan Drago"}
{"utf8_full_truncation": "Julia Roberts", "binary_full_truncation": "Julia Roberts", "utf8_partial_truncation": "Julia Roberts", "binary_partial_truncation": "Julia Roberts"}
{"utf8_full_truncation": "Kevin Bacon", "binary_full_truncation": "Kevin Bacon", "utf8_partial_truncation": "🚀Kevin Bacon", "binary_partial_truncation": "ÿÿ\u0001\u0002"}
```
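The min/max values above are consistent with statistics truncated to 2 bytes: the min is a plain prefix, and the max is a prefix with its last byte (or character) incremented. The following is a minimal Python sketch of that general strategy, not the parquet-rs implementation. The function names are hypothetical and the 2-byte limit is an assumption inferred from the table. It also shows why the "partial" columns would keep their full max: no shorter valid upper bound exists for those values.

```python
TRUNCATE_LENGTH = 2  # assumption: inferred from the 2-byte min/max values above


def truncate_min(value: bytes, length: int) -> bytes:
    """A prefix always sorts <= the full value, so it is a valid lower bound."""
    return value[:length]


def truncate_max(value: bytes, length: int) -> bytes:
    """Cut to `length` bytes, then increment the last byte that is not 0xFF."""
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    # Trailing 0xFF bytes cannot be incremented, so drop them.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # Every byte was 0xFF: no shorter upper bound exists, keep the full value.
        return value
    prefix[-1] += 1
    return bytes(prefix)


def truncate_max_utf8(value: str, length: int) -> str:
    """Same idea for strings, cutting only on a UTF-8 character boundary."""
    data = value.encode("utf-8")
    if len(data) <= length:
        return value
    cut = length
    # Back up while the cut would land on a continuation byte (10xxxxxx).
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    chars = list(data[:cut].decode("utf-8"))
    while chars:
        code = ord(chars[-1]) + 1
        if 0xD800 <= code <= 0xDFFF:
            code = 0xE000  # skip the surrogate range, which is not encodable
        if code <= 0x10FFFF:
            # Simplification: the incremented character may encode to more
            # bytes than the one it replaces; a real writer may handle that.
            chars[-1] = chr(code)
            return "".join(chars)
        chars.pop()
    # No whole character fits in `length` bytes: keep the full value.
    return value


print(truncate_min(b"Alice Johnson", TRUNCATE_LENGTH).hex())     # 416c
print(truncate_max(b"Kevin Bacon", TRUNCATE_LENGTH).hex())       # 4b66
print(truncate_max(b"\xff\xff\x01\x02", TRUNCATE_LENGTH).hex())  # ffff0102
print(truncate_max_utf8("Kevin Bacon", TRUNCATE_LENGTH))         # Kf
print(truncate_max_utf8("🚀Kevin Bacon", TRUNCATE_LENGTH))       # 🚀Kevin Bacon
```

Under these assumptions, `0xFFFF0102` stays whole because its 2-byte prefix is all `0xFF`, and `🚀Kevin Bacon` stays whole because its first character alone occupies 4 bytes.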