raulcd commented on PR #88:
URL: https://github.com/apache/parquet-testing/pull/88#issuecomment-3044413356

   I have added two new columns:
   |column_name|min|max|
   | -------- | ------- |----|
   |utf8_full_truncation|"Al"|"Kf"|
   |binary_full_truncation|"0x416C" | "0x4B66"|
   |utf8_partial_truncation|"Al" | "🚀Kevin Bacon"|
   |binary_partial_truncation|"0x416C" | "0xFFFF0102"|
   
   See:
   ```
   $ java -jar parquet-cli/target/parquet-cli-1.16.0-SNAPSHOT-runtime.jar meta /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
   
   File path:  /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
   Created by: parquet-rs version 55.1.0
   Properties:
     ARROW:schema: 
/////zgBAAAQAAAAAAAKAAwACgAJAAQACgAAABAAAAAAAQQACAAIAAAABAAIAAAABAAAAAQAAADMAAAAgAAAAEQAAAAEAAAAVP///xQAAAAMAAAAAAAABAwAAAAAAAAARP///xkAAABiaW5hcnlfcGFydGlhbF90cnVuY2F0aW9uAAAAkP///xQAAAAMAAAAAAAABQwAAAAAAAAAgP///xcAAAB1dGY4X3BhcnRpYWxfdHJ1bmNhdGlvbgDI////FAAAAAwAAAAAAAAEDAAAAAAAAAC4////FgAAAGJpbmFyeV9mdWxsX3RydW5jYXRpb24AABAAFAAQAAAADwAEAAAACAAQAAAAGAAAAAwAAAAAAAAFEAAAAAAAAAAEAAQABAAAABQAAAB1dGY4X2Z1bGxfdHJ1bmNhdGlvbgAAAAA=
   Schema:
   message arrow_schema {
     required binary utf8_full_truncation (STRING);
     required binary binary_full_truncation;
     required binary utf8_partial_truncation (STRING);
     required binary binary_partial_truncation;
   }
   
   
   Row group 0:  count: 12  82.83 B records  start: 4  total(compressed): 994 B total(uncompressed):994 B 
   --------------------------------------------------------------------------------
                              type      encodings count     avg size   nulls   min / max
   utf8_full_truncation       BINARY    _ BB_     12        20.83 B    0       "Al" / "Kf"
   binary_full_truncation     BINARY    _ BB_     12        20.83 B    0       "0x416C" / "0x4B66"
   utf8_partial_truncation    BINARY    _ BB_     12        21.50 B    0       "Al" / "🚀Kevin Bacon"
   binary_partial_truncation  BINARY    _ BB_     12        19.67 B    0       "0x416C" / "0xFFFF0102"
   ```
   and the data:
   ```
   $ java -jar parquet-cli/target/parquet-cli-1.16.0-SNAPSHOT-runtime.jar cat /home/raulcd/code/parquet_truncate_file_generator/binary_truncated_min_max.parquet
   {"utf8_full_truncation": "Blart Versenwald III", "binary_full_truncation": "Blart Versenwald III", "utf8_partial_truncation": "Blart Versenwald III", "binary_partial_truncation": "Blart Versenwald III"}
   {"utf8_full_truncation": "Alice Johnson", "binary_full_truncation": "Alice Johnson", "utf8_partial_truncation": "Alice Johnson", "binary_partial_truncation": "Alice Johnson"}
   {"utf8_full_truncation": "Bob Smith", "binary_full_truncation": "Bob Smith", "utf8_partial_truncation": "Bob Smith", "binary_partial_truncation": "Bob Smith"}
   {"utf8_full_truncation": "Charlie Brown", "binary_full_truncation": "Charlie Brown", "utf8_partial_truncation": "Charlie Brown", "binary_partial_truncation": "Charlie Brown"}
   {"utf8_full_truncation": "Diana Prince", "binary_full_truncation": "Diana Prince", "utf8_partial_truncation": "Diana Prince", "binary_partial_truncation": "Diana Prince"}
   {"utf8_full_truncation": "Edward Norton", "binary_full_truncation": "Edward Norton", "utf8_partial_truncation": "Edward Norton", "binary_partial_truncation": "Edward Norton"}
   {"utf8_full_truncation": "Fiona Apple", "binary_full_truncation": "Fiona Apple", "utf8_partial_truncation": "Fiona Apple", "binary_partial_truncation": "Fiona Apple"}
   {"utf8_full_truncation": "George Lucas", "binary_full_truncation": "George Lucas", "utf8_partial_truncation": "George Lucas", "binary_partial_truncation": "George Lucas"}
   {"utf8_full_truncation": "Helen Keller", "binary_full_truncation": "Helen Keller", "utf8_partial_truncation": "Helen Keller", "binary_partial_truncation": "Helen Keller"}
   {"utf8_full_truncation": "Ivan Drago", "binary_full_truncation": "Ivan Drago", "utf8_partial_truncation": "Ivan Drago", "binary_partial_truncation": "Ivan Drago"}
   {"utf8_full_truncation": "Julia Roberts", "binary_full_truncation": "Julia Roberts", "utf8_partial_truncation": "Julia Roberts", "binary_partial_truncation": "Julia Roberts"}
   {"utf8_full_truncation": "Kevin Bacon", "binary_full_truncation": "Kevin Bacon", "utf8_partial_truncation": "🚀Kevin Bacon", "binary_partial_truncation": "ÿÿ\u0001\u0002"}
   ```
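
The statistics above are consistent with a 2-byte truncation limit, although that limit is inferred from the output and is not stated anywhere in this comment. The sketch below is one possible way to reproduce the four min/max pairs. The function names are hypothetical and this is not the parquet-rs implementation; it only illustrates why the "full" columns get a shortened and incremented max while the "partial" columns fall back to the untruncated max.

```python
from typing import Optional

LIMIT = 2  # assumed truncation length in bytes, inferred from the "Al" minimum


def truncate_min(value: bytes, length: int = LIMIT) -> bytes:
    """Any prefix sorts <= the full value, so it is a valid lower bound."""
    return value[:length]


def increment(prefix: bytes) -> Optional[bytes]:
    """Return a byte string greater than everything starting with prefix.

    Returns None when every byte is 0xFF, so no such bound exists.
    """
    data = bytearray(prefix)
    while data:
        if data[-1] != 0xFF:
            data[-1] += 1
            return bytes(data)
        data.pop()  # drop a trailing 0xFF and try the previous byte
    return None


def truncate_max(value: bytes, length: int = LIMIT) -> bytes:
    """Shorten the max to an upper bound, or keep it if that is impossible."""
    if len(value) <= length:
        return value
    bumped = increment(value[:length])
    return bumped if bumped is not None else value


def truncate_utf8_max(value: str, length: int = LIMIT) -> str:
    """Same idea for strings, cutting only on a UTF-8 character boundary."""
    raw = value.encode("utf-8")
    if len(raw) <= length:
        return value
    cut = length
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:  # continuation byte
        cut -= 1
    chars = list(raw[:cut].decode("utf-8"))
    while chars:
        cp = ord(chars[-1]) + 1
        if cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF):
            chars[-1] = chr(cp)
            return "".join(chars)
        chars.pop()
    return value  # nothing left to increment, keep the exact max


print(truncate_min(b"Alice Johnson"), truncate_max(b"Kevin Bacon"))
print(truncate_max(b"\xff\xff\x01\x02").hex())
print(truncate_utf8_max("Kevin Bacon"), truncate_utf8_max("🚀Kevin Bacon"))
```

Under these assumptions, `"Kevin Bacon"` becomes `"Kf"` (`0x4B66`), while `0xFFFF0102` and `"🚀Kevin Bacon"` stay whole: the first has a 2-byte prefix of all `0xFF` bytes, and the second has no character boundary within the first 2 bytes.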


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

