acdha opened a new issue, #3095: URL: https://github.com/apache/parquet-java/issues/3095
### Describe the bug, including details regarding any error messages, version, and platform.

Using Parquet CLI 1.15.0 via Mac Homebrew, I noticed some surprising behaviour with the `parquet-cli` and nested columns. `parquet schema catalog.parquet` returns a schema showing the nested types (I've trimmed the field list slightly):

```avro
{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "item_id",
    "type" : "string"
  }, {
    "name" : "title",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "language",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "subjects",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "list",
        "fields" : [ {
          "name" : "element",
          "type" : "string"
        } ]
      }
    } ],
    "default" : null
  }, {
    "name" : "authors",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "list",
        "namespace" : "list2",
        "fields" : [ {
          "name" : "element",
          "type" : "string"
        } ]
      }
    } ],
    "default" : null
  } ]
}
```

`parquet dictionary -c subjects.list.element catalog.parquet` will return the expected values for those fields as well:

```
Row group 0 dictionary for "subjects.list.element":
0: "Bestsellers"
1: "Biography"
2: "Fantasy Fiction"
3: "Music Theory"
4: "Disability"
5: "Family"
6: "Young Adult"
```

However, when using `cat` or `head` to display the file contents those fields are displayed as null:

```
{"bmc_id": "id1", "title": null, "language": "en", "subjects": null, "authors": null}
{"bmc_id": "id2", "title": null, "language": "en", "subjects": null, "authors": null,}
{"bmc_id": "id3", "title": null, "language": "en", "subjects": null, "authors": null}
```

Other tools like PyArrow or Pandas do display those values as arrays. I created this as a bug because it _looks_ like it's working and if those fields are nullable, there's no way to tell whether the null value is correct.

### Component(s)

_No response_
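For anyone trying to reproduce this, the dotted column path passed to `parquet dictionary -c` can be derived from the Avro schema printed by `parquet schema`. The following is a minimal sketch using only the Python standard library, not part of parquet-cli. The helper name `leaf_paths` is hypothetical, and it assumes that the name of the record wrapping each array item (`list` here) becomes the middle component of the path, which matches the `subjects.list.element` path used in the report.

```python
import json

# Trimmed copy of the schema printed by `parquet schema` in the report.
SCHEMA_JSON = """
{ "type": "record", "name": "schema", "fields": [
  {"name": "item_id", "type": "string"},
  {"name": "title", "type": ["null", "string"], "default": null},
  {"name": "language", "type": ["null", "string"], "default": null},
  {"name": "subjects", "type": ["null", {"type": "array", "items": {
      "type": "record", "name": "list",
      "fields": [{"name": "element", "type": "string"}]}}], "default": null},
  {"name": "authors", "type": ["null", {"type": "array", "items": {
      "type": "record", "name": "list", "namespace": "list2",
      "fields": [{"name": "element", "type": "string"}]}}], "default": null}
]}
"""


def leaf_paths(avro_type, prefix=()):
    """Yield a dotted path for every primitive leaf under avro_type."""
    if isinstance(avro_type, list):
        # Union such as ["null", T]: follow every non-null branch.
        for branch in avro_type:
            if branch != "null":
                yield from leaf_paths(branch, prefix)
    elif isinstance(avro_type, dict) and avro_type.get("type") == "array":
        items = avro_type["items"]
        if isinstance(items, dict) and items.get("type") == "record":
            # Assumption: the item record's name ("list") is a path component.
            yield from leaf_paths(items, prefix + (items["name"],))
        else:
            yield from leaf_paths(items, prefix)
    elif isinstance(avro_type, dict) and avro_type.get("type") == "record":
        # A record contributes its field names, not its own name.
        for field in avro_type["fields"]:
            yield from leaf_paths(field["type"], prefix + (field["name"],))
    else:
        # Primitive type: the accumulated prefix is the full column path.
        yield ".".join(prefix)


paths = list(leaf_paths(json.loads(SCHEMA_JSON)))
for path in paths:
    print(path)
```

Each nested path it prints (for example `authors.list.element`) can be tried with `parquet dictionary -c <path> catalog.parquet` to check whether a column that `cat` or `head` shows as null actually holds values.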