mishbahr opened a new issue #11967: URL: https://github.com/apache/arrow/issues/11967
I'm writing a DataFrame to binary Parquet format with one or more object columns that are entirely null. If I then load the Parquet dataset with `use_legacy_dataset=False`:

```python
parquet_dataset = pq.ParquetDataset(root_path, use_legacy_dataset=False, **kwargs)
type(parquet_dataset)
pyarrow.parquet._ParquetDatasetV2
```

it returns a `_ParquetDatasetV2` instance, and when I check the schema:

```python
type(parquet_dataset.schema)
pyarrow.lib.Schema
```

If I load the same file but with `use_legacy_dataset=True`:

```python
parquet_dataset2 = pq.ParquetDataset(root_path, use_legacy_dataset=True, **kwargs)
```

the schema for the file is an instance of `ParquetSchema`:

```python
type(parquet_dataset2.schema)
pyarrow._parquet.ParquetSchema
```

This is as I would expect, and I'm aware that I can get the "arrow schema" like this:

```python
arrow_schema = parquet_dataset2.schema.to_arrow_schema()
type(arrow_schema)
pyarrow.lib.Schema
```

i.e. the same format as when I use `use_legacy_dataset=False`.

For an instance of `ParquetSchema`, I can get details of any column, e.g.:

```python
parquet_dataset2.schema[13]
<ParquetColumnSchema>
  name: col13
  path: col13
  max_definition_level: 1
  max_repetition_level: 0
  physical_type: INT96
  logical_type: None
  converted_type (legacy): NONE
```

Here the "physical_type" for this column is INT96.

```python
parquet.schema[13].physical_type
'INT32'
```

For an instance of `pyarrow.lib.Schema`, if I get the "data type" for the same column:

```python
parquet_dataset.schema.field("col13").type
DataType(null)
```

i.e. there is no information about what the "data type" is supposed to be. This information is available in the Parquet file, but how do I access it? Is there a way to convert an instance of `pyarrow.lib.Schema` to `pyarrow._parquet.ParquetSchema`?