l45k opened a new issue, #9081:
URL: https://github.com/apache/arrow-datafusion/issues/9081

   ### Describe the bug
   
   I am trying to load a Parquet file that has metadata in its schema. Neither 
`ctx.read_parquet` nor `ctx.register_parquet` returns a schema that includes 
the metadata, even when `ParquetReadOptions::default().skip_metadata(false)` is 
provided. 
   
   ### To Reproduce
   
   To test, I create a table with schema metadata and write it to Parquet using 
PyArrow:
   ```Python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   table = pa.table(
     [
     pa.array([1.0, 1.0, 1.0, 1.0, 1.0], type=pa.int64()),
     ],
     names=["col1"],
     metadata = {"col1": "Some metadata"}
   )
   pq.write_table(table, "test.parquet")
     
   print(pq.read_table("test.parquet").schema)
   ```
   Output:
   ```
   col1: int64
   -- schema metadata --
   col1: 'Some metadata'
   ```
   
   Next, I read and register the file with DataFusion:
   ```rust
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() {
     let ctx = SessionContext::new();
   
     // Read the file directly, asking DataFusion not to skip the schema metadata.
     let options = ParquetReadOptions::default().skip_metadata(false);
     let table = ctx.read_parquet("test.parquet", options).await.unwrap();
     println!("Schema {:?}", table.schema());
     println!("Metadata {:?}", table.schema().metadata());
   
     // Register the same file as table "t" and inspect its schema.
     let options = ParquetReadOptions::default().skip_metadata(false);
     ctx.register_parquet("t", "test.parquet", options).await.unwrap();
     let registered = ctx.table("t").await.unwrap();
     println!("Schema {:?}", registered.schema());
     println!("Metadata {:?}", registered.schema().metadata());
   }
   ```
   Output:
   ```
   Schema DFSchema { fields: [DFField { qualifier: Some(Bare { table: "?table?" 
}), field: Field { name: "col1", data_type: Int64, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }], metadata: {}, 
functional_dependencies: FunctionalDependencies { deps: [] } }
   Metadata {}
   Schema DFSchema { fields: [DFField { qualifier: Some(Bare { table: "t" }), 
field: Field { name: "col1", data_type: Int64, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} } }], metadata: {}, 
functional_dependencies: FunctionalDependencies { deps: [] } }
   Metadata {}
   ```
   
   
   ### Expected behavior
   
   I would expect the schema metadata to be the same in DataFusion and PyArrow.
   
   ### Additional context
   
   I use the following DataFusion and PyArrow versions:
   `datafusion = { version = "35.0.0", features = ["parquet"] }`
   `pyarrow:15.0.0`

