aersam commented on issue #36593:
URL: https://github.com/apache/arrow/issues/36593#issuecomment-1662215807

   It seems using `replace_schema` does not work: the dataset always uses the schema's column names to query the Parquet files, meaning the column names must match the ones in the physical files. What is really needed is a separation between the physical column name and the logical column name. This would be really useful, especially since Parquet is somewhat limited in which column names are allowed.
   The best option would be a "column mapping" on the [fragment](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Fragment.html) which maps the schema's (logical) column names to the physical column names. This would allow querying Parquet files that use different physical names for the same logical column. I guess that is a bit complex with respect to filters... but it would still be great.
   
   If we want to abstract Apache Iceberg or Delta Lake tables on top of the dataset API, this would be needed (both formats support such column mappings).
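   To illustrate the idea, here is a minimal sketch of what such a per-file mapping could emulate today, by renaming columns after each read. The file names, the `logical_to_physical` dict, and the mapping shape are all hypothetical; this is a workaround sketch, not the proposed `Fragment` API:

   ```python
   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.parquet as pq

   # Two files that store the same logical column ("user_id") under
   # different physical names -- the situation described above.
   # This mapping structure is purely illustrative.
   logical_to_physical = {
       "file_a.parquet": {"user_id": "col_1"},
       "file_b.parquet": {"user_id": "uid"},
   }

   tmp = tempfile.mkdtemp()
   pq.write_table(pa.table({"col_1": [1, 2]}), os.path.join(tmp, "file_a.parquet"))
   pq.write_table(pa.table({"uid": [3, 4]}), os.path.join(tmp, "file_b.parquet"))

   tables = []
   for fname, mapping in logical_to_physical.items():
       # Read with the physical names, then rename to the logical names.
       t = pq.read_table(os.path.join(tmp, fname))
       physical_to_logical = {phys: log for log, phys in mapping.items()}
       t = t.rename_columns(
           [physical_to_logical.get(c, c) for c in t.column_names]
       )
       tables.append(t)

   combined = pa.concat_tables(tables)
   print(combined.column_names)  # ['user_id']
   ```

   A native mapping on the fragment would do this inside the scan (including pushing filters down through the rename), instead of forcing a full read-then-rename per file.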
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
