westonpace commented on issue #9194:
URL: https://github.com/apache/arrow/issues/9194#issuecomment-761062704
Do you know the data type of the missing column? If so, you can use the
datasets API to read the table. The datasets API accepts an expected schema
that lists every column that might be requested. Any column missing from a
particular file is filled with nulls. This supports schema evolution, where a
collection of files shares one master schema but individual files might not
contain all of its columns.
```
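import pandas as pd
import pyarrow as pa
import pyarrow.dataset as pads

read_columns = ['a', 'b', 'X']

# Write a file that only contains columns 'a' and 'b'
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['foo', 'bar', 'jar']})
file_name = '/tmp/my_df.pq'
df.to_parquet(file_name)

# Expected ("master") schema, including column 'X', which the file lacks
schema = pa.schema([
    ('a', pa.int64()),
    ('b', pa.string()),
    ('X', pa.int32())
])

# Direct read requesting a column the file does not contain:
# df = pd.read_parquet(file_name, columns=read_columns)

# Read through the datasets API against the expected schema instead
ds = pads.dataset([file_name], schema=schema)
table = ds.to_table(columns=read_columns)
print(table)
print(table.column('X').to_pylist())  # [None, None, None]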
```