phpsxg opened a new issue, #14763:
URL: https://github.com/apache/arrow/issues/14763
### Describe the usage question you have. Please include as many useful details as possible.
Calling `write_to_dataset` on a DataFrame reports the following error:
```
File "pyarrow\_dataset.pyx", line 476, in
pyarrow._dataset.InMemoryDataset.__init__
pyarrow.lib.ArrowTypeError: Item has schema
issuer_code: string
trade_date: string
industry_code: string
sector_code: string
sector_name: string
industry_name: string
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' +
1067
which does not match expected schema
issuer_code: string
-- field metadata --
table_field: 'PARTY_CODE'
trade_date: string
-- field metadata --
table_field: 'START_DATE'
industry_code: string
-- field metadata --
table_field: 'INDU_SYS_CODE'
sector_code: string
-- field metadata --
table_field: 'INDU_CODE_1ST'
sector_name: string
-- field metadata --
table_field: 'INDU_NAME_1ST'
industry_name: string
-- field metadata --
table_field: 'PARA_NAME'
-- schema metadata --
issuer_code: '主体编码'
trade_date: '变动起始日'
industry_code: '行业分类体系内部编码'
sector_code: '一级行业分类代码'
sector_name: '一级行业分类名称'
industry_name: '常量名称'
Process finished with exit code 1
```
I found that this happens because I use a column in the DataFrame as the index.
If I want to keep that index, how should I define `pa.schema` in this case?
The currently defined schema is as follows:
```
import pyarrow as pa

# Dimension holds the column-name constants
# (e.g. Dimension.ISSUER_CODE == "issuer_code", per the error output above).
schema = pa.schema(
    [
        pa.field(Dimension.ISSUER_CODE, pa.string(),
                 metadata={b'table_field': b'PARTY_CODE'}),
        pa.field(Dimension.TRADE_DATE, pa.string(),
                 metadata={b'table_field': b'START_DATE'}),
        pa.field(Dimension.INDUSTRY_CODE, pa.string(),
                 metadata={b'table_field': b'INDU_SYS_CODE'}),
        pa.field(Dimension.SECTOR_CODE, pa.string(),
                 metadata={b'table_field': b'INDU_CODE_1ST'}),
        pa.field(Dimension.SECTOR_NAME, pa.string(),
                 metadata={b'table_field': b'INDU_NAME_1ST'}),
        pa.field(Dimension.INDUSTRY_NAME, pa.string(),
                 metadata={b'table_field': b'PARA_NAME'}),
    ],
    metadata={
        Dimension.ISSUER_CODE: '主体编码',
        Dimension.TRADE_DATE: '变动起始日',
        Dimension.INDUSTRY_CODE: '行业分类体系内部编码',
        Dimension.SECTOR_CODE: '一级行业分类代码',
        Dimension.SECTOR_NAME: '一级行业分类名称',
        Dimension.INDUSTRY_NAME: '常量名称',
    }
)
```
### Component(s)
Parquet, Python