cgbur commented on issue #716:
URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2101184351
Here is a complete example that reproduces the error. I am using polars to
write the Parquet file here, which results in the same schema that I am
producing with pyarrow.
```python
import os
import shutil

import polars as pl
import pyarrow.parquet as pq
from pyiceberg.catalog.sql import SqlCatalog

# Write a Parquet file with a single list-of-structs column
pl.DataFrame(
    {
        "a": [[{"a": 1}, {"a": 2}], [{"a": 3}]],
    }
).write_parquet("example.parquet")

warehouse_path = "/tmp/warehouse"
# wipe the warehouse
if os.path.exists(warehouse_path):
    shutil.rmtree(warehouse_path)
os.makedirs(warehouse_path)

catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)

# Create the table from the file's Arrow schema, then register the file
df = pq.read_table("example.parquet")
catalog.create_namespace("default")
table = catalog.create_table(
    "default.webserver",
    schema=df.schema,
)
table.add_files(["example.parquet"])
```
And here is the error. The first two lines of the output come from debug
statements I added; they show that the column mapping contains a path that
does not match the one in the file.
```python
print(f"column mappings {len(parquet_column_mapping)}")
print(parquet_column_mapping)
```
```
column mappings 1
{'a.list.element.a': 3}
Traceback (most recent call last):
  File "/home/cgbur/pyice-test/failure.py", line 36, in <module>
    table.add_files(["example.parquet"])
  File "/local/home/cgbur/pyice-test/iceberg-python/pyiceberg/table/__init__.py", line 1355, in add_files
    tx.add_files(file_paths=file_paths, snapshot_properties=snapshot_properties)
  File "/local/home/cgbur/pyice-test/iceberg-python/pyiceberg/table/__init__.py", line 462, in add_files
    for data_file in data_files:
  File "/local/home/cgbur/pyice-test/iceberg-python/pyiceberg/table/__init__.py", line 2737, in _parquet_files_to_data_files
    yield from parquet_files_to_data_files(io=io, table_metadata=table_metadata, file_paths=iter(file_paths))
  File "/local/home/cgbur/pyice-test/iceberg-python/pyiceberg/io/pyarrow.py", line 1869, in parquet_files_to_data_files
    statistics = data_file_statistics_from_parquet_metadata(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/home/cgbur/pyice-test/iceberg-python/pyiceberg/io/pyarrow.py", line 1734, in data_file_statistics_from_parquet_metadata
    field_id = parquet_column_mapping[column.path_in_schema]
               ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'a.list.item.a'
```
You can see that the Parquet `path_in_schema` uses `item` instead of
`element`, so the lookup in `parquet_column_mapping` fails.
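To make the mismatch concrete, here is a minimal pure-Python sketch that uses only the two values from the output above. `normalize_list_path` is a hypothetical helper I made up for illustration; it is not part of pyiceberg and I am not suggesting it is how the library should fix this. It also assumes that an `item` segment directly after `list` is always the list element, which would be wrong for a struct named `list` with a real field named `item`.

```python
# Mapping built from the Iceberg schema (value copied from the debug output).
parquet_column_mapping = {"a.list.element.a": 3}

# Path reported by the Parquet column metadata (copied from the KeyError).
path_in_schema = "a.list.item.a"


def normalize_list_path(path: str) -> str:
    """Hypothetical helper: rewrite a `list.item` segment pair as `list.element`."""
    parts = path.split(".")
    for i in range(1, len(parts)):
        if parts[i - 1] == "list" and parts[i] == "item":
            parts[i] = "element"
    return ".".join(parts)


# The raw path is not a key in the mapping, which is what raises the KeyError.
print(path_in_schema in parquet_column_mapping)  # False

# After renaming the list element segment, the lookup finds field id 3.
print(parquet_column_mapping[normalize_list_path(path_in_schema)])  # 3
```

This only shows that the two paths differ in the name of the list element segment; everything else in the path lines up.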