alaturqua opened a new issue, #6973:
URL: https://github.com/apache/iceberg/issues/6973
### Feature Request / Improvement
We use the ORC file format to store our Iceberg tables on Azure storage.
PyIceberg currently supports the Parquet format but not ORC, so scanning one of our tables fails when the Parquet reader tries to open an ORC data file.
This is a request for ORC file format support in PyIceberg.
```
     17 tbl.location()
---> 18 tbl.scan().to_pandas()

File C:\Projects\incubator-iceberg\python\pyiceberg\table\__init__.py:409, in DataScan.to_pandas(self, **kwargs)
    408 def to_pandas(self, **kwargs: Any) -> pd.DataFrame:
--> 409     return self.to_arrow().to_pandas(**kwargs)

File C:\Projects\incubator-iceberg\python\pyiceberg\table\__init__.py:404, in DataScan.to_arrow(self)
    401 def to_arrow(self) -> pa.Table:
    402     from pyiceberg.io.pyarrow import project_table
--> 404     return project_table(
    405         self.plan_files(), self.table, self.row_filter, self.projection(), case_sensitive=self.case_sensitive
    406     )

File C:\Projects\incubator-iceberg\python\pyiceberg\io\pyarrow.py:558, in project_table(tasks, table, row_filter, projected_schema, case_sensitive)
    551 projected_field_ids = {
    552     id for id in projected_schema.field_ids if not isinstance(projected_schema.find_type(id), (MapType, ListType))
    553 }.union(extract_field_ids(bound_row_filter))
    555 with ThreadPool() as pool:
    556     tables = [
    557         table
...
File c:\Python\Python39\lib\site-packages\pyarrow\_parquet.pyx:1227, in pyarrow._parquet.ParquetReader.open()

File c:\Python\Python39\lib\site-packages\pyarrow\error.pxi:100, in pyarrow.lib.check_status()

ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
```
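
For anyone hitting the same `ArrowInvalid` error, the sketch below is one way to confirm that a data file is ORC rather than a corrupted Parquet file. It relies only on the file signatures: Parquet files begin and end with the 4 bytes `PAR1`, and ORC files begin with the 3 bytes `ORC`. The helper name `sniff_data_file_format` is hypothetical and not part of PyIceberg or PyArrow. It assumes the file has been copied to a local path, and it does not handle Parquet files with an encrypted footer.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes
ORC_MAGIC = b"ORC"       # ORC files begin with these 3 bytes


def sniff_data_file_format(path: str) -> str:
    """Hypothetical helper: guess a local data file's format from its magic bytes.

    Returns "parquet", "orc", or "unknown".
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 8:
            # Only look for a trailing signature when the file is long
            # enough to hold both the leading and the trailing one.
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)

    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head.startswith(ORC_MAGIC):
        return "orc"
    return "unknown"
```

If this returns `"orc"` for a file referenced by the table, the error above is the Parquet reader being pointed at an ORC file, not a damaged file.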
### Query engine
None