MehulBatra commented on PR #790:
URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2149033357
Hi @Fokko and @HonahX,
✅ I have modified the read logic to read ORC file-based Iceberg tables,
and I have written an integration test as well; it is working great.
I would love some guidance on two points:
1. I'm having trouble scoping the unit tests. Some examples or pointers
would be helpful.
2. I couldn't find a way to create an ORC file-based Iceberg table via the
Glue client (other than passing a `format` property at table creation),
but appending data still produces only Parquet data files. Is this because
the `DataFile`/`DeleteFile` logic defaults to the Parquet file format?
(See the sketch after the snippet below for what I tried instead.)
I might be missing something; can you point me in the right direction?
```
from decimal import Decimal

import pyarrow as pa

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
namespace = 'demo_ns'
table_name = 'test_table_dummy_orc_demo'

pylist = [
    {'decimal_col': Decimal('32768.1'), 'int_col': 1, 'string_col': "demo_one"},
    {'decimal_col': Decimal('44456.1'), 'int_col': 2, 'string_col': "demo_two"},
]
arrow_schema = pa.schema(
    [
        pa.field('decimal_col', pa.decimal128(33, 1)),
        pa.field('int_col', pa.int32()),
        pa.field('string_col', pa.string()),
    ],
)
arrow_table = pa.Table.from_pylist(pylist, schema=arrow_schema)

new_table = catalog.create_table(
    identifier=f'{namespace}.{table_name}',
    schema=arrow_schema,
    properties={'format': 'orc'},  # does not seem to switch the write format
)
new_table.append(arrow_table)  # still writes Parquet data files
```
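
For reference, here is a minimal sketch of what I also tried, under the
assumption that the standard Iceberg table property `write.format.default`
(rather than a bare `format` key) is what selects the data file format.
Whether the PyIceberg write path actually honors it for ORC is exactly what
I'm unsure about; the table name below is just a hypothetical example:
```
# Sketch under my assumptions: `write.format.default` is the standard
# Iceberg table property for the data file format, so I would expect
# setting it to "orc" to produce ORC files on append, unless the
# writer itself only emits Parquet today.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
orc_table = catalog.create_table(
    identifier='demo_ns.test_table_dummy_orc_v2',  # hypothetical name
    schema=arrow_schema,  # same pa.schema as in the snippet above
    properties={'write.format.default': 'orc'},
)
orc_table.append(arrow_table)  # same pa.Table as above
```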