MehulBatra commented on PR #790:
URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2149033357
Hi @Fokko and @HonahX,
✅ I have modified the read logic to read ORC file-based Iceberg tables,
and I have written an integration test as well; it is working great.
I would love some guidance on two points:
1. I'm having trouble scoping the unit tests. Some examples or pointers
would be helpful.
2. I couldn't find a way to create an ORC file-based Iceberg table via the
Glue client (other than passing a `format` property at table creation),
but appending data still produces only Parquet data files. Is this because
the `DataFile`/`DeleteFile` logic defaults to the Parquet file format?
(See the sketch after the snippet below for what I tried instead.)
I might be missing something; can you point me in the right direction?
```
from decimal import Decimal

import pyarrow as pa

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
namespace = 'demo_ns'
table_name = 'test_table_dummy_orc_demo'

pylist = [
    {'decimal_col': Decimal('32768.1'), 'int_col': 1, 'string_col': "demo_one"},
    {'decimal_col': Decimal('44456.1'), 'int_col': 2, 'string_col': "demo_two"},
]
arrow_schema = pa.schema(
    [
        pa.field('decimal_col', pa.decimal128(33, 1)),
        pa.field('int_col', pa.int32()),
        pa.field('string_col', pa.string()),
    ],
)
arrow_table = pa.Table.from_pylist(pylist, schema=arrow_schema)

new_table = catalog.create_table(
    identifier=f'{namespace}.{table_name}',
    schema=arrow_schema,
    properties={'format': 'orc'},  # does not seem to switch the write format
)
new_table.append(arrow_table)  # still writes Parquet data files
```
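
For reference, here is a minimal sketch of what I also tried, under the
assumption that the standard Iceberg table property `write.format.default`
(rather than a bare `format` key) is what selects the data file format.
Whether the PyIceberg write path actually honors it for ORC is exactly what
I'm unsure about; the table name below is just a hypothetical example:
```
# Sketch under my assumptions: `write.format.default` is the standard
# Iceberg table property for the data file format, so I would expect
# setting it to "orc" to produce ORC files on append, unless the
# writer itself only emits Parquet today.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
orc_table = catalog.create_table(
    identifier='demo_ns.test_table_dummy_orc_v2',  # hypothetical name
    schema=arrow_schema,  # same pa.schema as in the snippet above
    properties={'write.format.default': 'orc'},
)
orc_table.append(arrow_table)  # same pa.Table as above
```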