Thanks for the update on PyIceberg's new features. It's exciting to see the
progress!

I have a quick question: Currently, as I understand, PyIceberg operates
within a single process. Are there any plans to expand its capabilities to
support distributed computation, particularly for write operations? If so,
which distributed framework are we considering for integration - perhaps
Ray or something similar?

Yufei


On Fri, Jan 26, 2024 at 8:48 AM Ryan Blue <b...@tabular.io> wrote:

> It's great to see all the progress in PyIceberg. Thanks to everyone who's
> been contributing!
>
> I'm all for getting a release out as soon as possible and following up
> with more features in the write path in 0.7.0.
>
> On Fri, Jan 26, 2024 at 5:22 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I want to discuss the 0.6.0 release that will bring a lot of
>> functionality to the public:
>>
>>    - Write support for writing to unpartitioned tables
>>       - Includes snapshot generation
>>       - Constructing Avro writer trees
>>    - Support for writing metadata, which enables commit support for the
>>    Hive, SQL, and Glue catalogs
>>    - Support for name-mapping
>>    - Easy evolution of schema using the union_by_name method
>>    - And a lot of bug fixes and improvements
>>
>> The write support is still limited: for example, partitioned writes and
>> tables with sort orders are not supported. Also, as Ryan mentioned during
>> the last community sync, we're doing fast appends by default, and we're
>> unable to compact yet. I've created issues on GitHub
>> <https://github.com/apache/iceberg-python/issues> to track all these
>> limitations. However, I think it is good to get the current work out to the
>> public so they can try it and we can uncover any impediments as soon as
>> possible. And we can follow up with 0.7.0.
>>
>> Kind regards,
>> Fokko Driesprong
>>
>
>
> --
> Ryan Blue
> Tabular
>
