Thanks for the update on PyIceberg's new features. It's exciting to see the progress!
I have a quick question: Currently, as I understand, PyIceberg operates within a single process. Are there any plans to expand its capabilities to support distributed computation, particularly for write operations? If so, which distributed framework are we considering for integration - perhaps Ray or something similar?

Yufei

On Fri, Jan 26, 2024 at 8:48 AM Ryan Blue <b...@tabular.io> wrote:

> It's great to see all the progress in PyIceberg. Thanks to everyone that's
> been contributing!
>
> I'm all for getting a release out as soon as possible and following up
> with more features in the write path in 0.7.0.
>
> On Fri, Jan 26, 2024 at 5:22 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I want to discuss the 0.6.0 release that will bring a lot of
>> functionality to the public:
>>
>> - Write support for writing to unpartitioned tables
>> - Includes snapshot generation
>> - Constructing Avro writer trees
>> - Support writing metadata which allows to commit support for the
>> Hive, Sql, and Glue catalog.
>> - Support for name-mapping
>> - Easy evolution of schema using the union_by_name method
>> - And a lot of bug fixes and improvements
>>
>> The write support is still limited, for example, partitioned writes or
>> tables with sort-orders are not supported. Also, as Ryan mentioned during
>> the last community sync, we're doing fast appends by default, and we're
>> unable to compact yet. I've created issues on Github
>> <https://github.com/apache/iceberg-python/issues> to track all these
>> limitations. However, I think it is good to get the current work out to the
>> public so they can try it and we can uncover any impediments as soon as
>> possible. And we can follow up with 0.7.0.
>>
>> Kind regards,
>> Fokko Driesprong
>>
>
> --
> Ryan Blue
> Tabular