Agreed Fokko :) I've cherry-picked the commits in the 0.7.1 milestone into a new branch, and created this PR: https://github.com/apache/iceberg-python/pull/1031
If this looks good, I can start the release for 0.7.1.

Sung

On Fri, Aug 9, 2024 at 5:05 AM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Sung,
>
> That's a great find. I just merged the PR, and it would be good to get the
> release process rolling to get #1026
> <https://github.com/apache/iceberg-python/pull/1026> out to the users.
>
> Kind regards,
> Fokko
>
> On Thu, Aug 8, 2024 at 23:20, Sung Yun <sungwy...@gmail.com> wrote:
>
>> Thank you for reporting the issues and putting in the fixes, Fokko and
>> André.
>>
>> We also identified a correctness issue with applying positional deletes
>> on merge-on-read tables that I think must also be included in this
>> release. Here's the PR that resolves the issue:
>> https://github.com/apache/iceberg-python/pull/1026
>>
>> Sung
>>
>> On Thu, Aug 8, 2024 at 9:29 AM André Luis Anastácio
>> <ndrl...@proton.me.invalid> wrote:
>>
>>> I fixed an overwrite error that, I think, would be good to include in
>>> the 0.7.1 release: https://github.com/apache/iceberg-python/pull/1023
>>>
>>> André Anastácio
>>>
>>> On Thursday, August 8th, 2024 at 4:29 AM, Fokko Driesprong
>>> <fo...@apache.org> wrote:
>>>
>>> Thanks everyone for the input here, and I agree that the aforementioned
>>> #995 <https://github.com/apache/iceberg-python/pull/995/> and #997
>>> <https://github.com/apache/iceberg-python/pull/997/> by Sung, and #526
>>> <https://github.com/apache/iceberg-python/pull/526> by André would also
>>> be good to include (I've added the milestone there).
>>> I have two minor ones that are also good candidates to add to 0.7.1:
>>>
>>> - Allow setting write.parquet.row-group-limit
>>>   <https://github.com/apache/iceberg-python/pull/1016>
>>> - Allow setting write.parquet.page-row-limit
>>>   <https://github.com/apache/iceberg-python/pull/1017>
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Tue, Aug 6, 2024 at 21:17, André Luis Anastácio
>>> <ndrl...@proton.me.invalid> wrote:
>>>
>>>> What do you think about adding the fix that excludes PyIceberg support
>>>> for Python 3.9.7 in the 0.7.1 release? [1] It already doesn't work, so
>>>> this is just to avoid any new issues.
>>>>
>>>> - [1]: https://github.com/apache/iceberg-python/pull/526
>>>>
>>>> André Anastácio
>>>>
>>>> On Tuesday, August 6th, 2024 at 4:06 PM, Sung Yun <sun...@apache.org>
>>>> wrote:
>>>>
>>>> > Sounds good folks! Thank you for sharing your thoughts. We'll work on
>>>> > getting the patch release out, and continue the discussion on upgrading
>>>> > the PyArrow version to 17.0.0 in time for the 0.8.0 release.
>>>> >
>>>> > Just adding two more fixes that were introduced and that I think we
>>>> > should pull into the patch release. These were added to the GitHub
>>>> > milestone for 0.7.1, but I'm cross-posting here for awareness:
>>>> >
>>>> > - Table scan fails when result is empty:
>>>> >   https://github.com/apache/iceberg-python/pull/997
>>>> > - Fix RestCatalog ListNamespace to correctly make use of the expected
>>>> >   Rest Catalog response:
>>>> >   https://github.com/apache/iceberg-python/pull/997
>>>> >
>>>> > Sung
>>>> >
>>>> > On 2024/08/06 18:29:50 Kevin Liu wrote:
>>>> >
>>>> > > > Typically we only push patches into the minor versions, we could
>>>> > > > also go to version 0.8.0 immediately.
>>>> > >
>>>> > > The issues above sound like patches to me, fixing issues discovered
>>>> > > during the 0.7.0 release. Is there a reason to move to 0.8.0?
>>>> > >
>>>> > > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>>> > > > functional upsides, but I feel that constraining PyIceberg to just
>>>> > > > one published version would make the adoption of PyIceberg
>>>> > > > difficult for our users.
>>>> > >
>>>> > > +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
>>>> > > optional first? So that folks who want the upgrade can test it out.
>>>> > >
>>>> > > Thanks,
>>>> > > Kevin Liu
>>>> > >
>>>> > > On Fri, Aug 2, 2024 at 11:33 AM Sung Yun sun...@apache.org wrote:
>>>> > >
>>>> > > > Hi Fokko,
>>>> > > >
>>>> > > > That makes sense, thank you for the suggestion! The issue was
>>>> > > > severe enough for us that we had to fork the repo and apply a fix
>>>> > > > ourselves in order to run PyIceberg without our applications going
>>>> > > > OOM. So I think there will be value in getting the proposed config
>>>> > > > property out as early as possible for the larger community.
>>>> > > >
>>>> > > > I'm still on the fence regarding the 17.0.0 upgrade. There are
>>>> > > > clear functional upsides, but I feel that constraining PyIceberg
>>>> > > > to just one published version would make the adoption of PyIceberg
>>>> > > > difficult for our users. Users writing new applications won't have
>>>> > > > trouble with it, but users intending to use PyIceberg in an
>>>> > > > existing application may have to upgrade their PyArrow versions,
>>>> > > > which could be a deterrent (or a welcome nudge). Would it be worth
>>>> > > > starting that discussion on a separate thread?
>>>> > > >
>>>> > > > Sung
>>>> > > >
>>>> > > > On 2024/08/02 17:57:17 Fokko Driesprong wrote:
>>>> > > >
>>>> > > > > Hey Sung,
>>>> > > > >
>>>> > > > > Typically we only push patches into the minor versions, we could
>>>> > > > > also go to version 0.8.0 immediately.
>>>> > > > >
>>>> > > > > Regarding the memory consumption, thanks for putting those
>>>> > > > > numbers together! I would also love to get #929
>>>> > > > > https://github.com/apache/iceberg-python/pull/929 in, so we can
>>>> > > > > push down the large/small type to PyArrow (only for to_arrow),
>>>> > > > > and apply #986
>>>> > > > > https://github.com/apache/iceberg-python/pull/986 on top if you
>>>> > > > > want to force it to either small or large types.
>>>> > > > >
>>>> > > > > WDYT?
>>>> > > > >
>>>> > > > > Kind regards,
>>>> > > > > Fokko
>>>> > > > >
>>>> > > > > On Fri, Aug 2, 2024 at 19:46, Sung Yun sun...@apache.org wrote:
>>>> > > > >
>>>> > > > > > Hi folks,
>>>> > > > > >
>>>> > > > > > We identified inefficient memory usage hikes with the current
>>>> > > > > > way of upcasting pyarrow types to large_<type> on read, when
>>>> > > > > > reading tables with certain characteristics. A detailed set of
>>>> > > > > > example benchmarks of this issue is in the Google document
>>>> > > > > > linked on PR #986:
>>>> > > > > > https://github.com/apache/iceberg-python/pull/986
>>>> > > > > >
>>>> > > > > > The proposed solution introduces a config to override this
>>>> > > > > > behavior to use small types instead, and I'd like to add this
>>>> > > > > > into the patch release to give users better control over their
>>>> > > > > > memory usage.
>>>> > > > > >
>>>> > > > > > Also, this is just a gentle reminder that this DISCUSS thread
>>>> > > > > > is still open for any new issues identified from the 0.7.0
>>>> > > > > > release that we should fix in the patch release.
>>>> > > > > >
>>>> > > > > > Thank you,
>>>> > > > > > Sung
>>>> > > > > >
>>>> > > > > > On 2024/07/30 23:57:04 Sung Yun wrote:
>>>> > > > > >
>>>> > > > > > > Hi folks,
>>>> > > > > > >
>>>> > > > > > > We are starting to compile the list of issues to fix and
>>>> > > > > > > port into the 0.7.1 release.
>>>> > > > > > >
>>>> > > > > > > The current list of known issues is as follows:
>>>> > > > > > >
>>>> > > > > > > - Fix pydantic warning on table commit: #972
>>>> > > > > > >   https://github.com/apache/iceberg-python/pull/972
>>>> > > > > > >   (thanks for the quick fix ndrluis!)
>>>> > > > > > > - Issue when rewriting an unpartitioned table: #979
>>>> > > > > > >   https://github.com/apache/iceberg-python/issues/979
>>>> > > > > > > - Issue when evolving and writing in the same transaction: #980
>>>> > > > > > >   https://github.com/apache/iceberg-python/issues/980
>>>> > > > > > >
>>>> > > > > > > Please feel free to respond to this thread with any issues
>>>> > > > > > > that should be tracked for the patch release.
>>>> > > > > > >
>>>> > > > > > > Thank you!
>>>> > > > > > > Sung
>>>>
>>>
>>>
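
For readers following the large_<type> discussion quoted above, here is a rough back-of-envelope sketch of one structural difference between Arrow's `string` and `large_string` types: the former uses 32-bit offsets and the latter 64-bit offsets. This is only an illustration under stated assumptions, not a reproduction of the benchmarks linked from PR #986, and it may not be the cause of the memory hikes reported there. The helper `string_buffer_bytes` and the sample column are hypothetical, and the estimate ignores validity bitmaps, padding, and allocator overhead.

```python
# Back-of-envelope estimate of Arrow variable-length string buffer sizes.
# Assumption: only the offsets and data buffers are counted; validity
# bitmaps, padding, and allocator overhead are ignored.


def string_buffer_bytes(values, offset_width):
    """Estimate the bytes needed for a string column.

    offset_width is 4 for Arrow's `string` type (32-bit offsets) and
    8 for `large_string` (64-bit offsets).
    """
    data_bytes = sum(len(v.encode("utf-8")) for v in values)
    # The Arrow layout stores n + 1 offsets for n values.
    offsets_bytes = (len(values) + 1) * offset_width
    return offsets_bytes + data_bytes


# Hypothetical column: one million short strings of 4 bytes each.
values = ["abcd"] * 1_000_000

small = string_buffer_bytes(values, offset_width=4)
large = string_buffer_bytes(values, offset_width=8)

print(f"string:       {small:,} bytes")
print(f"large_string: {large:,} bytes")
print(f"ratio:        {large / small:.2f}x")
```

The shorter the individual values, the larger the share of the column taken up by offsets, so the relative cost of 64-bit offsets is highest for columns with many short strings and shrinks as values get longer.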