Agreed Fokko :)

I've cherry-picked the commits in the 0.7.1 milestone into a new branch,
and created this PR: https://github.com/apache/iceberg-python/pull/1031

If this looks good, I can start the release for 0.7.1.

Sung

On Fri, Aug 9, 2024 at 5:05 AM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Sung,
>
> That's a great find. I just merged the PR, and it would be good to get the
> release process rolling to get #1026
> <https://github.com/apache/iceberg-python/pull/1026> out to the users.
>
> Kind regards,
> Fokko
>
> On Thu, Aug 8, 2024 at 11:20 PM Sung Yun <sungwy...@gmail.com> wrote:
>
>> Thank you for reporting the issues and putting in the fixes Fokko and
>> André.
>>
>> We also identified a correctness issue with applying positional deletes
>> on merge-on-read tables that I think must also be included in this
>> release. Here's the PR that resolves the issue:
>> https://github.com/apache/iceberg-python/pull/1026
>>
>> Sung
>>
>> On Thu, Aug 8, 2024 at 9:29 AM André Luis Anastácio
>> <ndrl...@proton.me.invalid> wrote:
>>
>>> I fixed an overwrite error that, I think, would be good to include in
>>> the 0.7.1 release https://github.com/apache/iceberg-python/pull/1023
>>>
>>> André Anastácio
>>>
>>> On Thursday, August 8th, 2024 at 4:29 AM, Fokko Driesprong <
>>> fo...@apache.org> wrote:
>>>
>>> Thanks everyone for the input here, and I agree that the aforementioned
>>> #995 <https://github.com/apache/iceberg-python/pull/995/> and #997
>>> <https://github.com/apache/iceberg-python/pull/997/> by Sung, and #526
>>> <https://github.com/apache/iceberg-python/pull/526> by André would also
>>> be good to include (I've added the milestone there). I have two minor ones
>>> that are also good candidates to add to 0.7.1:
>>>
>>>    - Allow setting write.parquet.row-group-limit
>>>    <https://github.com/apache/iceberg-python/pull/1016>
>>>    - Allow setting write.parquet.page-row-limit
>>>    <https://github.com/apache/iceberg-python/pull/1017>
>>>
>>> Kind regards,
>>> Fokko
>>>
>>>
>>> On Tue, Aug 6, 2024 at 9:17 PM André Luis Anastácio
>>> <ndrl...@proton.me.invalid> wrote:
>>>
>>>> What do you think about adding the fix that excludes PyIceberg support
>>>> for Python 3.9.7 in the 0.7.1 release?[1] It already doesn't work, so this
>>>> is just to avoid any new issues.
>>>>
>>>> - [1]: https://github.com/apache/iceberg-python/pull/526
>>>>
>>>> André Anastácio
>>>>
>>>>
>>>> On Tuesday, August 6th, 2024 at 4:06 PM, Sung Yun <sun...@apache.org>
>>>> wrote:
>>>>
>>>> > Sounds good folks! Thank you for sharing your thoughts. We'll work on
>>>> > getting the patch release out, and continue the discussion on upgrading
>>>> > the PyArrow version to 17.0.0 in time for the 0.8.0 release.
>>>> >
>>>> > Just adding two more fixes that I think we should pull into the patch
>>>> > release. These were added to the GitHub milestone for 0.7.1, but I'm
>>>> > cross-posting here for awareness:
>>>> >
>>>> > - Table scan fails when result is empty:
>>>> >   https://github.com/apache/iceberg-python/pull/997
>>>> > - Fix RestCatalog ListNamespace to correctly make use of the expected
>>>> >   Rest Catalog response:
>>>> >   https://github.com/apache/iceberg-python/pull/997
>>>> >
>>>> > Sung
>>>> >
>>>> > On 2024/08/06 18:29:50 Kevin Liu wrote:
>>>> >
>>>> > > > Typically we only push patches into the minor versions, we could
>>>> > > > also go to version 0.8.0 immediately.
>>>> > >
>>>> > > The issues above sound like patches to me, fixing issues discovered
>>>> > > during the 0.7.0 release. Is there a reason to move to 0.8.0?
>>>> > >
>>>> > > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>>> > > > functional upsides, but I feel that constraining PyIceberg to just
>>>> > > > one published version would make the adoption of PyIceberg
>>>> > > > difficult for our users.
>>>> > >
>>>> > > +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
>>>> > > optional first? So that folks who want the upgrade can test it out.
>>>> > >
>>>> > > Thanks,
>>>> > > Kevin Liu
>>>> > >
>>>> > > On Fri, Aug 2, 2024 at 11:33 AM Sung Yun sun...@apache.org wrote:
>>>> > >
>>>> > > > Hi Fokko,
>>>> > > >
>>>> > > > That makes sense, thank you for the suggestion! The issue was severe
>>>> > > > enough that we had to fork the repo and apply our own fix in order
>>>> > > > to run PyIceberg without our applications going OOM. So I think
>>>> > > > there will be value in getting the proposed config property out as
>>>> > > > early as possible for the larger community.
>>>> > > >
>>>> > > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>>> > > > functional upsides, but I feel that constraining PyIceberg to just
>>>> > > > one published version would make the adoption of PyIceberg
>>>> > > > difficult for our users. Users writing new applications won't have
>>>> > > > trouble with it, but users intending to use PyIceberg in an existing
>>>> > > > application may have to upgrade their PyArrow versions, which could
>>>> > > > be a deterrent (or a welcome nudge). Would it be worth starting that
>>>> > > > discussion on a separate thread?
>>>> > > >
>>>> > > > Sung
>>>> > > >
>>>> > > > On 2024/08/02 17:57:17 Fokko Driesprong wrote:
>>>> > > >
>>>> > > > > Hey Sung,
>>>> > > > >
>>>> > > > > Typically we only push patches into the minor versions, we could
>>>> > > > > also go to version 0.8.0 immediately.
>>>> > > > >
>>>> > > > > Regarding the memory consumption, thanks for putting those numbers
>>>> > > > > together! I would also love to get #929
>>>> > > > > <https://github.com/apache/iceberg-python/pull/929> in, so we can
>>>> > > > > push down the large/small type to PyArrow (only for to_arrow), and
>>>> > > > > apply #986 <https://github.com/apache/iceberg-python/pull/986> on
>>>> > > > > top if you want to force it to either small or large types.
>>>> > > > >
>>>> > > > > WDYT?
>>>> > > > >
>>>> > > > > Kind regards,
>>>> > > > > Fokko
>>>> > > > >
>>>> > > > > On Fri, Aug 2, 2024 at 7:46 PM Sung Yun sun...@apache.org wrote:
>>>> > > > >
>>>> > > > > > Hi folks,
>>>> > > > > >
>>>> > > > > > We identified inefficient memory usage with the current way of
>>>> > > > > > upcasting pyarrow types to large_<type> on read, when reading
>>>> > > > > > tables with certain characteristics. A detailed set of example
>>>> > > > > > benchmarks of this issue is in the Google document linked on
>>>> > > > > > PR #986:
>>>> > > > > > https://github.com/apache/iceberg-python/pull/986
>>>> > > > > >
>>>> > > > > > The proposed solution introduces a config to override this
>>>> > > > > > behavior to use small types instead, and I'd like to add this
>>>> > > > > > into the patch release to give users better control over their
>>>> > > > > > memory usage.
>>>> > > > > >
>>>> > > > > > Also, this is just a gentle reminder that this DISCUSS thread is
>>>> > > > > > still open for any new issues identified from the 0.7.0 release
>>>> > > > > > that we should fix in the patch release.
>>>> > > > > >
>>>> > > > > > Thank you,
>>>> > > > > > Sung
>>>> > > > > >
>>>> > > > > > On 2024/07/30 23:57:04 Sung Yun wrote:
>>>> > > > > >
>>>> > > > > > > Hi folks,
>>>> > > > > > >
>>>> > > > > > > We are starting to compile the list of issues to fix and port
>>>> > > > > > > into the 0.7.1 release.
>>>> > > > > > >
>>>> > > > > > > The current list of known issues is as follows:
>>>> > > > > > >
>>>> > > > > > > - Fix pydantic warning on table commit: #972
>>>> > > > > > >   https://github.com/apache/iceberg-python/pull/972 (thanks for
>>>> > > > > > >   the quick fix ndrluis!)
>>>> > > > > > > - Issue when rewriting an unpartitioned table: #979
>>>> > > > > > >   https://github.com/apache/iceberg-python/issues/979
>>>> > > > > > > - Issue when evolving and writing in the same transaction: #980
>>>> > > > > > >   https://github.com/apache/iceberg-python/issues/980
>>>> > > > > > >
>>>> > > > > > > Please feel free to respond to this thread with any issues that
>>>> > > > > > > should be tracked for the patch release.
>>>> > > > > > >
>>>> > > > > > > Thank you!
>>>> > > > > > > Sung
>>>>
>>>
>>>
