If the release later in the week doesn't have any breaking API changes, perhaps it can be 48.1.0 (and thus also get the bugfix into DataFusion).

On Tue, Nov 7, 2023 at 6:41 AM Raphael Taylor-Davies <r.taylordav...@googlemail.com.invalid> wrote:

> I intend to cut a new arrow release later this week, I would prefer we
> wait for this.
>
> On 07/11/2023 11:39, Andrew Lamb wrote:
> > Perhaps we can create an arrow 48.1.0 patch release to include the fix?
> >
> > On Tue, Nov 7, 2023 at 12:48 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> >> Thanks for the clarification, Raphael. That likely narrows the scope of who
> >> is affected. If this bug is present in DataFusion 33, then delta-rs will
> >> likely skip upgrading until 34. If we're the only downstream project this
> >> parsing issue affects, then I think it's fine to release.
> >>
> >> On Mon, Nov 6, 2023 at 8:22 PM Raphael Taylor-Davies
> >> <r.taylordav...@googlemail.com.invalid> wrote:
> >>
> >>> Hi,
> >>>
> >>> To further clarify the bug concerns the serde compatibility feature that
> >>> allows converting a serde compatible data structure to arrow [1]. It will
> >>> not impact workloads reading JSON.
> >>>
> >>> I am not sure this is a sufficiently fundamental bug to warrant special
> >>> concern, but happy to defer to others.
> >>>
> >>> Kind Regards,
> >>>
> >>> Raphael
> >>>
> >>> [1]: https://docs.rs/arrow/latest/arrow/#serde-compatibility
> >>>
> >>> On 7 November 2023 03:20:59 GMT, Will Jones <will.jones...@gmail.com> wrote:
> >>>> Hello,
> >>>>
> >>>> There is an upstream bug in arrow-json that can cause the JSON reader to
> >>>> return incorrect data for large integers [1]. It was recently fixed by
> >>>> Raphael within the last 24 hours, but is not included in any release. The
> >>>> bug was introduced in Arrow 48, which this DataFusion release will expose
> >>>> users to.
> >>>>
> >>>> Not sure what the precedent here is, but I think either we should consider
> >>>> either (a) seeing if we can release and upgrade Arrow to include the fix,
> >>>> or else (b) calling out the regression as a known bug so downstream
> >>>> projects can include the path in their applications.
> >>>>
> >>>> Best,
> >>>>
> >>>> Will Jones
> >>>>
> >>>> [1] https://github.com/apache/arrow-rs/issues/5038
> >>>> [2] https://github.com/apache/arrow-rs/pull/5042
> >>>>
> >>>> On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <al...@influxdata.com> wrote:
> >>>>> +1 (the tests passed for me). I have left a comment on
> >>>>> https://github.com/apache/arrow-datafusion/issues/8069
> >>>>>
> >>>>> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <andygrov...@gmail.com> wrote:
> >>>>>> I filed https://github.com/apache/arrow-datafusion/issues/8069
> >>>>>>
> >>>>>> On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <andygrov...@gmail.com> wrote:
> >>>>>>> I see the same error when I run on my M1 Macbook Air with 16 GB RAM.
> >>>>>>> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> >>>>>>> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> >>>>>>>
> >>>>>>> It worked fine on my workstation with 128 GB RAM.
> >>>>>>>
> >>>>>>> On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <vii...@gmail.com> wrote:
> >>>>>>>> Hmm, ran verification script and got one failure:
> >>>>>>>>
> >>>>>>>> failures:
> >>>>>>>>
> >>>>>>>> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> >>>>>>>> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> >>>>>>>>
> >>>>>>>> failures:
> >>>>>>>> aggregates::tests::run_first_last_multi_partitions
> >>>>>>>>
> >>>>>>>> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0
> >>>>>>>> filtered out; finished in 2.21s
> >>>>>>>>
> >>>>>>>> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <andygrov...@gmail.com> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I would like to propose a release of Apache Arrow DataFusion Implementation,
> >>>>>>>>> version 33.0.0.
> >>>>>>>>>
> >>>>>>>>> This release candidate is based on commit:
> >>>>>>>>> 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
> >>>>>>>>> The proposed release tarball and signatures are hosted at [2].
> >>>>>>>>> The changelog is located at [3].
> >>>>>>>>>
> >>>>>>>>> Please download, verify checksums and signatures, run the unit tests, and
> >>>>>>>>> vote on the release. The vote will be open for at least 72 hours.
> >>>>>>>>>
> >>>>>>>>> Only votes from PMC members are binding, but all members of the community
> >>>>>>>>> are encouraged to test the release and vote with "(non-binding)".
> >>>>>>>>>
> >>>>>>>>> The standard verification procedure is documented at
> >>>>>>>>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> >>>>>>>>>
> >>>>>>>>> [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
> >>>>>>>>> [ ] +0
> >>>>>>>>> [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
> >>>>>>>>> Here is my vote:
> >>>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> [1]: https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
> >>>>>>>>> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
> >>>>>>>>> [3]: https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md