Hi,

To further clarify, the bug concerns the serde compatibility feature that allows converting a serde-compatible data structure to Arrow [1]. It will not impact workloads reading JSON.

I am not sure this is a sufficiently fundamental bug to warrant special concern, but happy to defer to others.

Kind Regards,

Raphael

[1]: https://docs.rs/arrow/latest/arrow/#serde-compatibility

On 7 November 2023 03:20:59 GMT, Will Jones <will.jones...@gmail.com> wrote:
>Hello,
>
>There is an upstream bug in arrow-json that can cause the JSON reader to
>return incorrect data for large integers [1]. It was recently fixed by
>Raphael within the last 24 hours, but is not included in any release. The
>bug was introduced in Arrow 48, which this DataFusion release will expose
>users to.
>
>Not sure what the precedent here is, but I think either we should consider
>either (a) seeing if we can release and upgrade Arrow to include the fix,
>or else (b) calling out the regression as a known bug so downstream
>projects can include the path in their applications.
>
>Best,
>
>Will Jones
>
>[1] https://github.com/apache/arrow-rs/issues/5038
>[2] https://github.com/apache/arrow-rs/pull/5042
>
>On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <al...@influxdata.com> wrote:
>
>> +1 (the tests passed for me). I have left a comment on
>> https://github.com/apache/arrow-datafusion/issues/8069
>>
>> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <andygrov...@gmail.com> wrote:
>>
>> > I filed https://github.com/apache/arrow-datafusion/issues/8069
>> >
>> > On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <andygrov...@gmail.com> wrote:
>> >
>> > > I see the same error when I run on my M1 Macbook Air with 16 GB RAM.
>> > >
>> > > ---- aggregates::tests::run_first_last_multi_partitions stdout ----
>> > > Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
>> > > GroupedHashAggregateStream[0] with 1829 bytes already allocated -
>> > > maximum available is 605")
>> > >
>> > > It worked fine on my workstation with 128 GB RAM.
>> > >
>> > >
>> > >
>> > > On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <vii...@gmail.com> wrote:
>> > >
>> > >> Hmm, ran verification script and got one failure:
>> > >>
>> > >> failures:
>> > >>
>> > >> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
>> > >> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
>> > >> GroupedHashAggregateStream[0] with 1829 bytes already allocated -
>> > >> maximum available is 605")
>> > >>
>> > >> failures:
>> > >> aggregates::tests::run_first_last_multi_partitions
>> > >>
>> > >> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0
>> > >> filtered out; finished in 2.21s
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <andygrov...@gmail.com> wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > I would like to propose a release of Apache Arrow DataFusion Implementation,
>> > >> > version 33.0.0.
>> > >> >
>> > >> > This release candidate is based on commit:
>> > >> > 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
>> > >> > The proposed release tarball and signatures are hosted at [2].
>> > >> > The changelog is located at [3].
>> > >> >
>> > >> > Please download, verify checksums and signatures, run the unit tests, and vote
>> > >> > on the release. The vote will be open for at least 72 hours.
>> > >> >
>> > >> > Only votes from PMC members are binding, but all members of the community are
>> > >> > encouraged to test the release and vote with "(non-binding)".
>> > >> >
>> > >> > The standard verification procedure is documented at
>> > >> > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>> > >> > .
>> > >> >
>> > >> > [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
>> > >> > [ ] +0
>> > >> > [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
>> > >> >
>> > >> > Here is my vote:
>> > >> >
>> > >> > +1
>> > >> >
>> > >> > [1]: https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
>> > >> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
>> > >> > [3]: https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md
>> > >>
>> > >
>> >
>>