Hi Arrow devs,

There are some bugs in the Parquet implementation that affect reading of data:
- https://issues.apache.org/jira/browse/ARROW-11269, which was opened today and which I only just saw.
- An issue with list schema nulls from the parquet-format's logical types. In this case, we misinterpret the nullness of lists read from parquet-mr, potentially leading to incorrect data being read.

I discovered the second bug while bashing my head trying to fix a bug in the Parquet writer (sadly, I spent a very long time on it).

Anyway, I would like to work on PRs for the above 2 bugs tonight and tomorrow. @Krisztián @Andrew Lamb <al...@influxdata.com>, would we still be able to merge them in time?

I've also seen the offset issues in equality checks, and am going to review/help out with them tomorrow. I haven't been feeling very well this week, so I haven't been spending much time working on Arrow.

Thanks
Neville

On Sat, 16 Jan 2021 at 16:34, Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

> On Sat, Jan 16, 2021 at 12:51 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > I just saw the RC0 candidate email -- thanks Krisztián.
> >
> > Does the RC0 mean that any subsequent merges to master can now proceed
> > without affecting the 3.0.0 branch?
> Technically we don't have a 3.0 release branch, but we can always create
> one.
> So yes, the merges can proceed on master.
>
> Thanks, Krisztian
> >
> > On Fri, Jan 15, 2021 at 10:22 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> >
> > > The spark integration test fails against spark 3.0.1 with
> > >
> > > 12:21:51.996 WARN org.apache.spark.scheduler.TaskSetManager: Lost task
> > > 1.0 in stage 0.0 (TID 1, 5fc0f8cfe8d2, executor driver):
> > > java.lang.NoClassDefFoundError: Could not initialize class
> > > org.apache.spark.sql.util.ArrowUtils$
> > > ...
> > > Caused by: java.lang.RuntimeException: No DefaultAllocationManager
> > > found on classpath. Can't allocate Arrow buffers. Please consider
> > > adding arrow-memory-netty or arrow-memory-unsafe as a dependency.
> > >
> > > Since this change was introduced in
> > >
> > > https://github.com/apache/arrow/commit/2092e18752a9c0494799493b12eb1830052217a2
> > > which is already a part of arrow's 2.0 release, I guess this is not a
> > > blocker (or at least the changes are required on spark's side?).
> > >
> > > Either way, I'm going to proceed with the release.
> > >
> > >
> > > On Fri, Jan 15, 2021 at 2:53 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > That is great news Krisztián -- thank you
> > > >
> > > > On Fri, Jan 15, 2021 at 6:50 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > My plan is to cut RC0 today, just want to make sure that the spark
> > > > > integration test works with spark's latest release.
> > > > >
> > > > > Thanks, Krisztian
> > > > >
> > > > > On Fri, Jan 15, 2021 at 12:35 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I apologize if I have missed this detail on previous emails; I wonder if
> > > > > > there is any estimate of when the Arrow 3.0 release might be finalized.
> > > > > >
> > > > > > The Rust implementation has a few PRs we have been holding off merging
> > > > > > until the release goes out and I wanted to know if there was any estimated
> > > > > > timeline.
> > > > > >
> > > > > > The wiki shows no blocking JIRA items (nice work everyone!) any longer:
> > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew