>
> - Joris believes we can go ahead and do this; the Parquet Rust
> implementation did something similar

Small note here: IIRC, the Rust and C++ Parquet implementations have
different origins.  Rust Parquet was donated directly to the Arrow project and
developed under its auspices after donation.  The parquet-cpp integration at
the time was done with the agreement that it would still live under the
governance of the Parquet PMC (with the hope of it getting split out again
at some point).  I think there has been enough code creep here that, without
a significant amount of work, separating Parquet C++ back out of Arrow
is likely not tenable.

I pinged the thread again to see if we can get the Parquet PMC to weigh in
here.



On Wed, Apr 12, 2023 at 12:39 PM Ian Cook <i...@ursacomputing.com> wrote:

> Below is a summary of the notes from today's meeting:
>
> Attendees:
>
> - Ian Cook
> - Raúl Cumplido
> - Xuwei Fu
> - Will Jones
> - Bryce Mecum
> - Rok Mihevc
> - Sri Nadukudy
> - Ashish Paliwal
> - Dane Pitkin
> - David Dali Susanibar Arce
> - Matthew Topol
> - Joris Van den Bossche
> - Jacob Wujciak
>
>
> Discussion:
>
> 12.0.0 release
>
> - Code freeze is scheduled for later today, April 12
> - There are many nightly failures currently on main; Raúl and Jacob
> have opened several blocker issues and we might need to create more
> - Discussion of several current issues that might affect the release
>    - C# tests not finding Python
>    - PyArrow tests slowness on Windows [1]
>    - PyArrow wheels on Windows not uploading to Gemfury
> - Important items to mention in release changelog, release blog, etc.
>   - Drop support for Ubuntu 18.04 [2]
>   - Acero refactor (splitting Acero out from core Arrow library) [3]
>   - Fixed shape tensor extension type [4]
>   - Run-end encoded layout [5]
>   - Plasma removal [6] and suggested alternatives [7]
>   - Reminder about Jira to GitHub move (which happened just before the
> 11.0.0 release)
>   - Initial Swift implementation [8]
>   - nanoarrow (not technically a part of this release, but worth
> drawing attention to) [9]
>   - Also see ASF board report
>
>
> Parquet tickets are still tracked in the ASF Jira
>
> - We have to maintain a lot of code in Archery, etc. to automate the
> tracking of Parquet C++ issues which are still in Jira, even though
> there are only a few Parquet issues in each release (4 for 12.0.0)
>   - PARQUET-2201 Add stress test for RecordReader ReadRecords and
> SkipRecords. (#14879)
>   - PARQUET-2225 Allow reading dense with RecordReader (#17877)
>   - PARQUET-2232 Add an api to ColumnChunkMetaData to indicate if the
> column chunk uses a bloom filter (#33736)
>   - PARQUET-2250 Expose column descriptor through RecordReader (#34318)
> - Can we move the Parquet C++ issues from the ASF Jira to GitHub?
> - Joris believes we can go ahead and do this; the Parquet Rust
> implementation did something similar
> - There are already some Parquet issues that were reported and
> resolved in the Arrow monorepo in this release without ever being
> opened as Parquet Jira issues [10]
> - Check with Micah Kornfield, Fatemah Panah
> - There was a related Parquet mailing list discussion about this in
> February [11]
>
>
> [1] https://github.com/apache/arrow/issues/35078
> [2] https://github.com/apache/arrow/issues/33800
> [3] https://lists.apache.org/thread/5h5g9k9lvbybzl8fnbg4fppxczm42g6r
> [4]
> https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#fixed-shape-tensor
> [5]
> https://arrow.apache.org/docs/format/Columnar.html#run-end-encoded-layout
> [6] https://github.com/apache/arrow/pull/34718
> [7] https://lists.apache.org/thread/lk277x3b9gjol42sjg27bst2ggm5s0j2
> [8] https://github.com/apache/arrow/issues/20484
> [9] https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
> [10]
> https://github.com/apache/arrow/issues?q=is%3Aissue+label%3A%22Component%3A+Parquet%22+is%3Aclosed
> [11] https://lists.apache.org/thread/jf9wos3t6xxk6xdyx2dof1jlkbpkr56p
>
>
> On Tue, Apr 11, 2023 at 5:35 PM Ian Cook <i...@ursacomputing.com> wrote:
> >
> > Hi all,
> >
> > Our biweekly Arrow community meeting is tomorrow at 16:00 UTC / 12:00
> > EDT.
> >
> > Zoom meeting URL:
> > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > Meeting ID: 876 4903 3008
> > Passcode: 958092
> >
> > The notes for this and future instances of this meeting will be
> > captured in this Google Doc:
> >
> > https://docs.google.com/document/d/1xrji8fc6_24TVmKiHJB4ECX1Zy2sy2eRbBjpVJMnPmk/
> > If you plan to attend this meeting, you are welcome to edit the
> > document to add the topics that you would like to discuss.
> >
> > Thanks,
> > Ian
>
