Hi Kou and Dewey,

Thank you very much for your thorough and detailed responses to all of our 
questions. This is extremely valuable feedback, and the points that you made 
make a lot of sense.

Sarah and I talked this over a bit more and we think that sticking with the 
overall apache/arrow project release cycle (i.e. stay in line with 15.0.0) 
makes the most sense in the long term.

@Dewey - thanks very much for highlighting the pros and cons of creating a 
separate repository. We also really appreciate the community being willing to 
try and support our development needs. That being said, we think it is probably 
best to stay in-model with the main apache/arrow release process for the time 
being rather than creating a separate repository for the MATLAB interface.

To address some related points and questions:

> Can we just mention "This is not stable yet!!!" in the documentation instead 
> of using isolated version?

Yes. This is a good point and we already have a disclaimer in the README.md [1] 
for the MATLAB interface which says: "Warning The MATLAB interface is under 
active development and should be considered experimental."

> It's better that we use CI for this like other binary packages such as 
> .deb/.rpm/.wheel/.jar/...

This makes sense and we agree. We will follow up with PRs to add the necessary 
MATLAB packaging scripts and CI workflow files.

> Does the MLTBX file include Apache Arrow C++ binaries too like .wheel/.jar?

Yes. The MLTBX file will package the Apache Arrow C++ binaries, similar to the 
Java JARs / Python wheels.

> MATLAB doesn't provide the official package repository such as PyPI for 
> Python and https://rubygems.org/ for Ruby, right?

The equivalent to pypi.org or rubygems.org for MATLAB would be the MathWorks 
File Exchange [2].

> If the official package repository for MATLAB doesn't exist, JFrog is better 
> because the MLTBX file will be large (Apache Arrow C++ binaries are large).

As noted above, the "official package repository" for MATLAB would be the 
MathWorks File Exchange. File Exchange has tight integration with GitHub [3]. 
When a new release is available in GitHub Releases, the associated File 
Exchange entry will be automatically updated.

We believe we could leverage this integration between File Exchange and GitHub 
Releases to automate the MATLAB interface release process. This approach might 
look like:

1. Upload MLTBX to JFrog Artifactory
2. Run a post-release script that would:
   2.1 Download MLTBX from JFrog Artifactory
   2.2 Upload to GitHub Releases (e.g. apache/arrow-matlab - see discussion below)
   2.3 Linked File Exchange entry will be automatically updated
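
As a rough illustration of step 2, here is a minimal Python sketch of what such 
a post-release script might look like. It is only a sketch under assumptions: 
the Artifactory path layout, the file name pattern (matlab-arrow-<version>.mltbx), 
the target repository, and the helper function names are all placeholders that 
we made up for this example. Only the GitHub REST endpoints for creating a 
release and uploading a release asset are existing APIs.

```python
"""Sketch: copy an MLTBX from JFrog Artifactory to GitHub Releases.

All locations and names below are placeholders for illustration only.
"""
import json
import urllib.parse
import urllib.request

# Assumed path layout; the real Artifactory location is not decided yet.
ARTIFACTORY_BASE = "https://apache.jfrog.io/artifactory/arrow/matlab"
GITHUB_API = "https://api.github.com"
GITHUB_UPLOADS = "https://uploads.github.com"


def mltbx_name(version):
    """Hypothetical file name pattern for the packaged toolbox."""
    return f"matlab-arrow-{version}.mltbx"


def artifactory_url(version):
    """Step 2.1: where the MLTBX would be downloaded from (assumed layout)."""
    return f"{ARTIFACTORY_BASE}/{version}/{mltbx_name(version)}"


def asset_upload_url(repo, release_id, filename):
    """Step 2.2: GitHub REST endpoint for uploading a release asset."""
    query = urllib.parse.urlencode({"name": filename})
    return f"{GITHUB_UPLOADS}/repos/{repo}/releases/{release_id}/assets?{query}"


def download(url):
    """Fetch the whole file into memory (acceptable for a sketch)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def github_post(url, token, body, content_type):
    """POST raw bytes to the GitHub API and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def publish(version, repo, token):
    """Download the MLTBX, create a GitHub release, and attach the file."""
    payload = download(artifactory_url(version))
    release = github_post(
        f"{GITHUB_API}/repos/{repo}/releases",
        token,
        json.dumps({"tag_name": version, "name": version}).encode(),
        "application/json",
    )
    github_post(
        asset_upload_url(repo, release["id"], mltbx_name(version)),
        token,
        payload,
        "application/octet-stream",
    )
    # Step 2.3 needs no code here: the linked File Exchange entry is
    # expected to pick up the new GitHub release on its own.
    return release
```

A call such as publish("15.0.0", "apache/arrow-matlab", token) would run steps 
2.1 and 2.2 in order, where the repository name is just the example from this 
email and the token would come from the release manager's environment.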

One open question about this approach: which GitHub repository should we use 
for hosting the MLTBX via GitHub Releases?

We don't think using the main apache/arrow GitHub Releases area is the right 
approach. So, would it make sense to create a separate "bridge" repository just 
for hosting the latest MLTBX files? Should this be an ASF associated repository 
like apache/arrow-matlab or would a MathWorks associated repository like 
mathworks/arrow-matlab be OK? We aren't sure what makes the most sense here, 
but welcome any suggestions.

> We may want to use the status page for it: 
> https://arrow.apache.org/docs/status.html

Thanks for highlighting this. This makes sense, and we can follow up with a PR 
to add MATLAB to the status page.

> How about creating https://arrow.apache.org/docs/matlab/ ? We can use Sphinx 
> like the Python docs https://arrow.apache.org/docs/python/ or another 
> documentation tools like the R docs https://arrow.apache.org/docs/r/ . If we 
> use Sphinx, we can create 
> https://github.com/apache/arrow/tree/main/docs/source/matlab/

This makes sense and eventually we want to have comprehensive documentation in 
line with other language bindings using Sphinx. In addition to comprehensive 
documentation, we were also hoping that we could host release notes in a place 
that is easily accessible from the MLTBX download location. File Exchange 
entries have a "Version History" which includes release notes from the 
"backing" GitHub Releases area. So, this would probably be a sensible location 
to put the release notes. Including MATLAB updates in Apache Arrow 
release blog posts (e.g. 
https://arrow.apache.org/blog/2023/11/01/14.0.0-release/) may also be helpful.

--

We really appreciate all of the community's guidance on navigating the release 
process!

We will get started on integrating with the existing release tooling.

[1] https://github.com/apache/arrow/tree/main/matlab#status
[2] https://www.mathworks.com/matlabcentral/fileexchange
[3] https://www.mathworks.com/matlabcentral/content/fx/about.html#Why_GitHub

Best Regards,

Kevin Gurney
________________________________
From: Dewey Dunnington <de...@voltrondata.com.INVALID>
Sent: Tuesday, November 7, 2023 8:53 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Cc: Sarah Gilmore <sgilm...@mathworks.com>; Lei Hou <lei...@mathworks.com>
Subject: Re: [DISCUSS][MATLAB] Proposal for incremental point releases of the 
MATLAB interface

For argument's sake, I might suggest that the process you described in
your initial note would probably work best in another repo: you would
be able to iterate faster and release/version at your own pace. The
flexibility you get from moving to a separate repo comes at the cost
of extra responsibility: you have to set up your own CI, manage your
own issues, and set up your own release verification scripts + release
votes on the mailing list. Because you bind Arrow C++, you would have
to take sufficient steps to ensure that the Arrow C++ developers are
made aware of changes that break the Matlab bindings and vice versa
(i.e., test against dev Arrow C++ in a CI job).

Setting up that infrastructure for apache/arrow-nanoarrow took ~a week
of development time, and it now takes ~half a day to release a new
version (it took more for the first few versions, and the matlab
version has considerably higher complexity). Probably the biggest
barrier to releasing from another repo is that you have to ensure a
critical mass of PMC members can/will run your release verification
script and vote.

I happen to feel that it's the PMC's/wider community's responsibility
to help language binding contributors adopt a workflow that suits
their needs. If active Matlab contributors agree that they want to
release version 0.1 from another repo, (I feel that) we're here to
help you do that. If the active contributors want to stay in
apache/arrow, there is less flexibility about what you release and
when; however, the release process is well-defined.

On Tue, Nov 7, 2023 at 8:43 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
> Hi,
>
> > As a point of reference, we noticed that PyArrow is on
> > version 14.0.0, but it feels "misleading" to say that the
> > MATLAB interface is at version 14.0.0 when we haven't yet
> > implemented or stabilized all core Arrow APIs.
>
> I can understand this but I suggest that we use the same
> version as other packages in apache/arrow. Because:
>
> * Using isolated version increases release complexity.
> * Using isolated version may introduce another
>   "misleading"/"confusion": For example, "the MATLAB
>   interface 1.0.0 uses Apache Arrow C++ 20.0.0" may be
>   misleading/confused:
>   * The MATLAB interface 1.0.0 doesn't use Apache Arrow C++
>     1.0.0.
>   * It may be difficult to find the corresponding
>     Apache Arrow C++ version from the MATLAB interface
>     version.
>
> Can we just mention "This is not stable yet!!!" in the
> documentation instead of using isolated version?
>
> We may want to use the status page for it:
> https://arrow.apache.org/docs/status.html
>
> > 1. Manually build the MATLAB interface on Windows, macOS, and Linux
>
> It's better that we use CI for this like other binary
> packages such as .deb/.rpm/.wheel/.jar/...
>
> If we release the MATLAB interface separately, which Apache
> Arrow C++ version is used? If we release the MATLAB
> interface right now, is Apache Arrow C++ 14.0.0 (the latest
> release) used or is Apache Arrow C++ main (not released yet)
> used? The MATLAB interface on main will depend on Apache
> Arrow C++ main, we may not be able to use the latest release
> for the MATLAB interface on main.
>
> > 2. Combine all of the cross platform build artifacts into
> > a single MLTBX file [1] for distribution
>
> Does the MLTBX file include Apache Arrow C++ binaries too
> like .wheel/.jar?
>
> > 3. Host the MLTBX somewhere that is easily accessible for download
>
> MATLAB doesn't provide the official package repository such
> as PyPI for Python and https://rubygems.org/ for Ruby,
> right?
>
> > 1. Is there a recommended location where we can host the MLTBX file? e.g. 
> > GitHub Releases [2], JFrog [3], etc.?
>
> If the official package repository for MATLAB doesn't exist,
> JFrog is better because the MLTBX file will be large (Apache
> Arrow C++ binaries are large).
>
> > 2. Is there a recommended location for hosting release notes?
>
> How about creating https://arrow.apache.org/docs/matlab/ ?
> We can use Sphinx like the Python docs
> https://arrow.apache.org/docs/python/ or another
> documentation tools like the R docs
> https://arrow.apache.org/docs/r/ .
> If we use Sphinx, we can create
> https://github.com/apache/arrow/tree/main/docs/source/matlab/ .
>
> > 3. Is there a recommended cadence for incremental point releases?
>
> I suggest avoiding separated release as above.
>
> > 4. Are there any notable ASF procedures [4] [5] (e.g. voting on a new 
> > release proposal) that we should be aware of as we consider creating an 
> > initial release?
>
> We don't need additional task for an initial release.
>
> > 5. How should the Arrow project release (i.e. 14.0.0)
> > relate to the MATLAB interface version (i.e. 0.1)? As a
> > point of reference, we noticed that PyArrow is on
> > version 14.0.0, but it feels "misleading" to say that
> > the MATLAB interface is at version 14.0.0 when we
> > haven't yet implemented or stabilized all core Arrow
> > APIs. Is there any precedent for using independent
> > release versions for language bindings which are not
> > fully stabilized and are also part of the main
> > apache/arrow repository?
>
> We don't have any precedent for using independent release
> versions for language bindings. All language bindings used
> the same version.
>
> Apache Arrow JavaScript isn't a language bindings but it
> used separated release and isolated versions before
> 0.4.1. It joined apache/arrow release after 0.4.1. (The next
> version of Apache Arrow JavaScript 0.4.1 is 13.0.0.)
>
> > We've noticed that Arrow-related projects which are not
> > part of the main apache/arrow GitHub repository
> > (e.g. DataFusion) follow a mailing list-based voting and
> > release process. However, it's not clear whether it makes
> > sense to follow this process for the MATLAB interface
> > since it is part of the main apache/arrow repository.
>
> If we want to use separated release for the MATLAB
> interface, we should follow the same release process as
> apache/arrow and other apache/arrow-* because it's the
> standard ASF release process.
>
>
> Thanks,
> --
> kou
>
> In 
> <mn2pr05mb649619998eae9579cceba692ae...@mn2pr05mb6496.namprd05.prod.outlook.com>
> "[DISCUSS][MATLAB] Proposal for incremental point releases of the MATLAB 
> interface" on Tue, 7 Nov 2023 20:31:31 +0000,
> Kevin Gurney <kgur...@mathworks.com.INVALID> wrote:
>
> > Hi All,
> >
> > A considerable amount of new functionality has been added to the MATLAB 
> > interface over the last few months. We appreciate all the community's 
> > support in making this possible and are happy to see all the progress that 
> > is being made.
> >
> > At this point, we would like to create an initial "0.1" release of the 
> > MATLAB interface. Incremental point releases will enable MATLAB users to 
> > provide early feedback. In addition, learning how to navigate the release 
> > process is an important step towards eventually releasing a stable 1.0 
> > version of the MATLAB interface.
> >
> > Our proposed approach to creating an initial release would be to:
> >
> > 1. Manually build the MATLAB interface on Windows, macOS, and Linux
> > 2. Combine all of the cross platform build artifacts into a single MLTBX 
> > file [1] for distribution
> > 3. Host the MLTBX somewhere that is easily accessible for download
> >
> > For reference - MLTBX is a standard packaging format for MATLAB which 
> > enables simple "one-click" installation - analogous to a Python pip package 
> > or a Ruby gem.
> >
> > Creating an MLTBX file manually should be relatively low effort. However, 
> > in the long term, we would love to enable semi-automated "push button" 
> > releases via GitHub Actions (and possibly even "nightly builds").
> >
> > Since this is our first time creating a release of the MATLAB interface, we 
> > wanted to draw on the community's expertise to answer a few questions:
> >
> > 1. Is there a recommended location where we can host the MLTBX file? e.g. 
> > GitHub Releases [2], JFrog [3], etc.?
> > 2. Is there a recommended location for hosting release notes?
> > 3. Is there a recommended cadence for incremental point releases?
> > 4. Are there any notable ASF procedures [4] [5] (e.g. voting on a new 
> > release proposal) that we should be aware of as we consider creating an 
> > initial release?
> > 5. How should the Arrow project release (i.e. 14.0.0) relate to the MATLAB 
> > interface version (i.e. 0.1)? As a point of reference, we noticed that 
> > PyArrow is on version 14.0.0, but it feels "misleading" to say that the 
> > MATLAB interface is at version 14.0.0 when we haven't yet implemented or 
> > stabilized all core Arrow APIs. Is there any precedent for using 
> > independent release versions for language bindings which are not fully 
> > stabilized and are also part of the main apache/arrow repository?
> >
> > We've noticed that Arrow-related projects which are not part of the main 
> > apache/arrow GitHub repository (e.g. DataFusion) follow a mailing 
> > list-based voting and release process. However, it's not clear whether it 
> > makes sense to follow this process for the MATLAB interface since it is 
> > part of the main apache/arrow repository.
> >
> > We sincerely appreciate the community's help and guidance on this topic!
> >
> > Please let us know if you have any questions.
> >
> > [1] 
> > https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav
> > [2] https://github.com/apache/arrow/releases
> > [3] https://apache.jfrog.io/ui/native/arrow/
> > [4] https://www.apache.org/foundation/voting.html
> > [5] https://www.apache.org/legal/release-policy.html#release-approval
> >
> > Best Regards,
> >
> > Kevin Gurney
