Would it be possible to create some JIRA issues and/or documents (e.g.
a Google document) to enumerate the tasks required to enable R users
to install the package on each target platform? Based on the
information available on the Arrow mailing list, it isn't clear to me
at all where things stand. It would be helpful for everyone to be able
to have visibility into the progress toward that goal.

On Sun, Jan 6, 2019 at 11:30 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi,
>
> On Sun, Jan 6, 2019 at 11:13 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
> >
> > On Sun, Jan 6, 2019 at 5:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi Jeroen,
> > >
> > > On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
> > > >
> > > > On 2019/01/02 17:08:58, Wes McKinney <w...@gmail.com> wrote:
> > > > > hi folks,
> > > > >
> > > > > With 0.12 around the corner and significant progress on the R bindings
> > > > > project (sufficient for Spark integration [1]), I am wondering how
> > > > > everyday R users are going to be able to install the software
> > > > > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > > > > for this?
> > > >
> > > > The R packaging is a bit different from Python's. For Windows and macOS,
> > > > we can statically link external libs into the R package, to ship a
> > > > standalone binary R package without any runtime dependencies. On
> > > > Linux, R requires the system package manager (apt/yum) to provide
> > > > external libs. The R package manager doesn't work well with libs from
> > > > Conda.
> > >
> > > How do R libraries handle (or not handle) symbol conflicts if
> > > everything is statically linked?
> >
> > Not sure what you mean. R packages on Mac/Win statically link their system
> > dependencies; there should be no interference with other packages. In
> > the case of arrow, we build the R package using libarrow.a (which
> > already contains the required boost libs), and then the resulting R
> > binary package consists of a single dll/dylib containing both the R
> > bindings + libarrow, without any external runtime dll dependencies.
> >
>
> To limit the scope of the question: to read Parquet files, libarrow.a
> has a few transitive dependencies:
>
> * zlib
> * snappy
> * Thrift
>
> There are some other incoming dependencies that can't be avoided in
> the future (that is, if R wants to be a first-class citizen in the
> direction this project is headed):
>
> * LLVM
> * re2
> * More compressors: bz2, zstd, lz4
> * gRPC (which depends on Protobuf, OpenSSL)
>
> Basically, the entire list in
> https://github.com/apache/arrow/tree/master/cpp/thirdparty#arrow-c-thirdparty-dependencies.
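A minimal sketch, in Python and for illustration only, of why this list matters for the static-linking approach described above: a self-contained Windows/macOS binary has to bundle the full transitive closure of libarrow's dependencies. The graph is hypothetical and encodes only the edges named in this thread; the `"parquet"` and `"upcoming"` keys and the `transitive_deps` helper are illustrative names, not part of Arrow's build. The authoritative list is the cpp/thirdparty README linked above.

```python
# Hypothetical dependency graph built only from the libraries named in this
# thread; the real Arrow build (cpp/thirdparty) has more entries and edges.
DEPS = {
    # Needed today to read Parquet files
    "parquet": ["zlib", "snappy", "thrift"],
    # Incoming dependencies mentioned for future features
    "upcoming": ["llvm", "re2", "bz2", "zstd", "lz4", "grpc"],
    # gRPC itself pulls in Protobuf and OpenSSL
    "grpc": ["protobuf", "openssl"],
}


def transitive_deps(graph, roots):
    """Return every library reachable from `roots` via dependency edges."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(sorted(transitive_deps(DEPS, ["parquet"])))
    print(sorted(transitive_deps(DEPS, ["parquet", "upcoming"])))
```

Under this toy graph, a Parquet-only build needs three libraries, while adding the incoming dependencies grows the set to eleven. That gives a rough sense of how much more a static Windows/macOS build (or a set of Linux system packages) would have to provide.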
>
> >
> > > There might be some collaboration opportunity with Kouhei or others
> > > who have been working on msys2 packaging, which AFAIK is going to be
> > > nearly the same toolchain.
> >
> > Yes I based the build on Kouhei's build script (see the first line of
> > the PKGBUILD file in the rwinlib repo), however I disabled some extra
> > features which complicate the process, so that it looks more like the
> > homebrew configuration.
> >
> > > Keep in mind that the #1 use case for the Python package right now is
> > > to read and write Parquet files, which requires compression libraries
> > > and Thrift. In the short term, I would expect the same to be true of
> > > the R package, so failing to package Parquet would cripple the
> > > package.
> >
> > Which compression libraries exactly do we need to build with parquet
> > support? Can we build arrow using vendored thrift, or do we need to
> > build thrift separately? If this is important, we should send a PR to
> > homebrew to enable this feature in their builds.
> >
> > I am not familiar with arrow yet. How do I test whether Parquet works
> > using the R package?
>
> A first patch for this was merged here
> https://github.com/apache/arrow/commit/5723adad7ad80c95ba8fcb55d40186d6a29edb74
>
> >
> > > How would you propose to make this happen on a practical timeline (3
> > > months or less)? This requirement (getting packages into an official
> > > Linux distro) is significantly more onerous than any of the other
> > > platforms we are packaging for.
> >
> > You need to find a Debian maintainer that is willing to upload the
> > package. I don't know the details of the process either. I think the
> > .deb has to pass lintian and they require some degree of API
> > stability. If you plan to make backward-incompatible API changes in
> > arrow 0.13, then publishing to Debian may be premature.
>
> At a glance this seems problematic. I think we're going to have to
> find a way around this to depend on .deb/.rpm packages from Bintray or
> something similar. If it turns out that the R package can only depend on
> API-stable binaries on Linux, that seems like an issue that should be
> raised outside of this project.
>
> - Wes
