[https://issues.apache.org/jira/browse/ARROW-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418306#comment-17418306]
Weston Pace edited comment on ARROW-14039 at 9/21/21, 8:20 PM:
---------------------------------------------------------------
> being able to run in low resource settings enables wider adoption as a
> standard backend
I may still be misunderstanding, but I think the discussion is about building,
not running. I absolutely agree that Arrow should be able to run with minimal
memory, and that might be worth defining a limit for.
> For example, R will download and build Arrow on Linux when one installs the R
> bindings.
I believe R always compiles the bindings, but it shouldn't compile arrow-cpp if
the package is already present, for example if the user has already installed
the CentOS 8 Arrow package from EPEL. The one exception might be Go (which
statically compiles everything), but it has fairly strong cross-compilation
support.
> The requirement is not meant to be very precise, more a suggestion as to what
> to expect. It is possible to add memory use monitoring to the CI builds,
> though again this would need maintenance. We want someone installing Arrow
> (at least the debug build, as virtual machines with less than 1 GB are rare)
> to know that if the build is proceeding very slowly and they have limited
> RAM, swapping is the likely reason for the slow build.
What if we just add a generic statement:
_Arrow C++ is a complex project that needs to handle many different data types,
vectorization architectures, and compiler differences. Building Arrow C++
requires a considerable amount of CPU and RAM. When installing Arrow on a
system with limited resources, we recommend compiling the binaries on a capable
build machine or downloading prebuilt binaries from package managers._
If you want to replace "considerable amount of CPU and RAM" with "potentially
more than 4GB of RAM" (or insert your number here), I wouldn't really be
opposed. My concern would be more with a phrase like "at most 4GB of RAM",
because the only way we could back that up is "on these build machines with
these configurations it took less than 4GB", and that isn't really the same
thing.
> [C++] [Docs] Indicate memory required for installation
> ------------------------------------------------------
>
> Key: ARROW-14039
> URL: https://issues.apache.org/jira/browse/ARROW-14039
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Benson Muite
> Assignee: Benson Muite
> Priority: Trivial
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> It would be helpful to document the typical memory required for installation.
> A single core provides sufficient processing power; monitoring with SAR
> indicates that about 3 GB of RAM is needed for a debug build and 1 GB for a
> release build.
>
>