[
https://issues.apache.org/jira/browse/ARROW-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445361#comment-17445361
]
Martin Morgan commented on ARROW-14677:
---------------------------------------
To be a bit more complete: when I look at the libraries linked by the package installed
from CRAN, I see
{code:java}
> system2('otool', c('-L', system.file('libs/arrow.so', package='arrow')))
/Users/ma38727/Library/R/x86_64/4.1/library/arrow/libs/arrow.so:
arrow.so (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
/Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libR.dylib (compatibility version 4.1.0, current version 4.1.0)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1452.23.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0){code}
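To compare builds quickly, the `otool -L` output can be filtered down to dependencies outside the macOS system locations. The sketch below is a hypothetical helper (`non_system_deps` is not part of arrow, R, or Xcode). It assumes the usual `otool -L` layout: a first line naming the file, then one dependency per line. It also assumes that an entry without a `/` (like `arrow.so` above) is the file's own install name, and that paths contain no spaces. Note that `otool -L` lists only dynamically linked libraries; anything pulled in from static archives, such as the `libbrew*.a` files in the autobrew log, will not appear.

```shell
#!/bin/sh
# Hypothetical helper: print dependencies from `otool -L` output (read on stdin)
# that are not under /usr/lib or /System/Library.
# Assumes one dependency per line after the header line, paths without spaces.
non_system_deps() {
  # Skip the header line (the file name) and the file's own install name,
  # which has no slash (e.g. "arrow.so").
  awk 'NR > 1 && index($1, "/") > 0 { print $1 }' |
    grep -v -e '^/usr/lib/' -e '^/System/Library/'
}

# Example usage on macOS (assumes Rscript is on PATH and arrow is installed):
#   otool -L "$(Rscript -e 'cat(system.file("libs/arrow.so", package = "arrow"))')" | non_system_deps
```

For the CRAN build shown above, this prints only the `libR.dylib` path, since everything else comes from `/usr/lib` or `/System/Library`.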
When installing, here is what I see:
{code:java}
Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz
PKG_CFLAGS=-I/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/include
-DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_JSON
-DARROW_R_WITH_S3 -DARROW_R_WITH_ARROW
PKG_LIBS=-L/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/lib
-lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4
-lsnappy -lzstd -laws-cpp-sdk-config -laws-cpp-sdk-transfer
-laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity
-laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream
-laws-checksums -laws-c-common -lpthread -lcurl{code}
During compilation, I see lines like
{code:java}
g++ -std=gnu++11 -I"/Users/ma38727/bin/R-devel/include" -DNDEBUG
-I/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/include
-DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_JSON
-DARROW_R_WITH_S3 -DARROW_R_WITH_ARROW -I../inst/include/ -I/usr/local/include
-fPIC -g -O2 -c RTasks.cpp -o RTasks.o{code}
At the link step, I see
{code:java}
g++ -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined
dynamic_lookup -single_module -multiply_defined suppress
-L/Users/ma38727/bin/R-devel/lib -L/usr/local/lib -o arrow.so RTasks.o altrep.o
array.o array_to_vector.o arraydata.o arrowExports.o buffer.o chunkedarray.o
compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o
expression.o feather.o field.o filesystem.o imports.o io.o json.o memorypool.o
message.o parquet.o py-to-r.o r_to_arrow.o recordbatch.o recordbatchreader.o
recordbatchwriter.o scalar.o schema.o symbols.o table.o threadpool.o
type_infer.o
-L/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/lib
-lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4
-lsnappy -lzstd -laws-cpp-sdk-config -laws-cpp-sdk-transfer
-laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity
-laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream
-laws-checksums -laws-c-common -lpthread -lcurl
-L/Users/ma38727/bin/R-devel/lib -lR -lintl -Wl,-framework
-Wl,CoreFoundation{code}
A promising commit (unrelated to autobrew) is
[https://github.com/apache/arrow/commit/225d9547d2363bd0eb8c85bdd0dd98a6014069d7],
but trying to install a nightly build led to
{code:java}
install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")
Installing package into '/Users/ma38727/Library/R/4.2/Bioc/3.15/library'
(as 'lib' is unspecified)
trying URL
'https://arrow-r-nightly.s3.amazonaws.com/src/contrib/arrow_6.0.0.20211116.tar.gz'
Content type 'binary/octet-stream' length 4562535 bytes (4.4 MB)
==================================================
downloaded 4.4 MB
* installing *source* package 'arrow' ...
** using staged installation
*** Downloading apache-arrow
**** Using local manifest for apache-arrow
Wed Nov 17 11:51:53 EST 2021: Auto-brewing apache-arrow in
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow...
==> Tapping autobrew/core from https://github.com/autobrew/homebrew-core
Tapped 2 commands and 4636 formulae (4,885 files, 12.7MB).
aws-sdk-cpp
lz4
snappy
openssl
thrift
zstd
==> Downloading
https://autobrew.github.io/bottles/aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/f34d7866b963ebd58ba2d413affc720463646a52887fc411e7cbe697b8267e2d--aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
==> Pouring aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/aws-sdk-cpp/1.7.364:
967 files, 21.8MB
==> Downloading
https://bintray-archive.github.io/bottles/lz4-1.8.3.high_sierra.bottle.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/0a946e5671f0e86faaeef98ebbf6a7f2fba9b7bbf042a5783ffd363c98f3c4bd--lz4-1.8.3.high_sierra.bottle.tar.gz
==> Pouring lz4-1.8.3.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/lz4/1.8.3:
22 files, 487.9KB
==> Downloading
https://bintray-archive.github.io/bottles/snappy-1.1.7_1.high_sierra.bottle.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/9387790272f8e7c54155bbc01c0babe956c7a1636780e8f0bd53d8a6bfc37494--snappy-1.1.7_1.high_sierra.bottle.tar.gz
==> Pouring snappy-1.1.7_1.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/snappy/1.1.7_1:
18 files, 118.1KB
==> Downloading
https://bintray-archive.github.io/bottles/openssl-1.0.2p.high_sierra.bottle.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/c284b406ac6052e9bf10cb9a91525c50697aee845f612f95ccb2fef66f906244--openssl-1.0.2p.high_sierra.bottle.tar.gz
==> Pouring openssl-1.0.2p.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/openssl/1.0.2p:
1,793 files, 12.3MB
==> Downloading
https://bintray-archive.github.io/bottles/thrift-0.11.0.high_sierra.bottle.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/eb661987039a17dd353cf11bcd67ec1c6bfafd8cf3f59657ef1e54b7880e796f--thrift-0.11.0.high_sierra.bottle.tar.gz
==> Pouring thrift-0.11.0.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/thrift/0.11.0:
102 files, 7MB
==> Downloading
https://autobrew.github.io/bottles/zstd-1.5.0.high_sierra.bottle.tar.gz
Already downloaded:
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/32a33d7fff1ab256be5045c729d6311a3e0258a97f47717cbbf240b5aec07be4--zstd-1.5.0.high_sierra.bottle.tar.gz
==> Pouring zstd-1.5.0.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/zstd/1.5.0:
26 files, 4.4MB
Error: The following flags:
--HEAD, --build-from-source
require building tools, but none are installed.
Install the Command Line Tools:
xcode-select --install
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-c-common.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-c-event-stream.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-checksums.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-cognito-identity.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-config.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-core.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-identity-management.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-s3.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-sts.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-transfer.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewtesting-resources.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewlz4.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewcrypto.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewssl.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewsnappy.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewthrift.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewthriftz.a
created
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewzstd.a
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
ERROR: configuration failed for package 'arrow'
* removing '/Users/ma38727/Library/R/4.2/Bioc/3.15/library/arrow'
* restoring previous '/Users/ma38727/Library/R/4.2/Bioc/3.15/library/arrow'
The downloaded source packages are in
'/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpDFwn98/downloaded_packages'
Warning message:
In install.packages("arrow", repos =
"https://arrow-r-nightly.s3.amazonaws.com") :
installation of package 'arrow' had non-zero exit status
{code}
However, the Xcode Command Line Tools are installed and up to date.
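A quick way to confirm that from a terminal is sketched below. It relies on two standard macOS commands: `xcode-select -p`, which prints the active developer directory, and `pkgutil --pkg-info=com.apple.pkg.CLTools_Executables`, which prints the receipt of the standalone Command Line Tools package. The function name `clt_status` is hypothetical. This is not the check that autobrew's bundled Homebrew performs, which may use different criteria, so a clean result here does not by itself explain the "none are installed" error.

```shell
#!/bin/sh
# Hypothetical helper: report whether the macOS Command Line Tools appear installed.
# Exit status: 0 = developer directory found, 1 = none selected,
#              2 = xcode-select unavailable (e.g. not running on macOS).
clt_status() {
  if ! command -v xcode-select >/dev/null 2>&1; then
    echo "xcode-select not found (not macOS?)"
    return 2
  fi
  # Active developer directory, e.g. /Library/Developer/CommandLineTools
  dir=$(xcode-select -p 2>/dev/null) || {
    echo "no developer directory selected"
    return 1
  }
  echo "developer directory: $dir"
  # Version line from the standalone CLT package receipt, if one exists
  pkgutil --pkg-info=com.apple.pkg.CLTools_Executables 2>/dev/null |
    grep '^version:' || echo "CLTools package receipt not found"
  return 0
}
```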
> [R][C++] macOS R package arrow segfault on `open_dataset()`
> -----------------------------------------------------------
>
> Key: ARROW-14677
> URL: https://issues.apache.org/jira/browse/ARROW-14677
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, R
> Affects Versions: 6.0.0
> Reporter: Martin Morgan
> Priority: Major
>
> Following a Slack post
> (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing
> a public bucket with the R client
> {code:java}
> df <-
> arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> leads to a segfault
> {code:java}
> *** caught segfault ***
> address 0x0, cause 'unknown'
> Traceback:
> 1: dataset__DatasetFactory_Finish1(self, unify_schemas)
> 2: factory$Finish(schema, isTRUE(unify_schemas))
> 3: doTryCatch(return(expr), name, parentenv, handler)
> 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5: tryCatchList(expr, classes, parentenv, handlers)
> 6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e)
> { handle_parquet_io_error(e, format)}
> )
> 7:
> arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
>
> {code}
> The arrow portion of the lldb traceback is
> {code:java}
> (lldb) thread backtrace
> thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
> (code=EXC_I386_GPFLT) frame #0: 0x000000012ab2029c
> libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr()
> + 46
> frame #1: 0x0000000128bb6ac2 arrow.so`void
> parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned
> char const*, unsigned int*, parquet::format::FileMetaData*) + 309
> frame #2: 0x0000000128bb5f49
> arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void
> const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>)
> + 517
> frame #3: 0x0000000128bace0d
> arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*,
> std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
> frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void
> const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>)
> + 89
> frame #5: 0x0000000128b9cb4a
> arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer>
> const&, unsigned int) + 118
> frame #6: 0x0000000128b9df43
> arrow.so`parquet::SerializedFile::ParseMetaData() + 607
> frame #7: 0x0000000128b9dc6c
> arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
> parquet::ReaderProperties const&,
> std::__1::shared_ptr<parquet::FileMetaData>) + 214
> frame #8: 0x0000000128b9eb72
> arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
> parquet::ReaderProperties const&,
> std::__1::shared_ptr<parquet::FileMetaData>) + 58
> frame #9: 0x0000000128c8a988
> arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource
> const&, arrow::dataset::ScanOptions*) const + 286
> frame #10: 0x0000000128c8a72e
> arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource
> const&) const + 44
> frame #11: 0x0000000128c0b994
> arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions)
> + 336
> frame #12: 0x0000000128c09079
> arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions)
> + 43
> frame #13: 0x0000000128c0c1cf
> arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions)
> + 541
> frame #14: 0x0000000128a66805
> arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory>
> const&, bool) + 69
> frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 +
> 154 {code}
> arrow was installed from source on
> {code:java}
> > sessionInfo()
> R Under development (unstable) (2021-10-28 r81109)
> Platform: x86_64-apple-darwin19.6.0 (64-bit)
> Running under: macOS Catalina 10.15.7
> Matrix products: default
> BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] arrow_6.0.0.2
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4 compiler_4.2.0
> [4] BiocManager_1.30.16 magrittr_2.0.1 assertthat_0.2.1
> [7] R6_2.5.1 glue_1.5.0 bit64_4.0.5
> [10] vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
> {code}
> During package installation, the one step that was 'new' to me was the use of
> autobrew
> {code:java}
> *** Downloading apache-arrow
> Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz{code}
> I'm not sure how to validate that this use is consistent with my brew
> installation.