[ 
https://issues.apache.org/jira/browse/ARROW-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445361#comment-17445361
 ] 

Martin Morgan commented on ARROW-14677:
---------------------------------------

To be a bit more complete, when I look at the libraries linked by the package 
installed from CRAN, I see
{code:java}
> system2('otool', c('-L', system.file('libs/arrow.so', package='arrow')))
/Users/ma38727/Library/R/x86_64/4.1/library/arrow/libs/arrow.so:
    arrow.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
    /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
    /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libR.dylib (compatibility version 4.1.0, current version 4.1.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1452.23.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0){code}
When installing, here's what I see
{code:java}
Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz
PKG_CFLAGS=-I/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/include
 -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_JSON 
-DARROW_R_WITH_S3 -DARROW_R_WITH_ARROW
PKG_LIBS=-L/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/lib
 -lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4 
-lsnappy -lzstd -laws-cpp-sdk-config -laws-cpp-sdk-transfer 
-laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity 
-laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream 
-laws-checksums -laws-c-common -lpthread -lcurl{code}
During compilation, I see lines like
{code:java}
g++ -std=gnu++11 -I"/Users/ma38727/bin/R-devel/include" -DNDEBUG 
-I/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/include
 -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_JSON 
-DARROW_R_WITH_S3 -DARROW_R_WITH_ARROW -I../inst/include/  -I/usr/local/include 
  -fPIC  -g -O2  -c RTasks.cpp -o RTasks.o{code}
At linking, I see
{code:java}
g++ -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined 
dynamic_lookup -single_module -multiply_defined suppress 
-L/Users/ma38727/bin/R-devel/lib -L/usr/local/lib -o arrow.so RTasks.o altrep.o 
array.o array_to_vector.o arraydata.o arrowExports.o buffer.o chunkedarray.o 
compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o 
expression.o feather.o field.o filesystem.o imports.o io.o json.o memorypool.o 
message.o parquet.o py-to-r.o r_to_arrow.o recordbatch.o recordbatchreader.o 
recordbatchwriter.o scalar.o schema.o symbols.o table.o threadpool.o 
type_infer.o 
-L/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpTkKYB6/R.INSTALL4c0b3a2ebab4/arrow/.deps/lib
 -lparquet -larrow_dataset -larrow -larrow_bundled_dependencies -lthrift -llz4 
-lsnappy -lzstd -laws-cpp-sdk-config -laws-cpp-sdk-transfer 
-laws-cpp-sdk-identity-management -laws-cpp-sdk-cognito-identity 
-laws-cpp-sdk-sts -laws-cpp-sdk-s3 -laws-cpp-sdk-core -laws-c-event-stream 
-laws-checksums -laws-c-common -lpthread -lcurl 
-L/Users/ma38727/bin/R-devel/lib -lR -lintl -Wl,-framework 
-Wl,CoreFoundation{code}
A promising commit (unrelated to autobrew) is 
[https://github.com/apache/arrow/commit/225d9547d2363bd0eb8c85bdd0dd98a6014069d7]
 but trying to install a nightly build led to
{code:java}
 install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")
Installing package into '/Users/ma38727/Library/R/4.2/Bioc/3.15/library'
(as 'lib' is unspecified)
trying URL 
'https://arrow-r-nightly.s3.amazonaws.com/src/contrib/arrow_6.0.0.20211116.tar.gz'
Content type 'binary/octet-stream' length 4562535 bytes (4.4 MB)
==================================================
downloaded 4.4 MB
* installing *source* package 'arrow' ...
** using staged installation
*** Downloading apache-arrow
**** Using local manifest for apache-arrow
Wed Nov 17 11:51:53 EST 2021: Auto-brewing apache-arrow in 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow...
==> Tapping autobrew/core from https://github.com/autobrew/homebrew-core
Tapped 2 commands and 4636 formulae (4,885 files, 12.7MB).
aws-sdk-cpp
lz4
snappy
openssl
thrift
zstd
==> Downloading 
https://autobrew.github.io/bottles/aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/f34d7866b963ebd58ba2d413affc720463646a52887fc411e7cbe697b8267e2d--aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
==> Pouring aws-sdk-cpp-1.7.364.high_sierra.bottle.2.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/aws-sdk-cpp/1.7.364:
 967 files, 21.8MB
==> Downloading 
https://bintray-archive.github.io/bottles/lz4-1.8.3.high_sierra.bottle.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/0a946e5671f0e86faaeef98ebbf6a7f2fba9b7bbf042a5783ffd363c98f3c4bd--lz4-1.8.3.high_sierra.bottle.tar.gz
==> Pouring lz4-1.8.3.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/lz4/1.8.3:
 22 files, 487.9KB
==> Downloading 
https://bintray-archive.github.io/bottles/snappy-1.1.7_1.high_sierra.bottle.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/9387790272f8e7c54155bbc01c0babe956c7a1636780e8f0bd53d8a6bfc37494--snappy-1.1.7_1.high_sierra.bottle.tar.gz
==> Pouring snappy-1.1.7_1.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/snappy/1.1.7_1:
 18 files, 118.1KB
==> Downloading 
https://bintray-archive.github.io/bottles/openssl-1.0.2p.high_sierra.bottle.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/c284b406ac6052e9bf10cb9a91525c50697aee845f612f95ccb2fef66f906244--openssl-1.0.2p.high_sierra.bottle.tar.gz
==> Pouring openssl-1.0.2p.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/openssl/1.0.2p:
 1,793 files, 12.3MB
==> Downloading 
https://bintray-archive.github.io/bottles/thrift-0.11.0.high_sierra.bottle.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/eb661987039a17dd353cf11bcd67ec1c6bfafd8cf3f59657ef1e54b7880e796f--thrift-0.11.0.high_sierra.bottle.tar.gz
==> Pouring thrift-0.11.0.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/thrift/0.11.0:
 102 files, 7MB
==> Downloading 
https://autobrew.github.io/bottles/zstd-1.5.0.high_sierra.bottle.tar.gz
Already downloaded: 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/downloads/32a33d7fff1ab256be5045c729d6311a3e0258a97f47717cbbf240b5aec07be4--zstd-1.5.0.high_sierra.bottle.tar.gz
==> Pouring zstd-1.5.0.high_sierra.bottle.tar.gz
==> Skipping post_install step for autobrew...
🍺  
/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/build-apache-arrow/Cellar/zstd/1.5.0:
 26 files, 4.4MB
Error: The following flags:
  --HEAD, --build-from-source
require building tools, but none are installed.
Install the Command Line Tools:
  xcode-select --install
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-c-common.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-c-event-stream.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-checksums.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-cognito-identity.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-config.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-core.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-identity-management.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-s3.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-sts.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewaws-cpp-sdk-transfer.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewtesting-resources.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewlz4.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewcrypto.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewssl.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewsnappy.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewthrift.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewthriftz.a
created 
/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//build-apache-arrow/lib/libbrewzstd.a
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
ERROR: configuration failed for package 'arrow'
* removing '/Users/ma38727/Library/R/4.2/Bioc/3.15/library/arrow'
* restoring previous '/Users/ma38727/Library/R/4.2/Bioc/3.15/library/arrow'
The downloaded source packages are in
    
'/private/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T/RtmpDFwn98/downloaded_packages'
Warning message:
In install.packages("arrow", repos = 
"https://arrow-r-nightly.s3.amazonaws.com") :
  installation of package 'arrow' had non-zero exit status
{code}
but the Xcode Command Line Tools are installed and current.

> [R][C++] macOS R package arrow segfault on `open_dataset()`
> -----------------------------------------------------------
>
>                 Key: ARROW-14677
>                 URL: https://issues.apache.org/jira/browse/ARROW-14677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 6.0.0
>            Reporter: Martin Morgan
>            Priority: Major
>
> Following a slack post 
> (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing 
> a public bucket with the R client
> {code:java}
> df <- 
> arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> leads to a segfault
> {code:java}
>   *** caught segfault ***
> address 0x0, cause 'unknown'
> Traceback:
> 1: dataset__DatasetFactory_Finish1(self, unify_schemas)
> 2: factory$Finish(schema, isTRUE(unify_schemas))
> 3: doTryCatch(return(expr), name, parentenv, handler)
> 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5: tryCatchList(expr, classes, parentenv, handlers)
> 6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e)
> { handle_parquet_io_error(e, format)}
> )
> 7: 
> arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
>  
> {code}
> The arrow portion of the lldb traceback is
> {code:java}
> (lldb) thread backtrace
> thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=EXC_I386_GPFLT) frame #0: 0x000000012ab2029c 
> libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr()
>  + 46
> frame #1: 0x0000000128bb6ac2 arrow.so`void 
> parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned
>  char const*, unsigned int*, parquet::format::FileMetaData*) + 309
> frame #2: 0x0000000128bb5f49 
> arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void 
> const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) 
> + 517
> frame #3: 0x0000000128bace0d 
> arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*, 
> std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
> frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void 
> const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) 
> + 89
> frame #5: 0x0000000128b9cb4a 
> arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer>
>  const&, unsigned int) + 118
> frame #6: 0x0000000128b9df43 
> arrow.so`parquet::SerializedFile::ParseMetaData() + 607
> frame #7: 0x0000000128b9dc6c 
> arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
>  parquet::ReaderProperties const&, 
> std::__1::shared_ptr<parquet::FileMetaData>) + 214
> frame #8: 0x0000000128b9eb72 
> arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>,
>  parquet::ReaderProperties const&, 
> std::__1::shared_ptr<parquet::FileMetaData>) + 58
> frame #9: 0x0000000128c8a988 
> arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource
>  const&, arrow::dataset::ScanOptions*) const + 286
> frame #10: 0x0000000128c8a72e 
> arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource
>  const&) const + 44
> frame #11: 0x0000000128c0b994 
> arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions)
>  + 336
> frame #12: 0x0000000128c09079 
> arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions)
>  + 43
> frame #13: 0x0000000128c0c1cf 
> arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions)
>  + 541
> frame #14: 0x0000000128a66805 
> arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory>
>  const&, bool) + 69
> frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 
> 154 {code}
> arrow was installed from source on
> {code:java}
> > sessionInfo()
> R Under development (unstable) (2021-10-28 r81109)
> Platform: x86_64-apple-darwin19.6.0 (64-bit)
> Running under: macOS Catalina 10.15.7
> Matrix products: default
> BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] arrow_6.0.0.2
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4 compiler_4.2.0
> [4] BiocManager_1.30.16 magrittr_2.0.1 assertthat_0.2.1
> [7] R6_2.5.1 glue_1.5.0 bit64_4.0.5
> [10] vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
> {code}
> During package installation, the one step that was 'new' to me was the use of 
> autobrew
> {code:java}
> *** Downloading apache-arrow
> Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz{code}
> I'm not sure how to validate that this use is consistent with my brew 
> installation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
