[ https://issues.apache.org/jira/browse/ARROW-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442342#comment-17442342 ]

Neal Richardson commented on ARROW-14677:
-----------------------------------------

Thanks for the report! autobrew pulls a bundle of static libraries. I'm not 
sure how that would clash with your local {{brew}} installation; the only thing 
I can think of is a problem with the system or Homebrew libcurl or OpenSSL, 
which are not bundled but are required by the aws-sdk-cpp that reads from S3. 
Some thoughts:

1. Is there a reason you can't use the binary package from CRAN? (That is built 
with autobrew too, for what it's worth.)
2. You could try a source install and set the env var FORCE_BUNDLED_BUILD=true. 
This would build libarrow from source instead of using the prebuilt autobrew 
bundle. (I'd also recommend setting ARROW_R_DEV=true to get some output from 
the libarrow build, if for no other reason than to see that it is progressing.)
3. Can you download one or two of those parquet files from S3 and try 
open_dataset() on them from your local filesystem? The backtrace points at 
Thrift, but I'm wondering if that's misleading.

It would be interesting to know if any/all of those segfault for you.
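The three suggestions can be sketched as R helpers. This is a minimal sketch under assumptions: the helper names (`install_arrow_binary()`, `install_arrow_bundled()`, `parse_s3_ls()`, `copy_sample_parquet()`), the `gbif-sample` directory, and the use of the AWS CLI (`aws s3 ls` / `aws s3 cp` with `--no-sign-request`) for the download step are illustrative choices, not part of the arrow package.

```r
# Hypothetical helpers for the three suggestions; none of these names
# come from the arrow package. The AWS CLI is assumed to be installed.

gbif_uri <- "s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/"

# 1. Install the prebuilt macOS binary from CRAN (also built with autobrew).
install_arrow_binary <- function() {
  install.packages("arrow", type = "binary")
}

# 2. Build libarrow from source instead of using the autobrew bundle.
#    ARROW_R_DEV=true makes the libarrow build print progress output.
#    Variables set with Sys.setenv() are inherited by the R CMD INSTALL
#    child process and stay set for the rest of this R session.
install_arrow_bundled <- function() {
  Sys.setenv(FORCE_BUNDLED_BUILD = "true", ARROW_R_DEV = "true")
  install.packages("arrow", type = "source")
}

# 3a. Extract object names from `aws s3 ls` output. Object lines look like
#     "<date> <time> <size> <name>"; prefix lines start with "PRE" and are
#     skipped. Object names containing spaces are not handled.
parse_s3_ls <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "PRE ")]
  sub("^.*\\s", "", lines)
}

# 3b. Copy the first `n` objects under `uri` to a local directory with
#     anonymous (unsigned) requests and return the local file paths.
copy_sample_parquet <- function(uri = gbif_uri, dest = "gbif-sample", n = 2) {
  listing <- system2("aws", c("s3", "ls", "--no-sign-request", uri), stdout = TRUE)
  keys <- utils::head(parse_s3_ls(listing), n)
  dir.create(dest, showWarnings = FALSE)
  for (key in keys) {
    system2("aws", c("s3", "cp", "--no-sign-request", paste0(uri, key), file.path(dest, key)))
  }
  file.path(dest, keys)
}

# Usage: try each option separately, in a fresh R session.
# install_arrow_binary()
# install_arrow_bundled()
# ds <- arrow::open_dataset(copy_sample_parquet(), format = "parquet")
```

Downloading with the AWS CLI rather than through arrow keeps libcurl, OpenSSL, and aws-sdk-cpp out of the third test, so a segfault on the local copies would suggest the problem lies on the Parquet/Thrift side rather than in S3 access.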

> [R][C++] macOS R package arrow segfault on `open_dataset()`
> -----------------------------------------------------------
>
>                 Key: ARROW-14677
>                 URL: https://issues.apache.org/jira/browse/ARROW-14677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 6.0.0
>            Reporter: Martin Morgan
>            Priority: Major
>
> Following a Slack post 
> (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing 
> a public bucket with the R client
> {code:java}
> df <- arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> leads to a segfault
> {code:java}
>   *** caught segfault ***
> address 0x0, cause 'unknown'
> Traceback:
> 1: dataset__DatasetFactory_Finish1(self, unify_schemas)
> 2: factory$Finish(schema, isTRUE(unify_schemas))
> 3: doTryCatch(return(expr), name, parentenv, handler)
> 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5: tryCatchList(expr, classes, parentenv, handlers)
> 6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e) { handle_parquet_io_error(e, format)})
> 7: arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> The arrow portion of the lldb traceback is
> {code:java}
> (lldb) thread backtrace
> thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
> frame #0: 0x000000012ab2029c libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr() + 46
> frame #1: 0x0000000128bb6ac2 arrow.so`void parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned char const*, unsigned int*, parquet::format::FileMetaData*) + 309
> frame #2: 0x0000000128bb5f49 arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 517
> frame #3: 0x0000000128bace0d arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
> frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 89
> frame #5: 0x0000000128b9cb4a arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer> const&, unsigned int) + 118
> frame #6: 0x0000000128b9df43 arrow.so`parquet::SerializedFile::ParseMetaData() + 607
> frame #7: 0x0000000128b9dc6c arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 214
> frame #8: 0x0000000128b9eb72 arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 58
> frame #9: 0x0000000128c8a988 arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource const&, arrow::dataset::ScanOptions*) const + 286
> frame #10: 0x0000000128c8a72e arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource const&) const + 44
> frame #11: 0x0000000128c0b994 arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions) + 336
> frame #12: 0x0000000128c09079 arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) + 43
> frame #13: 0x0000000128c0c1cf arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions) + 541
> frame #14: 0x0000000128a66805 arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory> const&, bool) + 69
> frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 154
> {code}
> arrow was installed from source on
> {code:java}
> > sessionInfo()
> R Under development (unstable) (2021-10-28 r81109)
> Platform: x86_64-apple-darwin19.6.0 (64-bit)
> Running under: macOS Catalina 10.15.7
> Matrix products: default
> BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] arrow_6.0.0.2
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4 compiler_4.2.0
> [4] BiocManager_1.30.16 magrittr_2.0.1 assertthat_0.2.1
> [7] R6_2.5.1 glue_1.5.0 bit64_4.0.5
> [10] vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
> {code}
> During package installation, the one step that was 'new' to me was the use of 
> autobrew
> {code:java}
> *** Downloading apache-arrow
> Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz{code}
> I'm not sure how to validate that this use is consistent with my brew 
> installation.
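On the question of whether the autobrew build is consistent with the local Homebrew installation, one possible first check is to list the dynamic libraries that the installed `arrow.so` links against. Frame #0 of the backtrace resolves into a separate `libthrift-0.15.0.dylib` rather than into `arrow.so`, so a Homebrew-provided dylib showing up in that list may be worth a closer look. This is a sketch under assumptions: it requires macOS with `otool` (Xcode command line tools), the helper names are hypothetical, and the Homebrew prefixes (`/usr/local`, `/opt/homebrew`) are the usual defaults rather than values read from your installation. Static libraries from the autobrew bundle are linked into `arrow.so` and will not appear in `otool -L` output.

```r
# Hypothetical helpers: list the dynamic libraries that arrow.so links
# against and flag any that resolve into an assumed Homebrew prefix.

arrow_so_path <- function() {
  system.file("libs", "arrow.so", package = "arrow")
}

# `otool -L` prints the file name on its first line, then one
# "\t<path> (compatibility version ..., current version ...)" line per library.
parse_otool <- function(out) {
  trimws(sub("\\s*\\(.*$", "", out[-1]))
}

linked_libraries <- function(so = arrow_so_path()) {
  parse_otool(system2("otool", c("-L", so), stdout = TRUE))
}

# Keep only paths under one of the given prefixes.
brew_linked <- function(libs, prefixes = c("/usr/local", "/opt/homebrew")) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")/")
  libs[grepl(pattern, libs)]
}

# Usage (macOS only):
# arrow::arrow_info()              # build-time and run-time configuration
# brew_linked(linked_libraries())  # character(0) if nothing comes from brew
```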



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
