thisisnic opened a new issue, #36807:
URL: https://github.com/apache/arrow/issues/36807

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I'm using the Arrow R package version 12.0.1.1 and am getting a segfault when 
trying to read a Parquet file.  Here's the output with the debugger attached:
   
   ```
   > library(fs)
   library(arrow)
   library(dplyr)
   [New Thread 0x7ffff33ff640 (LWP 480350)]
   [New Thread 0x7fffe99ff640 (LWP 480356)]
   Some features are not enabled in this build of Arrow. Run `arrow_info()` for 
more information.
   
   Attaching package: ‘arrow’
   
   The following object is masked from ‘package:utils’:
   
       timestamp
   
   
   Attaching package: ‘dplyr’
   
   The following objects are masked from ‘package:stats’:
   
       filter, lag
   
   The following objects are masked from ‘package:base’:
   
       intersect, setdiff, setequal, union
   
   > all_files <- dir_ls("/data/nyc-taxi", recurse=TRUE)
   parquet_files <- all_files[endsWith(all_files, "parquet")]
   > parquet_files[86]
   /data/nyc-taxi/year=2016/month=10/part-0.parquet
   > ds <- open_dataset(parquet_files[86]) %>% head(6) %>% collect()
   [New Thread 0x7fffe9007640 (LWP 480358)]
   [New Thread 0x7fffe8806640 (LWP 480359)]
   [New Thread 0x7fffd7b7f640 (LWP 480360)]
   [New Thread 0x7fffd6b7f640 (LWP 480361)]
   [New Thread 0x7fffd637e640 (LWP 480362)]
   [New Thread 0x7fffd5b7d640 (LWP 480363)]
   [New Thread 0x7fffd537c640 (LWP 480364)]
   [New Thread 0x7fffd4b7b640 (LWP 480365)]
   [New Thread 0x7fffcd7ff640 (LWP 480366)]
   [New Thread 0x7fffccffe640 (LWP 480367)]
   [New Thread 0x7fffb3fff640 (LWP 480368)]
   [New Thread 0x7fffb37fe640 (LWP 480369)]
   [New Thread 0x7fffb2ffd640 (LWP 480370)]
   > nrow(ds)
   [1] 6
   > parquet_files[87]
   /data/nyc-taxi/year=2016/month=11/part-0.parquet
   > ds <- open_dataset(parquet_files[87]) %>% head(6) %>% collect()
   > 
   Thread 13 "R" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x7fffccffe640 (LWP 480367)]
   0x00007ffff00fbf38 in 
arrow::internal::Executor::Submit<parquet::arrow::(anonymous 
namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous
 namespace)::FileReaderImpl>, const std::vector<int>&, const std::vector<int>&, 
arrow::internal::Executor*)::<lambda(size_t, 
std::shared_ptr<parquet::arrow::ColumnReaderImpl>)>&, long unsigned int&, 
std::shared_ptr<parquet::arrow::ColumnReaderImpl> >(arrow::internal::TaskHints, 
arrow::StopToken, struct {...} &) (this=0x2e1c00000008, hints=..., 
stop_token=..., func=...) at 
/home/nic2/arrow/cpp/src/arrow/util/thread_pool.h:159
   159      ARROW_RETURN_NOT_OK(SpawnReal(hints, std::move(task), 
std::move(stop_token),
   ```
   
   If I read in the file via `read_parquet()`, I don't have a problem and it 
loads fine.  Happy to supply the file if necessary, though I wasn't sure it's 
possible/desirable to attach a 150 MB file to an issue ticket.
   
   ### Component(s)
   
   C++


