[ https://issues.apache.org/jira/browse/ARROW-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503172#comment-17503172 ]
Dewey Dunnington commented on ARROW-15656:
------------------------------------------

I'll narrow it down further tomorrow, but the "tests/testthat/test-dataset-csv.R" file looks like it triggers a few:

{noformat}
echo 'devtools::test(filter = "^dataset-csv")' | R -d "valgrind --tool=memcheck --leak-check=full" --no-save > dataset-csv.txt 2>&1
{noformat}

{noformat}
==528658== Memcheck, a memory error detector
==528658== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==528658== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==528658== Command: /usr/lib/R/bin/exec/R --no-save
==528658== 

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> devtools::test(filter = "^dataset-csv")
ℹ Loading arrow
ℹ Testing arrow
✓ | F W S  OK | Context
✓ |        43 | dataset-csv [32.4s]

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 32.5 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 43 ]
> ==528658== 
==528658== HEAP SUMMARY:
==528658==     in use at exit: 171,384,295 bytes in 36,343 blocks
==528658==   total heap usage: 581,604 allocs, 545,261 frees, 635,268,796 bytes allocated
==528658== 
==528658== 32 bytes in 1 blocks are possibly lost in loss record 66 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x1AA18246: allocate (new_allocator.h:114)
==528658==    by 0x1AA18246: allocate (alloc_traits.h:443)
==528658==    by 0x1AA18246: _M_allocate (stl_vector.h:343)
==528658==    by 0x1AA18246: _M_create_storage (stl_vector.h:358)
==528658==    by 0x1AA18246: _Vector_base (stl_vector.h:302)
==528658==    by 0x1AA18246: vector (stl_vector.h:552)
==528658==    by 0x1AA18246: schema_(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (schema.cpp:28)
==528658==    by 0x1A941804: _arrow_schema_ (arrowExports.cpp:6908)
==528658==    by 0x494152F: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4998082: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997E3A: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499980E: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499A701: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997B6E: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== 104 bytes in 1 blocks are possibly lost in loss record 145 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x15FCAC82: arrow::Schema::Schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FCB6D3: arrow::schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1AA182CD: schema_(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (schema.cpp:28)
==528658==    by 0x1A941804: _arrow_schema_ (arrowExports.cpp:6908)
==528658==    by 0x494152F: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4998082: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997E3A: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499980E: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499A701: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== 104 bytes in 1 blocks are possibly lost in loss record 146 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x15FDA7A4: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, false> >::_M_rehash_aux(unsigned long, std::integral_constant<bool, false>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FDAA5A: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, false> >::_M_insert_multi_node(std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true>*, unsigned long, std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true>*) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FC9A64: arrow::(anonymous namespace)::CreateNameToIndexMap(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FCACBA: arrow::Schema::Schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FCB6D3: arrow::schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1AA182CD: schema_(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (schema.cpp:28)
==528658==    by 0x1A941804: _arrow_schema_ (arrowExports.cpp:6908)
==528658==    by 0x494152F: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4998082: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== 112 bytes in 2 blocks are possibly lost in loss record 154 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x15FC9A14: arrow::(anonymous namespace)::CreateNameToIndexMap(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FCACBA: arrow::Schema::Schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x15FCB6D3: arrow::schema(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > >, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1AA182CD: schema_(std::vector<std::shared_ptr<arrow::Field>, std::allocator<std::shared_ptr<arrow::Field> > > const&) (schema.cpp:28)
==528658==    by 0x1A941804: _arrow_schema_ (arrowExports.cpp:6908)
==528658==    by 0x494152F: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4998082: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997E3A: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499980E: ??? (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== 224 bytes in 2 blocks are possibly lost in loss record 238 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x15FBA52F: arrow::field(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<arrow::DataType>, bool, std::shared_ptr<arrow::KeyValueMetadata const>) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1A9A7379: Field__initialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::DataType> const&, bool) (field.cpp:27)
==528658==    by 0x1A94301B: _arrow_Field__initialize (arrowExports.cpp:4196)
==528658==    by 0x49414FF: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4998082: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499B4F2: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997E3A: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499980E: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x499A701: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4997B6E: Rf_eval (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== 400 bytes in 1 blocks are possibly lost in loss record 312 of 3,049
==528658==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==528658==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==528658==    by 0x5735322: allocate_stack (allocatestack.c:622)
==528658==    by 0x5735322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==528658==    by 0x171DEDB3: je_arrow_private_je_pthread_create_wrapper (background_thread.c:48)
==528658==    by 0x171DEDB3: background_thread_create_signals_masked (background_thread.c:365)
==528658==    by 0x171DEDB3: background_thread_create_locked (background_thread.c:573)
==528658==    by 0x171DEFAB: je_arrow_private_je_background_thread_create (background_thread.c:598)
==528658==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==528658==    by 0x4011C90: call_init (dl-init.c:30)
==528658==    by 0x4011C90: _dl_init (dl-init.c:119)
==528658==    by 0x4E75894: _dl_catch_exception (dl-error-skeleton.c:182)
==528658==    by 0x40160BE: dl_open_worker (dl-open.c:758)
==528658==    by 0x4E75837: _dl_catch_exception (dl-error-skeleton.c:208)
==528658==    by 0x40155F9: _dl_open (dl-open.c:837)
==528658==    by 0x51FE34B: dlopen_doit (dlopen.c:66)
==528658== 
==528658== 1,200 bytes in 3 blocks are possibly lost in loss record 456 of 3,049
==528658==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==528658==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==528658==    by 0x5735322: allocate_stack (allocatestack.c:622)
==528658==    by 0x5735322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==528658==    by 0x171DD925: je_arrow_private_je_pthread_create_wrapper (background_thread.c:48)
==528658==    by 0x171DD925: background_thread_create_signals_masked.constprop.0 (background_thread.c:365)
==528658==    by 0x171DEA70: check_background_thread_creation (background_thread.c:410)
==528658==    by 0x171DEA70: background_thread0_work (background_thread.c:448)
==528658==    by 0x171DEA70: background_work (background_thread.c:490)
==528658==    by 0x171DEA70: background_thread_entry (background_thread.c:522)
==528658==    by 0x5734608: start_thread (pthread_create.c:477)
==528658==    by 0x4E34162: clone (clone.S:95)
==528658== 
==528658== 14,359 (32 direct, 14,327 indirect) bytes in 1 blocks are definitely lost in loss record 1,656 of 3,049
==528658==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==528658==    by 0x1542E9CB: arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::operator()() (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow_dataset.so.800.0.0)
==528658==    by 0x1542ED64: std::_Function_handler<arrow::Future<arrow::dataset::EnumeratedRecordBatch> (), arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch> >::_M_invoke(std::_Any_data const&) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow_dataset.so.800.0.0)
==528658==    by 0x1543136D: std::_Function_handler<arrow::Future<arrow::dataset::EnumeratedRecordBatch> (), arrow::ReadaheadGenerator<arrow::dataset::EnumeratedRecordBatch> >::_M_invoke(std::_Any_data const&) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow_dataset.so.800.0.0)
==528658==    by 0x1542F462: std::_Function_handler<arrow::Future<nonstd::optional_lite::optional<arrow::compute::ExecBatch> > (), arrow::MappingGenerator<arrow::dataset::EnumeratedRecordBatch, nonstd::optional_lite::optional<arrow::compute::ExecBatch> > >::_M_invoke(std::_Any_data const&) (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow_dataset.so.800.0.0)
==528658==    by 0x163197E8: arrow::compute::(anonymous namespace)::SourceNode::StartProducing()::{lambda()#1}::operator()() const (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1631A1AE: arrow::compute::(anonymous namespace)::SourceNode::StartProducing() (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x16285369: arrow::compute::ExecPlan::StartProducing() (in /home/dewey/.r-arrow-dev-build/dist/lib/libarrow.so.800.0.0)
==528658==    by 0x1A97A154: ExecPlan_run(std::shared_ptr<arrow::compute::ExecPlan> const&, std::shared_ptr<arrow::compute::ExecNode> const&, cpp11::r_vector<SEXPREC*>, long) (compute-exec.cpp:92)
==528658==    by 0x1A94367B: _arrow_ExecPlan_run (arrowExports.cpp:1587)
==528658==    by 0x49414E3: ??? (in /usr/lib/R/lib/libR.so)
==528658==    by 0x4941A64: ??? (in /usr/lib/R/lib/libR.so)
==528658== 
==528658== LEAK SUMMARY:
==528658==    definitely lost: 32 bytes in 1 blocks
==528658==    indirectly lost: 14,327 bytes in 144 blocks
==528658==      possibly lost: 2,176 bytes in 11 blocks
==528658==    still reachable: 171,367,760 bytes in 36,187 blocks
==528658==                       of which reachable via heuristic:
==528658==                         newarray           : 4,264 bytes in 1 blocks
==528658==         suppressed: 0 bytes in 0 blocks
==528658== Reachable blocks (those to which a pointer was found) are not shown.
==528658== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==528658== 
==528658== For lists of detected and suppressed errors, rerun with: -s
==528658== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)
{noformat}

> [C++] [R] Valgrind error with C-data interface
> ----------------------------------------------
>
>                 Key: ARROW-15656
>                 URL: https://issues.apache.org/jira/browse/ARROW-15656
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Jonathan Keane
>            Priority: Major
>
> This is currently failing on our valgrind nightly:
> {code}
> ==10301==    by 0x49A2184: bcEval (eval.c:7107)
> ==10301==    by 0x498DBC8: Rf_eval (eval.c:748)
> ==10301==    by 0x4990937: R_execClosure (eval.c:1918)
> ==10301==    by 0x49905EA: Rf_applyClosure (eval.c:1844)
> ==10301==  Uninitialised value was created by a heap allocation
> ==10301==    at 0x483E0F0: memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==10301==    by 0x483E212: posix_memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==10301==    by 0xF4756DF: arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) (memory_pool.cc:365)
> ==10301==    by 0xF475859: arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) (memory_pool.cc:557)
> ==10301==    by 0xF04192E: GcMemoryPool::Allocate(long, unsigned char**)::{lambda()#1}::operator()() const (memorypool.cpp:28)
> ==10301==    by 0xF041EC2: arrow::Status GcMemoryPool::GcAndTryAgain<GcMemoryPool::Allocate(long, unsigned char**)::{lambda()#1}>(GcMemoryPool::Allocate(long, unsigned char**)::{lambda()#1} const&) (memorypool.cpp:46)
> ==10301==    by 0xF0419A3: GcMemoryPool::Allocate(long, unsigned char**) (memorypool.cpp:28)
> ==10301==    by 0xF479EF7: arrow::PoolBuffer::Reserve(long) (memory_pool.cc:921)
> ==10301==    by 0xF479FCD: arrow::PoolBuffer::Resize(long, bool) (memory_pool.cc:945)
> ==10301==    by 0xF478A74: ResizePoolBuffer<std::unique_ptr<arrow::Buffer>, std::unique_ptr<arrow::PoolBuffer> > (memory_pool.cc:984)
> ==10301==    by 0xF478A74: arrow::AllocateBuffer(long, arrow::MemoryPool*) (memory_pool.cc:992)
> ==10301==    by 0xF458BAD: arrow::AllocateBitmap(long, arrow::MemoryPool*) (buffer.cc:174)
> ==10301==    by 0xF38CC77: arrow::(anonymous namespace)::ConcatenateBitmaps(std::vector<arrow::(anonymous namespace)::Bitmap, std::allocator<arrow::(anonymous namespace)::Bitmap> > const&, arrow::MemoryPool*, std::shared_ptr<arrow::Buffer>*) (concatenate.cc:81)
> ==10301== 
> test-dataset.R:852:3 [success]
> {code}
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=19519&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181
> It surfaced with
> https://github.com/apache/arrow/commit/858470d928e9ce5098da7ebb1926bb3c74dadff0
> Though it could be from:
> https://github.com/apache/arrow/commit/b868090f0f65a2a66bb9c3d7c0f68c5af1a4dff0
> which added some code to make a source node from the C-Data interface.
> However, the first call looks like it might be the line
> https://github.com/apache/arrow/blob/fa699117091917f0992225aff4e8d4c08910162a/cpp/src/arrow/compute/kernels/vector_selection.cc#L437

--
This message was sent by Atlassian Jira (v8.20.1#820001)