[
https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson resolved ARROW-17252.
-------------------------------------
Fix Version/s: 10.0.0
Resolution: Fixed
Issue resolved by pull request 13773
[https://github.com/apache/arrow/pull/13773]
> [R] Intermittent valgrind failure
> ---------------------------------
>
> Key: ARROW-17252
> URL: https://issues.apache.org/jira/browse/ARROW-17252
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dewey Dunnington
> Assignee: Dewey Dunnington
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0.0
>
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>
> A number of recent nightly builds have failed intermittently under valgrind,
> which reports possibly leaked memory around an exec plan. This seems
> related to a change in XXX that separated {{ExecPlan_prepare()}} from
> {{ExecPlan_run()}} and added an {{ExecPlan_read_table()}} that uses
> {{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans,
> ExecNodes, and fields of those objects.
> A failed run:
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980
> Some example output:
> {noformat}
> ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are
> definitely lost in loss record 1,988 of 3,883
> ==5249== at 0x4849013: operator new(unsigned long) (in
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249== by 0x10B2902B:
> std::_Function_handler<arrow::Result<arrow::compute::ExecNode*>
> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*,
> std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions
> const&),
> arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*,
> std::vector<arrow::compute::ExecNode*,
> std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions
> const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&,
> std::vector<arrow::compute::ExecNode*,
> std::allocator<arrow::compute::ExecNode*> >&&,
> arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
> ==5249== by 0xFA83A0C:
> std::function<arrow::Result<arrow::compute::ExecNode*>
> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*,
> std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions
> const&)>::operator()(arrow::compute::ExecPlan*,
> std::vector<arrow::compute::ExecNode*,
> std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions
> const&) const (std_function.h:622)
> ==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are
> definitely lost in loss record 1,989 of 3,883
> ==5249== at 0x4849013: operator new(unsigned long) (in
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249== by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
> ==5249== by 0xFCB6F9A: arrow::Future<arrow::internal::Empty>::Make()
> (future.h:420)
> ==5249== by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
> ==5249== by 0x101AE927:
> arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*,
> std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
> ==5249== by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249== by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249== by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249== by 0x49C2C16: bcEval (eval.c:7682)
> ==5249== by 0x499DB95: Rf_eval (eval.c:748)
> ==5249== by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249== by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249== by 0x49B2122: bcEval (eval.c:7094)
> ==5249==
> ==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are
> definitely lost in loss record 2,929 of 3,883
> ==5249== at 0x4849013: operator new(unsigned long) (in
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249== by 0x10214F92: arrow::compute::TaskScheduler::Make()
> (task_util.cc:421)
> ==5249== by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
> ==5249== by 0x101AEA6C:
> arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*,
> std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
> ==5249== by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249== by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249== by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249== by 0x49C2C16: bcEval (eval.c:7682)
> ==5249== by 0x499DB95: Rf_eval (eval.c:748)
> ==5249== by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249== by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249== by 0x49B2122: bcEval (eval.c:7094)
> ==5249== by 0x499DB95: Rf_eval (eval.c:748)
> {noformat}
> We also occasionally get leaked Schemas, and in one case a leaked InputType
> that seemed completely unrelated to the other leaks (ARROW-17225).
> I'm wondering whether these leaks stem from lambdas that capture references
> and are then passed by reference, or perhaps from a caching issue. In some
> previous failures, the backtrace to the {{new}} allocator differed between
> reported leaks.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)