[https://issues.apache.org/jira/browse/ARROW-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Phillip Cloud updated ARROW-13608:
----------------------------------
Description:
The R bindings for arrow are triggering a segfault when running
{{library(arrow)}}.
After a large amount of investigation by [~jonkeane], [~npr], [~bkietz],
[~apitrou] and myself, we narrowed the problem down to what appears to be a
dependence on the order of static initialization.
The order in which non-local static variables defined in different translation
units are dynamically initialized is unspecified in C++
([https://en.cppreference.com/w/cpp/language/initialization], see the "Dynamic
Initialization" section). This implies that if a {{static A}} depends on a
{{static B}} declared and initialized in another translation unit, it is
perfectly legal for the implementation to initialize {{A}} _before_ {{B}}. {{A}}
then uses a {{B}} whose dynamic initialization has not yet run (at that point
{{B}} has only been zero-initialized, so any pointer it holds is null), and
thus triggers undefined behavior.
Here, that undefined behavior manifests as a segmentation fault.
A "prose-level" trace is as follows:
1. The R bindings construct symbols in
[https://github.com/apache/arrow/blob/master/r/src/symbols.cpp#L79].
2. Those bindings initialize a number of {{r_vector}} objects, with this overload:
[https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_vector.hpp#L363-L369]
3. The overload references the static variable {{preserved}} and calls its
{{insert}} method.
4. {{insert}} dereferences a null pointer here:
[https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L316]
({{list_}} specifically).
I think the fix belongs in {{cpp11}}: use the
[Construct on First Use idiom|https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use]
to initialize {{preserved}} instead of the {{static struct}} it uses now
([https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L301]).
was:
The R bindings for arrow are triggering a segfault when running
{{library(arrow)}}.
After a large amount of investigation by [~jonkeane], [~npr], [~bkietz],
[~apitrou] and myself, we narrowed the problem down to what appears to be a
dependence on the order of static initialization.
The order in which non-local static variables defined in different translation
units are dynamically initialized is unspecified in C++
([https://en.cppreference.com/w/cpp/language/initialization], see the "Dynamic
Initialization" section). This implies that if a {{static A}} depends on a
{{static B}} declared and initialized in another translation unit, it is
perfectly legal for the implementation to initialize {{A}} _before_ {{B}}. {{A}}
then uses a {{B}} whose dynamic initialization has not yet run (at that point
{{B}} has only been zero-initialized, so any pointer it holds is null), and
thus triggers undefined behavior.
Here, that undefined behavior manifests as a segmentation fault.
A "prose-level" trace is as follows:
1. The R bindings construct symbols in
[https://github.com/apache/arrow/blob/master/r/src/symbols.cpp#L79].
2. Those bindings initialize a number of {{r_vector}} objects, with this overload:
[https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_vector.hpp#L363-L369]
3. The overload references the static variable {{preserved}} and calls its
{{insert}} method.
4. {{insert}} dereferences a null pointer here:
[https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L316]
({{list_}} specifically).
I think the fix belongs in `cpp11`: use the
[Construct on First Use idiom|https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use]
to initialize `preserved` instead of the `static struct` it uses now
([https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L301]).
> [R] symbol initialization appears to be depending on undefined behavior
> -----------------------------------------------------------------------
>
> Key: ARROW-13608
> URL: https://issues.apache.org/jira/browse/ARROW-13608
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Environment: x86_64, linux
> Reporter: Phillip Cloud
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)